CN117950816A - Job scheduling method, device and chip - Google Patents

Job scheduling method, device and chip

Info

Publication number: CN117950816A
Application number: CN202211338117.1A (China)
Other languages: Chinese (zh)
Inventor: 申鹏 (Shen Peng)
Assignee: Huawei Technologies Co Ltd
Legal status: Pending
Priority: CN202211338117.1A; PCT/CN2023/101052 (published as WO2024087663A1)

Abstract

A job scheduling method, device and chip are disclosed, relating to the field of computers. For a heterogeneous computing power system (or cluster), a scheduling node is provided with a scheduler set that includes resource schedulers capable of performing multiple types of job scheduling processing, so that a single cluster supports multiple types of jobs. When the cluster receives a job scheduling command, a unified scheduler in the scheduling node allocates a matching resource scheduler to the job according to the job type indicated by the command, and instructs that resource scheduler to execute the data processing of the job. This increases the number of job types a single cluster can process and improves the cluster's suitability for different types of jobs.

Description

Job scheduling method, device and chip
Technical Field
The present application relates to the field of computers, and in particular, to a job scheduling method, apparatus, and chip.
Background
A supercomputer is a cluster comprising a plurality of servers that can be used to execute large-scale jobs or computing tasks; for example, a supercomputer may run high-performance computing (HPC) jobs, artificial intelligence (AI) jobs, and the like. Typically, a scheduler in the supercomputer allocates hardware resources to different jobs (e.g., HPC jobs or AI jobs), where the hardware resources include computing resources, storage resources, network resources, and the like. However, a single cluster is usually provided with only one type of scheduler, which can allocate hardware resources for only one type of job and cannot do so for other job types, resulting in poor suitability of the cluster for those other jobs. How to provide a job scheduling method with high adaptability is therefore a problem to be solved.
Disclosure of Invention
The application provides a job scheduling method, apparatus, and chip, which solve the problem of poor suitability caused by a single type of scheduler in one cluster of a supercomputer being able to process only one type of job.
In a first aspect, a job scheduling method is provided. The method is applicable to a heterogeneous computing power system that includes a scheduling node. The scheduling node adopts a two-layer scheduling architecture consisting of a unified scheduler and a scheduler set managed by the unified scheduler, in a peer-to-peer arrangement in which at least two resource schedulers simultaneously manage each computing node. The job scheduling method comprises the following steps: first, the unified scheduler acquires a job scheduling command. Second, the unified scheduler determines, from the scheduler set, a first-type resource scheduler matching the type of the first job according to the type of the first job indicated by the job scheduling command. Finally, the unified scheduler instructs the first-type resource scheduler to perform data processing of the first job according to the job scheduling command.
Wherein the job scheduling command is used to perform scheduling processing of the first job, and the scheduler set includes resource schedulers capable of performing at least two different types of job scheduling processing.
By way of example, the above data processing may include, but is not limited to: the first-type resource scheduler allocating the resources (also referred to as computing nodes) that it manages to the first job according to the resources required by the first job.
For a heterogeneous computing power system (or cluster), the scheduling node is provided with a scheduler set; because the scheduler set includes resource schedulers that can perform multiple types of job scheduling processing, a single cluster supports multiple types of job scheduling. When the cluster receives a job scheduling command, the unified scheduler in the scheduling node allocates a matching resource scheduler to the job according to the job type indicated by the command and instructs that resource scheduler to execute the data processing of the job; since schedulers for different job types coexist in the same cluster, the suitability of the cluster for different job types is improved. In addition, because a two-layer architecture of unified scheduler plus resource schedulers is adopted, the original resource schedulers need not be modified: the unified scheduler simply distributes each job to the resource scheduler of the corresponding type. This increases the scheduler types, and hence the schedulable job types, within one cluster, avoids the development workload of reworking existing resource schedulers, enables one cluster to support jobs of multiple types, and improves both resource utilization within the cluster and the efficiency with which jobs execute data processing.
For example, the job scheduling command may be sent by the terminal to the scheduling node.
In one possible implementation, the information carried by the job scheduling command is used to indicate the type of the first job.
Because the job scheduling command of the first job directly reflects the type of the first job, the unified scheduler can quickly determine the job type from the command, which improves the efficiency with which the scheduling node determines a matching resource scheduler for the first job.
Optionally, the type of the first job is an HPC job, an AI job, a big data job, or the like.
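As an aid to understanding the first aspect, the following minimal sketch (in Python; all class and function names are illustrative assumptions, not part of the claimed method) shows how a unified scheduler might keep the scheduler set keyed by supported job types and forward each job scheduling command to the matching resource scheduler:

    class ResourceScheduler:
        def __init__(self, name, supported_types):
            self.name = name
            self.supported_types = set(supported_types)

        def schedule(self, command):
            # Placeholder for the scheduler-specific data processing,
            # e.g. allocating computing nodes for the job.
            print(f"{self.name} schedules: {command}")

    class UnifiedScheduler:
        def __init__(self, scheduler_set):
            self.scheduler_set = scheduler_set  # the managed scheduler set

        def handle(self, job_type, command):
            # Determine the first-type resource scheduler matching the job
            # type and instruct it to perform the data processing of the job.
            for rs in self.scheduler_set:
                if job_type in rs.supported_types:
                    rs.schedule(command)
                    return rs
            raise ValueError(f"no resource scheduler supports {job_type!r}")

    unified = UnifiedScheduler([
        ResourceScheduler("LSF", {"HPC"}),
        ResourceScheduler("K8S", {"AI", "big data"}),
    ])
    unified.handle("HPC", "bsub -n 4 my_app")  # forwarded to the LSF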
In one possible implementation, a permission lock is maintained in the unified scheduler. The permission lock is used to indicate that the first-type resource scheduler may schedule the resources it manages, where the managed resources include at least one of computing resources, network resources, and storage resources. The unified scheduler may update the state of the permission lock as in the following examples.
The state update procedure of the permission lock is described below, taking the LSF as an example of the first-type resource scheduler.
Example one, the process of obtaining the scheduling right: the unified scheduler sends a resource scheduling command matching the job scheduling command to the LSF; the LSF requests from the unified scheduler the scheduling right over the computing nodes it manages according to the resource scheduling command; and the unified scheduler updates the state of the permission lock according to the request to: the LSF has the right to schedule the resources managed by the LSF.
Example two, the process of releasing the scheduling right: after the LSF allocates the corresponding resources to the first job, the LSF sends a permission release command to the unified scheduler; in response, the unified scheduler updates the state of the permission lock to: no resource scheduler currently holds the scheduling right.
The permission lock can be used to control the scheduling rights of multiple resource schedulers over the resources of one cluster, so that only one resource scheduler at a time allocates computing nodes to jobs within the cluster. This avoids the same resource being called by multiple resource schedulers simultaneously, and the resource preemption and increased job waiting time that would result, thereby improving the accuracy of job scheduling and the suitability of one cluster for multiple different job types. Moreover, a resource scheduler does not need to interact with the other resource schedulers, which helps improve job scheduling efficiency.
In one possible implementation, before the unified scheduler instructs the first-type resource scheduler to perform data processing of the first job, the job scheduling method further includes: the unified scheduler obtains the states of the resources managed by each resource scheduler in the scheduler set, and determines a resource state table according to those states.
By way of example, the above states may include the operational state of the computing node and the usage state of at least one of the computing, storage, or network resources that the computing node comprises. The resource state table may include the current state of each computing node, indicating whether it is available or unavailable; the current state may be determined from the operational state.
In this example, the resource state table may be used to indicate the usage of the resources managed by each resource scheduler in the scheduler set; for example, multiple resource schedulers may determine resource usage from the resource state table. The unified scheduler thereby synchronizes resource states among the multiple resource schedulers, avoiding situations in which different resource schedulers hold inconsistent states for the same resource, ensuring that multiple resource schedulers do not preempt the same resource, and helping to reduce job waiting time.
In one possible implementation, after instructing the first-type resource scheduler to perform the data processing of the first job, the job scheduling method further includes: when the unified scheduler receives the resource allocation result of the first job sent by the first-type resource scheduler, the unified scheduler updates the resource state table. The resource allocation result indicates the callable resources that the first-type resource scheduler has allocated to the first job.
After the scheduling node finishes allocating computing nodes to one job, the resource state table is updated promptly, ensuring that the scheduling node allocates resources to other jobs according to the latest table. This avoids the resource preemption caused by a resource scheduler assigning a job to a computing node whose remaining resources are smaller than the job requires, improves the rationality of node allocation, reduces job waiting time, and improves the efficiency with which the scheduling node allocates computing resources to jobs.
In one possible implementation, the job scheduling method further includes: instructing the other resource schedulers in the scheduler set to synchronize the updated resource state table and to determine resource allocation results for other jobs according to the updated table. The other resource schedulers are any resource schedulers in the scheduler set other than the first-type resource scheduler.
In one possible example, the unified scheduler may send a resource synchronization command to the other resource schedulers to instruct them to synchronize the updated resource state table; the other resource schedulers then allocate resources to other jobs according to the updated table and obtain the corresponding resource allocation results. The resource synchronization command is used to instruct the other resource schedulers to synchronize the updated resource state table.
The unified scheduler synchronizes the updated resource state table to the other resource schedulers in the scheduler set, thereby synchronizing resource states among the multiple resource schedulers. This avoids the same resource being repeatedly called by multiple resource schedulers due to inconsistent resource states, and the resulting resource preemption and increased job waiting time, improving the efficiency with which the cluster executes data processing for jobs.
In a second aspect, a job scheduling apparatus is provided, for use in a scheduling node of a heterogeneous computing power system. The job scheduling apparatus comprises modules for performing the job scheduling method of the first aspect or any of its optional designs. By way of example, the job scheduling apparatus includes an acquisition module, a selection module, and an indication module: the acquisition module is used for acquiring a job scheduling command; the selection module is used for selecting a first-type resource scheduler matching the first job from a scheduler set according to the type of the first job, where the scheduler set includes resource schedulers capable of performing scheduling processing of at least two different types of jobs; and the indication module is used for instructing the first-type resource scheduler to execute the data processing of the first job.
Wherein the job scheduling command is used for executing scheduling processing of the first job.
For example, the job scheduling command may be sent by a terminal to the scheduling node. The unified scheduler may send a job scheduling command to the first-type resource scheduler to instruct it to perform data processing of the first job.
For more detailed implementation of the job scheduling apparatus, reference may be made to the description of any implementation of the first aspect above; the details are not repeated here.
In a third aspect, the present application provides a chip comprising a control circuit and an interface circuit, the interface circuit being configured to obtain a job scheduling command, and the control circuit being configured to perform, according to the job scheduling command, the method of any one of the first aspect and its possible implementations.
In a fourth aspect, the present application provides a scheduling node comprising a processor and a memory; the memory is for storing computer instructions that are executable by the processor to implement the method of the first aspect and any optional implementation of the first aspect.
In a fifth aspect, the present application provides a heterogeneous computing system comprising a scheduling node and a computing node; the scheduling node is configured to allocate a computing node for the first job, such that the scheduling node performs the method of the above-described first aspect and any optional implementation of the first aspect thereof.
In a sixth aspect, the present application provides a computer readable storage medium having stored therein a computer program or instructions which, when executed by a processing device, implement the method of any of the above first aspect and alternative implementations of the first aspect.
In a seventh aspect, the present application provides a computer program product comprising a computer program or instructions which, when executed by a processing device, performs the method of any of the alternative implementations of the first aspect and the first aspect described above.
For the advantages of the second to seventh aspects, reference may be made to the first aspect or any implementation thereof; they are not repeated here. Based on the implementations provided in the above aspects, further combinations may be made to provide further implementations.
Drawings
FIG. 1 is an application scenario diagram of a heterogeneous computing power system provided by the present application;
FIG. 2 is a flow chart of an initialization method of a scheduling node according to an embodiment of the present application;
FIG. 3 is a flow chart of a job scheduling method according to an embodiment of the present application;
FIG. 4 is a first flowchart of a permission lock updating method according to an embodiment of the present application;
FIG. 5 is a second flowchart of a permission lock updating method according to an embodiment of the present application;
FIG. 6 is a flowchart of a status updating method according to an embodiment of the present application;
FIG. 7A is a first schematic structural diagram of a job scheduling device according to an embodiment of the present application;
FIG. 7B is a second schematic structural diagram of a job scheduling device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a control node according to an embodiment of the present application.
Detailed Description
The application provides a job scheduling method applicable to a heterogeneous computing power system that includes a scheduling node. The scheduling node adopts a two-layer scheduling architecture consisting of a unified scheduler and a scheduler set managed by the unified scheduler; the scheduler set contains at least a plurality of resource schedulers of different types, each processing data of resources (also called computing nodes) for jobs of a specific type, in a peer-to-peer arrangement in which at least two resource schedulers simultaneously manage each computing node. A specific-type job is a job of a type the resource scheduler supports scheduling; a non-specific-type job is a job of a type the resource scheduler does not support scheduling or schedules with low efficiency.
A heterogeneous computing power system comprises a plurality of clusters, each of which supports job processing of multiple job types, and the scheduling node is provided with a scheduler set for the system. When a cluster receives a job scheduling command, the unified scheduler in the scheduling node allocates a matching resource scheduler to the job according to the job type indicated by the command, and instructs that resource scheduler to execute the data processing of the job; thus a single cluster supports jobs of multiple different types, and its suitability for different job types is improved. In addition, because the two-layer architecture of unified scheduler plus resource schedulers is adopted, the original resource schedulers need not be modified: the unified scheduler distributes each job to the resource scheduler of the corresponding type. This increases the scheduler types, and hence the schedulable job types, within one cluster, avoids the development workload of reworking existing resource schedulers, enables one cluster to support jobs of multiple types, and improves both resource utilization within the cluster and the efficiency with which jobs execute data processing.
By way of example, the heterogeneous computing power system is a computer cluster comprising a scheduling node and a plurality of computing nodes; the scheduling node may be connected to the computing nodes by wired or wireless means, the scheduling node is configured to allocate computing nodes to jobs, and the computing nodes provide computing support for the jobs. The job scheduling command is used to schedule the computing nodes that match the first job. The unified scheduler may send a job scheduling command to the first-type resource scheduler to instruct it to allocate resources in the heterogeneous computing power system to the first job. The unified scheduler may be software running on the scheduling node or a hardware device deployed in the scheduling node.
Next, the job scheduling method provided in this embodiment will be described; the related art is described first.
A resource scheduler is scheduling software that allocates various types of resources to jobs or applications; for example, a resource scheduler can implement functions such as computing resource management and job scheduling. In some cases, the resource scheduler is a processor or controller deployed separately on a server or in a heterogeneous computing power system (e.g., a cluster including multiple classes of processors); in other cases, the resource scheduler is a virtual machine (VM), container, or other software unit deployed on a server, which the present application does not limit. Taking a software unit as an example, the resource scheduler may be provided with an access port through which hardware or other software units in the server send commands or instructions to it, and through which the resource scheduler schedules the resources provided by the server or the heterogeneous computing power system.
A scheduler set includes a plurality of different types of resource schedulers, and each type may include one or more resource schedulers.
A high-performance computing (HPC) cluster is a computer cluster capable of executing large-data-volume, high-speed operations that an ordinary personal computer cannot handle.
The container orchestration platform Kubernetes (K8S) is a system that automates the operation and maintenance of containers (e.g., Docker containers).
The job scheduling system Load Sharing Facility (LSF) is a system for managing computing resources and scheduling batch jobs.
The cluster task management system Sun Grid Engine (SGE) is a system that queues tasks submitted by users and then delivers them to computing nodes for execution.
An agent is deployed on a computing node and is used to communicate with the scheduling node and perform corresponding operations according to the communication content. Agents deployed on computing nodes include K8S agents and HPC agents, among others.
A computing node is a system that provides computing power, storage, network, and other support for jobs. Computing nodes may include central processing unit (CPU) computing nodes, graphics processing unit (GPU) computing nodes, and neural-network processing unit (NPU) computing nodes, among others. CPU computing nodes are configured with a large number of CPUs. GPU computing nodes are configured with a large number of parallel accelerators such as GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). The memory in a computing node may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM). The network configured for a computing node's communication interface may be the Internet or another network (e.g., Ethernet). The network may include one or more network devices, e.g., routers or switches.
A resource scheduler that performs resource scheduling for a non-specific-type job may not support scheduling that job. To avoid this, the present application adopts a two-layer scheduling architecture consisting of a unified scheduler and a scheduler set, with a peer-to-peer arrangement in which at least two resource schedulers in the scheduler set simultaneously manage each computing node. As shown in FIG. 1, FIG. 1 is an application scenario diagram of a heterogeneous computing power system provided by the application. The heterogeneous computing system 100 may include a scheduling node 110 and n computing nodes 120, n being a positive integer. The terminal 200, the scheduling node 110, and the computing nodes 120 may communicate by wired means, such as Ethernet, optical fiber, or the peripheral component interconnect express (PCIe) buses disposed inside the heterogeneous computing system 100 to connect the scheduling node 110 and the computing nodes 120; they may also communicate by wireless means, such as the Internet, wireless fidelity (Wi-Fi), or ultra-wideband (UWB) technology. In one possible scenario, the heterogeneous computing system 100 may also include the terminal 200.
Illustratively, in the application scenario of the heterogeneous computing system 100 shown in FIG. 1, after the user inputs the job scheduling command at the terminal 200, the terminal 200 sends the job scheduling command to the scheduling node 110, and the unified scheduler 111 in the scheduling node 110 determines, from the scheduler set 112, a first-type resource scheduler 1121 matching the type of the first job according to the type of the first job indicated by the job scheduling command. The unified scheduler 111 then instructs the first-type resource scheduler 1121 to perform data processing of the first job according to the job scheduling command; the data processing includes the first-type resource scheduler allocating corresponding computing nodes to the first job according to the resources the first job requires.
To ensure that the initial states of the computing nodes are synchronized, the scheduling node performs initialization. On the basis of the heterogeneous computing system 100 shown in FIG. 1, a possible implementation of initializing the scheduling node 110 is provided. As shown in FIG. 2, FIG. 2 is a flow chart of an initialization method of a scheduling node provided in an embodiment of the present application. A unified scheduler and a scheduler set run in the scheduling node 110, where the scheduler set includes resource schedulers, such as the LSF and K8S, for performing scheduling processing of multiple types of jobs; the unified scheduler and the resource schedulers in the scheduler set may run on a processor in the scheduling node 110.
Optionally, the scheduling node 110 further comprises a memory. It should be noted that FIG. 2 is only an example provided by the present application and should not be construed as limiting it; in some cases the scheduler set may include more resource schedulers, and the heterogeneous computing system 100 may include more or fewer computing nodes.
The scheduling node initialization provided in this embodiment includes steps S210 and S220.
S210, the unified scheduler 111 obtains the states of the computing nodes 120 managed by the resource schedulers in the scheduler set of the heterogeneous computing system 100.
In one possible scenario, this state is used to indicate: the operation state of the computing node, and the use states of the computing resources, the storage resources, the network resources and the like included in the computing node.
The computing resource indicates the number of floating-point operations and integer operations per unit time; these figures describe the remaining processing power of the computing resource. Illustratively, the computing resources may be provided by a processor in the computing node 120, which may be an integrated circuit chip with signal processing capability, such as a general-purpose processor including the CPU, GPU, NPU, FPGA, or ASIC described above.
The storage resource indicates storage capacity, data read/write speed, and the like; the read/write speed describes the processing power of the storage resource. The storage resources may be provided by the memory in the computing node 120 shown in FIG. 1; for details, reference may be made to descriptions of computing-node memory in the related art, which are not repeated here.
The network resource indicates transmission bandwidth, i.e., the maximum amount of data that can be transmitted per unit time; the transmission bandwidth describes the processing power of the network resource. The network resources may be provided by the transmission bandwidth of the communication interface of the computing node 120; for details, reference may be made to descriptions of computing-node communication interfaces in the related art, which are not repeated here.
The operational state of a computing node can be determined according to its fault conditions. For example, fault conditions may include, but are not limited to: disruption of the communication link between the resource scheduler and the computing node, failure of devices in the computing node (e.g., memory, processor cores, network devices), and the like.
In one possible example, if a communication link between a computing node and a resource scheduler is broken, the operational state of the computing node is unavailable.
In another possible example, if some hardware in a computing node fails, the operational state of the computing node remains available, but the available resources are reduced.
For example, a computing node configured with a 128-core CPU can provide only 64 cores of computing power because some (non-primary) cores have failed; the node's operational state is then "available", but its available resources are 64 cores.
For the acquisition of the status, the present application gives the following two alternative examples.
In a first alternative example, the unified scheduler 111 sends status acquisition commands to the LSF and the K8S, and the LSF and the K8S acquire the statuses of the computing node 1, the computing node 2, the computing node 3, and the computing node 4, respectively, according to the status acquisition commands.
Illustratively, the unified scheduler 111 may send a bhosts command to the LSF to view the states of all computing nodes, and may send a top command to the K8S to view the states of all computing nodes.
In a second alternative example, LSF and K8S, after acquiring the states of the compute nodes, actively send the states of all compute nodes to the unified scheduler 111.
S220, the unified scheduler 111 generates a resource state table according to the acquired state of the computing node.
The resource state table is used to track the usage of resources in the cluster, and the unified scheduler updates it promptly according to the acquired states. The resource state table may be stored in a database in communication with the unified scheduler, or in the scheduling node to which the unified scheduler belongs.
The resource state table may include the operational state of the compute node, the current state, and the like. After the unified scheduler obtains the resource state table, the resource state table is synchronized to all the resource schedulers in the scheduler set.
For the process of determining the current state of a computing node, one possible scenario is given below: the current state is determined according to the operational states of the computing node acquired by the resource schedulers, and the current state of the computing node is available only when the operational states acquired by all the resource schedulers are available.
By way of example, the resource status table is shown in table 1 below.
TABLE 1

Node_name    Node_state    K8S_state    LSF_state
C01n01       available     available    available
...          ...           ...          ...
Wherein Node_name represents the name of a computing node, each computing node having a unique number; Node_state represents the current state of the computing node, which is either available or unavailable; K8S_state represents the operational state of the computing node as acquired by the K8S, either available or unavailable; and LSF_state represents the operational state of the computing node as acquired by the LSF, either available or unavailable.
In one possible example, the resource state table also stores the usage state of at least one of the computing, storage, and network resources of each computing node; for example, for the computing node named C01n01, the remaining resources are 10 cores and 30% (500 GB) of storage space.
The resource state table can indicate the usage of the resources managed by each resource scheduler in the scheduler set; for example, multiple resource schedulers can determine resource usage (such as the states of the computing nodes shown in Table 1) from it. The unified scheduler thereby synchronizes resource states among the multiple resource schedulers, avoiding different resource schedulers holding different states for the same resource, ensuring that multiple resource schedulers do not preempt the same resource, and reducing job waiting time.
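A possible realization of this table and of the rule above (a node is available only when every resource scheduler reports it available) is sketched below in Python; the field names follow Table 1, while the function name and data layout are illustrative assumptions:

    def build_resource_state_table(per_scheduler_states):
        # per_scheduler_states: {scheduler: {node: "available"/"unavailable"}}
        table = {}
        all_nodes = set()
        for states in per_scheduler_states.values():
            all_nodes.update(states)
        for node in sorted(all_nodes):
            per_sched = {sched: states.get(node, "unavailable")
                         for sched, states in per_scheduler_states.items()}
            # Node_state is available only if every scheduler reports available.
            node_state = ("available"
                          if all(s == "available" for s in per_sched.values())
                          else "unavailable")
            table[node] = {"Node_state": node_state,
                           **{f"{s}_state": v for s, v in per_sched.items()}}
        return table

    table = build_resource_state_table({
        "K8S": {"C01n01": "available", "C01n02": "available"},
        "LSF": {"C01n01": "available", "C01n02": "unavailable"},
    })
    # C01n01 -> available; C01n02 -> unavailable (the LSF reports it unavailable)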
In another embodiment of the present application, the unified scheduler 111 and the resource schedulers in the scheduler set may be located in different scheduling nodes; for example, the unified scheduler is located at a first scheduling node and the scheduler set at a second scheduling node.
When the unified scheduler 111 and the resource schedulers in the scheduler set are located in different scheduling nodes, the unified scheduler 111 is configured with the ports of all resource schedulers in the scheduler set 112, so that it can send job scheduling commands, state acquisition commands, and the like to a resource scheduler through that scheduler's port; and each resource scheduler is configured with the port of the unified scheduler 111, so that it can send commands, data, and the like to the unified scheduler 111 through that port.
Illustratively, the unified scheduler 111 maintains the IP and port of every resource scheduler, e.g., resource scheduler 1 at 14.17.32.211:1024 and resource scheduler 2 at 14.17.32.211:1025. Accordingly, each resource scheduler maintains the IP and port of the unified scheduler 111 (e.g., 14.17.33.211:1024).
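The mutual endpoint configuration can be captured in a small sketch; the addresses are the illustrative values above, and the variable names are assumptions:

    # Endpoints the unified scheduler 111 maintains for the scheduler set 112.
    RESOURCE_SCHEDULER_ENDPOINTS = {
        "resource_scheduler_1": ("14.17.32.211", 1024),
        "resource_scheduler_2": ("14.17.32.211", 1025),
    }

    # Endpoint each resource scheduler maintains for the unified scheduler 111,
    # used to send commands and data back to it.
    UNIFIED_SCHEDULER_ENDPOINT = ("14.17.33.211", 1024)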
After the unified scheduler 111 on the scheduling node is initialized, the scheduling node 110 schedules resources for jobs, e.g., allocates computing nodes to them. As shown in FIG. 3, FIG. 3 is a flow chart of a job scheduling method according to an embodiment of the present application; the method can be applied to the heterogeneous computing system 100 shown in FIG. 1. The job scheduling method is performed by the scheduling node 110 shown in FIG. 1, in which a unified scheduler 111 and a scheduler set 112 including a first-type resource scheduler 1121, a second-type resource scheduler 1122, and the like are deployed.
Each type of resource scheduler may include one or more resource schedulers; e.g., the first-type resource scheduler 1121 includes resource schedulers 1121A and 1121B, which are K8S schedulers, and the second-type resource scheduler 1122 includes resource schedulers 1122A and 1122B, which are LSF schedulers.
A resource scheduler here is a system deployed in the scheduling node 110 for scheduling resources for jobs; for its specific description, reference may be made to the resource scheduler content in the related-art description above.
In a possible example, a scheme is provided in which the scheduling node 110 allocates a computing node to a job, and referring to fig. 3, the job scheduling method provided in this embodiment includes the following steps S310 to S330.
S310, the unified scheduler 111 in the scheduling node 110 acquires the job scheduling command.
The user inputs a job scheduling command for the first job through the terminal 200, which then sends the job scheduling command to the scheduling node 110. The job scheduling command is used to perform scheduling processing of the first job, i.e., to instruct a resource scheduler to schedule the computing nodes required by the first job. By way of example, the user may enter the job scheduling command at a command-line interface (CLI) on the terminal 200.
Alternatively, the type of the first job may be: HPC jobs, containerized jobs (e.g., AI jobs), or big data jobs, etc. The first job may also refer to other types of jobs, and the application is not limited thereto.
The format of the job scheduling command may take many different implementations, and the present application gives two possible examples when the scheduler set includes only K8S and LSF.
In a first possible example, the job scheduling command is a native K8S command or LSF command.
Illustratively, native LSF commands include bjobs -R/-a/-p, bsub -J/-N/-R span, and the like; native K8S commands include kubectl create/delete/get/run, and the like.
In a second possible example, the job scheduling command is a command encapsulating a native K8S command or LSF command.
By way of example, the job scheduling commands are shown in Table 2 below.
TABLE 2

Job scheduling command      Job type
musb-HPC "LSF command"      HPC job
musb-AI "K8S command"       AI job
The "LSF command" and "K8S command" refer to a native LSF command and a native K8S command, and for the content of the native LSF command and the native K8S command, reference may be made to the description of the K8S command and the native LSF command in the first possible example, which is not repeated herein.
With continued reference to fig. 3, the job scheduling method provided in the present embodiment further includes step S320.
S320, the unified scheduler 111 determines, from the scheduler set 112, a first type resource scheduler 1121 matching the first job according to the type of the first job indicated by the job scheduling command.
The unified scheduler 111 in the scheduling node 110 identifies the type of the first job indicated in the job scheduling command and determines, from the scheduler set 112, a resource scheduler matching that type; the resource schedulers in the scheduler set 112 are used to schedule the computing nodes in the heterogeneous computing system 100.
For example, when the type of the first job is an AI job, the unified scheduler determines K8S for assigning a computing node to the AI job from LSF and K8S.
Optionally, the unified scheduler 111 determines the type of the first job according to the information carried by the job scheduling command.
The present embodiment provides the following two possible examples for determining the type of the first job from the job scheduling command with respect to the unified scheduler 111.
In a first possible example, the correspondence between the job scheduling command and the type of the first job may be determined according to a preset mapping relation table. The mapping relation table indicates the correspondence between commands and job types; a command may be identified by its command header.
With respect to the above-described mapping relation table, table 3 below gives one possible example.
TABLE 3

Command     Job type
kubectl     AI job
bsub        HPC job
bjobs       HPC job
For example, when the job scheduling command is a native LSF command such as bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND, the unified scheduler 111 may determine, according to the bsub header of the command and by querying the mapping relation table, that the type of the first job corresponding to the job scheduling command is an HPC job.
Here, z represents the number of CPUs needed by the job; -q specifies the queue to which the job is submitted (if the -q option is omitted, the system submits the job to the default queue); inputfile is the file the program needs to read (e.g., a namelist); outputfile is the file in which standard output will be saved after the job is submitted; and COMMAND is the program the user wants to run.
When the job scheduling command is a native K8S command, for example kubectl run nginx --replicas=3 --labels="app=sample" --image=nginx:1.10 --port=80, the unified scheduler 111 may determine, according to the kubectl header of the command and by querying the mapping relation table, that the type of the first job is an AI job or a containerized job.
This job scheduling command creates a container instance named nginx, with 3 replicas, the label app=sample, the image nginx:1.10, and port 80.
In a second possible example, the unified scheduler 111 may directly determine the type of job corresponding to the job scheduling command according to the type of job, such as HPC or AI, carried in the job scheduling command.
For example, when the job scheduling command is of the musb-HPC "LSF command" form in Table 2 above, the unified scheduler 111 may determine from its musb-HPC prefix that the job type of the command is an HPC job.
When the job scheduling command is of the musb-AI "K8S command" form in Table 2 above, the unified scheduler 111 may determine from its musb-AI prefix that the job type of the command is an AI job.
The above examples are merely examples of the alternatives for determining the type of the first job provided in the present embodiment, and should not be construed as limiting the present application.
Because the job scheduling command of the first job directly reflects the type of the first job, the unified scheduler can quickly determine the job type from the command, which improves the efficiency with which the scheduling node determines a matching resource scheduler for the first job.
With continued reference to fig. 3, the job scheduling method provided in the present embodiment further includes step S330.
S330, the unified scheduler 111 instructs the first-type resource scheduler 1121 to execute data processing of the first job.
In one possible scenario, the unified scheduler 111 may instruct the first-type resource scheduler 1121 to schedule the first job by sending it a resource scheduling command matching the job scheduling command.
Wherein the resource scheduling command is used to instruct the first-type resource scheduler 1121 to allocate computing nodes in the cluster to the first job. The resource scheduling command is extracted from the job scheduling command.
For example, when the job scheduling command is musb-HPC bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND, the unified scheduler deletes the musb-HPC prefix of the job scheduling command to obtain the resource scheduling command bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND.
The above examples are merely alternative ways to obtain the resource scheduling command provided in the present embodiment, and should not be construed as limiting the present application.
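The two steps above — determining the job type (S320) and extracting the resource scheduling command (S330) — can be combined in one small sketch; the musb- prefix and the Table 3 entries follow the examples in the text, while the function name and everything else are illustrative assumptions:

    COMMAND_TYPE_MAP = {  # Table 3: command header -> job type
        "kubectl": "AI",
        "bsub": "HPC",
        "bjobs": "HPC",
    }

    def parse_job_scheduling_command(command):
        """Return (job type, resource scheduling command) for a command."""
        head, _, rest = command.partition(" ")
        if head.startswith("musb-"):            # wrapped form, e.g. "musb-HPC bsub ..."
            return head[len("musb-"):], rest    # strip the prefix (second example)
        return COMMAND_TYPE_MAP[head], command  # native form via the mapping table

    print(parse_job_scheduling_command("musb-HPC bsub -n 4 -q normal a.out"))
    # ('HPC', 'bsub -n 4 -q normal a.out')
    print(parse_job_scheduling_command("kubectl run nginx --port=80"))
    # ('AI', 'kubectl run nginx --port=80')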
For the first type of resource scheduler 1121 to allocate compute nodes in a cluster for a first job, one possible example is provided by the present application for illustration.
For example, when the LSF receives the resource scheduling command "bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND" corresponding to the first job, the LSF allocates computing nodes to the first job according to the number of CPUs indicated in the command, the specified queue, and the resource state table shown in Table 1. The computing node then runs the first job according to the program to be run, the data to be read, the address at which output data is stored, and other information.
After determining the computing nodes allocated to the first job, the first-type resource scheduler 1121 sends the information of the first job to the agent on each computing node (such as the HPC agent or the K8S agent) to instruct the computing node to run the first job. The information of the first job may include the program to be run, input data, and the like.
For a heterogeneous computing system (or cluster), a scheduling node sets a scheduler set for the heterogeneous computing system, where the scheduler set includes K8S and LSF, and since the scheduler set includes a resource scheduler that can be used to perform job scheduling processing of multiple types (such as HPC jobs and AI jobs), a function supporting the job scheduling processing of multiple types is implemented in the same cluster.
When the cluster receives a job scheduling command, the unified scheduler in the scheduling node allocates the matching resource scheduler to the job according to the job type indicated by the command (for example, it allocates a job of type HPC to the LSF) and instructs that resource scheduler to execute the data processing of the job. This increases the number of job types the cluster supports and improves the cluster's suitability for different job types; and because the unified scheduler distributes each job to the resource scheduler of the corresponding type, the efficiency with which the cluster executes data processing for jobs is also improved.
In the above embodiment, the scheduler set 112 includes the LSF and the K8S, which schedule HPC jobs and AI jobs respectively and belong to two different types of resource scheduler. In another embodiment of the present application, the first-type resource scheduler 1121 may itself include a plurality of resource schedulers; for example, the first-type resource scheduler 1121 includes resource scheduler 1121A and resource scheduler 1121B, where the LSF may serve as resource scheduler 1121A and the SGE as resource scheduler 1121B.
In this embodiment, for how the unified scheduler 111 obtains the job scheduling command and determines the first-type resource scheduler 1121 matching the type of the first job according to the type indicated by the command, reference may be made to S310 and S320 above, which are not repeated here.
After determining the first-type resource scheduler 1121 matching the type of the first job, the unified scheduler 111 determines the job queue situation of the LSF and the SGE within the first-type resource scheduler, and sends the resource scheduling command to the resource scheduler with the shorter job queue.
For example, when the LSF's job queue holds 3 jobs awaiting allocation and the SGE's holds 5, the unified scheduler sends the resource scheduling command to the LSF based on the queue lengths. After receiving the resource scheduling command, the LSF allocates computing nodes to the first job; for details, reference may be made to the description of the LSF allocating computing nodes above, which is not repeated here.
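A minimal sketch of this queue-based choice (function and variable names are assumptions):

    def pick_least_loaded(schedulers, queue_lengths):
        # Among same-type resource schedulers, pick the one with the
        # fewest jobs waiting to be allocated.
        return min(schedulers, key=lambda s: queue_lengths[s])

    target = pick_least_loaded(["LSF", "SGE"], {"LSF": 3, "SGE": 5})
    print(target)  # "LSF": 3 queued jobs < 5, so the command goes to the LSF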
In an alternative implementation, the process of instructing the first type resource scheduler to perform data processing of the first job may include:
Taking the LSF as an example of the first-type resource scheduler, the unified scheduler 111 sends a resource scheduling command matching the job scheduling command to the LSF; the LSF requests from the unified scheduler 111 the scheduling right over the computing nodes it manages according to the resource scheduling command, and then allocates resources to the first job according to that scheduling right and the resource requirements indicated by the resource scheduling command. A computing node comprises at least one of computing resources, network resources, and storage resources; for their content, reference may be made to the descriptions in S210, which are not repeated here.
Illustratively, the LSF's request for the scheduling right may be implemented through command interaction with the unified scheduler 111. On the basis of the relationship, shown in FIG. 3, between the unified scheduler and the multiple types of resource schedulers in the scheduler set, the interaction between a resource scheduler and the unified scheduler before job scheduling is illustrated in FIG. 4, which is a first flowchart of the permission lock updating method provided by an embodiment of the application.
S410, in response to the resource scheduling command of the unified scheduler 111, the LSF requests from the unified scheduler 111 the scheduling right over the callable computing nodes indicated in the aforementioned resource state table.
Specifically, the LSF may determine, from the resource state table shown in Table 1, which of the computing nodes it manages are callable: a callable computing node is one whose operational state is available and which has remaining resources.
For example, the LSF may send a first command to the unified scheduler requesting the scheduling right over the computing nodes 120.
S420, in response to the LSF's request, the unified scheduler 111 updates the state of the permission lock it maintains.
The unified scheduler 111 updates the state of the permission lock to: the LSF has the right to schedule the computing nodes.
In one possible example, the unified scheduler 111 may update the permission lock state according to a configured scheduling policy. For example, if the LSF has a higher priority than the K8S and first commands from both are received simultaneously, the unified scheduler 111 updates the state of the permission lock to: the LSF has the right to schedule the computing nodes 120.
S430, the unified scheduler 111 sends a second command to the LSF.
Wherein the second command is to indicate: the LSF is capable of scheduling the compute nodes 120 in the heterogeneous computing system 100 according to the scheduling rights.
The LSF assigns computing nodes to the first job based on the second command; for how the LSF assigns the computing nodes 120 in the heterogeneous computing system 100 to the first job, reference may be made to the LSF example in S330 above, which is not repeated here.
In one possible scenario, the K8S sends a first command to the unified scheduler 111 after the state of the permission lock has been updated in step S420 above. The unified scheduler 111 returns call-failure information to the K8S according to the state of the permission lock. After receiving the call-failure information, the K8S periodically resends the first command until it obtains a third command sent by the unified scheduler 111, where the third command indicates that the K8S can schedule the computing nodes 120 in the heterogeneous computing system 100 according to the scheduling right.
Notably, the process of the first type of resource scheduler performing data processing of the first job may further include:
Taking the LSF as an example of the first-type resource scheduler, as shown in FIG. 5, FIG. 5 is a second flowchart of the permission lock updating method according to an embodiment of the present application.
S510, after the LSF allocates the corresponding resource for the first job, the LSF sends a permission release command to the unified scheduler 111.
The permission release command is used for indicating that the LSF has released the scheduling permission.
S520, the unified scheduler 111 responds to the permission release command and updates the state of the permission lock.
For example, the state of the permission lock may be updated to: no resource scheduler currently holds the scheduling right. Subsequently, when another resource scheduler such as the K8S sends a permission request command to the unified scheduler 111, the unified scheduler 111 updates the state of the permission lock to: the K8S has the right to schedule the computing nodes.
The permission lock can be used to control the scheduling rights of multiple resource schedulers over the resources of one cluster, so that only one resource scheduler at a time (in this example, only the LSF) allocates computing nodes to jobs within the cluster. This avoids the same resource being called by multiple resource schedulers simultaneously, and the resource preemption and increased job waiting time that would result, improving the accuracy of job scheduling and the suitability of one cluster for multiple different job types. Moreover, a resource scheduler does not need to interact with the other resource schedulers, which helps improve job scheduling efficiency.
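The permission lock protocol of FIG. 4 and FIG. 5 can be sketched as follows; this is a simplified model under assumed names, where the real lock lives in the unified scheduler and is driven by the first command, the grant, and the permission release command described above:

    import threading
    import time

    class PermissionLock:
        def __init__(self):
            self._mutex = threading.Lock()
            self.holder = None  # None: no resource scheduler holds the right

        def request(self, scheduler):        # first command: request the right
            with self._mutex:
                if self.holder is None:
                    self.holder = scheduler  # state: scheduler may schedule nodes
                    return True              # grant (second/third command)
                return False                 # call failure returned to requester

        def release(self, scheduler):        # permission release command
            with self._mutex:
                if self.holder == scheduler:
                    self.holder = None       # state: no holder

    def acquire_with_retry(lock, scheduler, interval=0.1):
        # A refused scheduler periodically resends the first command,
        # as the K8S does in the scenario above.
        while not lock.request(scheduler):
            time.sleep(interval)

    lock = PermissionLock()
    acquire_with_retry(lock, "LSF")  # the LSF obtains the scheduling right
    lock.release("LSF")              # ... after allocating nodes to the first job
    acquire_with_retry(lock, "K8S")  # a waiting K8S request now succeeds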
After the first-type resource scheduler releases the scheduling permission, the job scheduling method provided in this embodiment further includes: the unified scheduler 111 updates the resource state table.
As shown in fig. 6, fig. 6 is a flowchart of a status updating method according to an embodiment of the present application. The following description takes the first-type resource scheduler being an LSF and the second-type resource scheduler being a K8S as an example.
S610, the LSF transmits the resource allocation result of the first job to the unified scheduler 111.
The resource allocation result is used to indicate the callable resources that the LSF has allocated for the first job.
Illustratively, the LSF assigns the first job to the computing node with the node name C01n01 in Table 1 above, and allocates to the first job 4 cores in C01n01 and 10% of its storage resources.
S620, the unified scheduler 111 updates the resource state table according to the resource allocation result.
For example, the unified scheduler updates the usage state corresponding to the node name C01n01 in the resource state table shown in Table 1 according to the above resource allocation result "4 cores in C01n01 and 10% of storage resources".
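Illustratively, the update in S620 may be sketched as follows in Python. The field names and the node's total capacity are assumed values for illustration only; the exact columns of Table 1 are not reproduced here.

    # Resource state table keyed by node name (cf. Table 1); the totals
    # are assumed for illustration.
    resource_state_table = {
        "C01n01": {"total_cores": 32, "used_cores": 0, "storage_used_pct": 0},
    }

    def apply_allocation_result(table, node, cores, storage_pct):
        """Update the usage state of one computing node according to a
        resource allocation result reported by a resource scheduler."""
        entry = table[node]
        entry["used_cores"] += cores
        entry["storage_used_pct"] += storage_pct

    # The LSF reports "4 cores in C01n01 and 10% of storage resources":
    apply_allocation_result(resource_state_table, "C01n01", cores=4, storage_pct=10)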
After the scheduling node completes the allocation of computing nodes for a job, it promptly updates the usage state and running state of the computing nodes in the resource state table shown in Table 1. This ensures that the scheduling node allocates resources for other jobs according to the latest resource state table, avoids the problems of a resource scheduler assigning a job to a computing node whose remaining resources are smaller than the resources the job requires and of resource preemption, improves the rationality with which the resource scheduler allocates computing nodes for jobs, reduces job execution waiting time, and improves the efficiency with which the scheduling node allocates computing resources for jobs.
To synchronize the states of the computing nodes managed by all the resource schedulers in the scheduler set, this embodiment provides one possible implementation: the unified scheduler 111 instructs the other resource schedulers in the scheduler set to synchronize the updated resource state table and to determine resource allocation results for other jobs according to the updated resource state table.
The other resource schedulers are any resource schedulers in the scheduler set other than the first-type resource scheduler.
In one possible scenario, the unified scheduler 111 instructs the other resource schedulers to synchronize the updated resource state table by sending a resource synchronization command to the other resource schedulers, such as the K8S in fig. 6. The other resource schedulers then allocate resources for other jobs according to the updated resource state table and obtain the corresponding resource allocation results.
The resource synchronization command is used to indicate that the other resource schedulers should synchronize the updated resource state table.
In one possible example, the resource synchronization command carries the complete updated resource state table, and the other resource schedulers replace their locally stored resource state table with it after receiving the resource synchronization command.
In another possible example, the resource synchronization command carries a partial resource state table, and the other resource schedulers update their locally stored resource state table based on the partial resource state table after receiving the resource synchronization command; the partial resource state table includes only the resource allocation result of the first job.
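The two variants may be sketched as follows in Python; the command layout (a "kind" field plus a "table" payload) is an assumption of this description, not a message format defined by the present application.

    def handle_resource_sync_command(local_table, command):
        """Sketch of how a receiving resource scheduler (e.g. the K8S)
        might apply a resource synchronization command."""
        if command["kind"] == "full":
            # Replace the locally stored resource state table outright.
            local_table.clear()
            local_table.update(command["table"])
        elif command["kind"] == "partial":
            # Merge only the entries covered by the first job's
            # resource allocation result.
            for node, state in command["table"].items():
                local_table[node] = state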
The above examples are only alternative ways of synchronizing the resource state table provided by this embodiment and should not be construed as limiting the application. In the above examples, the resource synchronization command carries the data to be updated, such as the complete resource state table or a partial resource state table; in other implementations, the unified scheduler 111 may send the resource synchronization command and the data to be updated to the other resource schedulers separately.
By synchronizing the updated resource state table to the other resource schedulers (such as the K8S) in the scheduler set, the unified scheduler achieves synchronization of resource states among the multiple resource schedulers. This avoids the problem that, due to inconsistent resource states, the same resource may be repeatedly called by multiple resource schedulers, causing resource preemption and increasing the execution waiting time of jobs, and thereby improves the efficiency with which the same cluster executes data processing for jobs.
The job scheduling method provided by the present application has been described in detail above with reference to figs. 1 to 6; the job scheduling device provided by the present application is described below with reference to fig. 7A, where fig. 7A is a schematic structural diagram of a job scheduling device provided by the present application. The job scheduling device 700 may be used to implement the functions of the scheduling node in the above method embodiments, and can therefore also achieve the beneficial effects of those method embodiments.
As shown in fig. 7A, the job scheduling device 700 includes an acquisition module 710, a selection module 720, and an instruction module 730; the job scheduling device 700 is used to implement the functions of the scheduling node in the method embodiments corresponding to figs. 2 to 6. In one possible example, the specific process by which the job scheduling device 700 implements the above job scheduling method includes the following:
An acquisition module 710, configured to acquire a job scheduling command. For example, the terminal 200 receives a job scheduling command of a first job input by a user and then sends the job scheduling command to the unified scheduler. The job scheduling command is used to execute the scheduling processing of the first job, that is, to schedule resources that match the first job.
A selection module 720, configured to determine, from the scheduler set, a first-type resource scheduler matching the first job according to the type of the first job indicated by the job scheduling command. Specifically, the unified scheduler in the scheduling node identifies the type of the first job indicated in the job scheduling command and, according to that type, determines a matching resource scheduler from the scheduler set, where the resource schedulers in the scheduler set are used to schedule the computing nodes in the heterogeneous computing system.
An instruction module 730, configured to instruct the first-type resource scheduler to perform the data processing of the first job. For example, the unified scheduler may instruct the first-type resource scheduler to allocate computing nodes in the heterogeneous computing system for the first job by sending it a resource scheduling command matching the job scheduling command. The resource scheduling command is used to indicate that the first-type resource scheduler should allocate computing nodes in the cluster for the first job; the resource scheduling command is derived from the job scheduling command.
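The cooperation of the three modules may be sketched as follows in Python; the class, method, and field names are assumptions made for illustration and do not correspond to actual interfaces of the job scheduling device 700.

    class JobSchedulingDevice:
        """Sketch of the acquisition, selection, and instruction modules."""

        def __init__(self, scheduler_set):
            # Maps a job type to the matching resource scheduler,
            # e.g. {"HPC": lsf_scheduler, "AI": k8s_scheduler}.
            self.scheduler_set = scheduler_set

        def handle(self, job_scheduling_command):
            # Acquisition module 710: receive the job scheduling command.
            command = job_scheduling_command
            # Selection module 720: match the first job's type to a
            # first-type resource scheduler in the scheduler set.
            scheduler = self.scheduler_set[command["job_type"]]
            # Instruction module 730: derive a resource scheduling command
            # and instruct the scheduler to allocate computing nodes.
            resource_scheduling_command = {"job": command["job"],
                                           "action": "allocate_nodes"}
            scheduler.schedule(resource_scheduling_command)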
To further implement the functions described in the method embodiments shown in figs. 2 to 6, the present application also provides a job scheduling device. As shown in fig. 7B, fig. 7B is a schematic structural diagram of a job scheduling device provided by the present application, where the job scheduling device 700 further includes a status updating module 740, a table determining module 750, a table updating module 760, and an indication synchronization module 770.
The status updating module 740 is configured to update the state of a permission lock, where the permission lock is used to indicate the scheduling permission of the first-type resource scheduler for the resources managed by the first-type resource scheduler, and the managed resources include at least one of computing resources, network resources, and storage resources. The table determining module 750 is configured to obtain the state of the resources managed by each resource scheduler in the scheduler set, and to determine a resource state table according to the state of the resources, where the resource state table is used to indicate the usage of the resources managed by each resource scheduler in the scheduler set. The table updating module 760 is configured to update the resource state table when a resource allocation result of the first job sent by the first-type resource scheduler is received, where the resource allocation result is used to indicate the callable resources that the first-type resource scheduler has allocated for the first job. The indication synchronization module 770 is configured to instruct the other resource schedulers in the scheduler set to synchronize the updated resource state table and to determine resource allocation results for other jobs according to the updated resource state table.
It should be understood that the scheduling node of the foregoing embodiments may correspond to the job scheduling device 700 and to the respective entities that perform the methods of figs. 2 to 6 according to the embodiments of the present application, and that the operations and/or functions of the modules in the job scheduling device 700 are respectively intended to implement the corresponding flows of the methods in the embodiments corresponding to figs. 2 to 6; for brevity, details are not repeated here.
Illustratively, when the job scheduling device 700 is implemented by the aforementioned scheduling node 110, the scheduling node 110 may include one or more hardware components. As shown in fig. 8, fig. 8 is a schematic structural diagram of a scheduling node according to the present application. The scheduling node 800 may be employed in the heterogeneous computing system 100 shown in fig. 1.
As shown in fig. 8, the scheduling node 800 may include a processor 810, a memory 820, a communication interface 830, a bus 840, a unified scheduler 850, and the like, with the processor 810, the memory 820, and the communication interface 830 being connected by the bus 840.
Processor 810 is the operational core and control core of scheduling node 800. Processor 810 may be a very large scale integrated circuit. An operating system and other software programs are installed in processor 810, such that processor 810 can access memory 820 and various PCIe devices. The processor 810 includes one or more processor cores (cores). The processor core in processor 810 is, for example, a CPU or another application-specific integrated circuit (ASIC). The processor 810 may also be another general-purpose processor, a digital signal processor (digital signal processing, DSP), an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. In practice, the scheduling node 800 may also comprise a plurality of processors. The unified scheduler described above may be software executing in the processor 810.
Memory 820 may be used to store computer-executable program code, which includes instructions. The processor 810 implements the various functional applications and data processing of the scheduling node 800 by executing the instructions stored in the memory 820. The memory 820 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (e.g., identifying a job scheduling command, a sending function, etc.), and the like. The data storage area may store data created during use of the scheduling node 800 (e.g., the resource state table), etc. In addition, the memory 820 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (universal flash storage, UFS), or the like.
Communication interface 830 is used to enable communication of scheduling node 800 with external devices or means. In this embodiment, the communication interface 830 is used to interact data with the computing node 120 and the terminal 200.
Bus 840 may include a path for transferring information between components (e.g., processor 810, memory 820, communication interface 830). The bus 840 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus; for clarity of illustration, however, the various buses are all labeled as bus 840 in the drawing. Bus 840 may be a PCIe bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, a unified bus (unified bus, Ubus or UB), a compute express link (compute express link, CXL), a cache coherent interconnect for accelerators (cache coherent interconnect for accelerators, CCIX), or the like. For example, the processor 810 may access these I/O devices over a PCIe bus. The processor 810 is connected to the memory 820 via a double data rate (double data rate, DDR) bus. Here, different memories 820 may communicate with the processor 810 using different data buses; therefore, the DDR bus may be replaced with another type of data bus, and the present application does not limit the type of bus.
As one possible embodiment, the present application also provides a unified scheduler, for example the unified scheduler 111 in the foregoing scheduling node 110; the unified scheduler 111 may include one or more hardware components and is deployed on the scheduling node 800 as the unified scheduler 850 shown in fig. 8.
The unified scheduler 850 includes a processor 851, and the processor 851 includes one or more processor cores (cores). The processor 851 may perform the methods shown in figs. 2 to 6 according to the acquired job scheduling command. The unified scheduler 850 may store the resource state table to the memory 820 in the scheduling node.
Optionally, the unified scheduler 850 further includes a memory 852, and the processor 851 may store the obtained resource status table to the memory 852.
For more details of the unified scheduler 850, reference may be made to the content of the job scheduling method described above, and details thereof are not described herein.
It should be noted that fig. 8 takes the scheduling node 800 including one processor 810 and one memory 820 only as an example; the processor 810 and the memory 820 each represent a type of device or apparatus, and in a specific embodiment, the number of each type of device or apparatus may be determined according to service requirements.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (digital video disc, DVD); or a semiconductor medium, such as a solid state drive (solid state drive, SSD).
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (11)

1. A method of job scheduling, the method comprising:
Acquiring a job scheduling command, wherein the job scheduling command is used for executing scheduling processing of a first job;
Selecting a first type of resource scheduler matched with the first job from a scheduler set according to the type of the first job, wherein the scheduler set comprises resource schedulers capable of executing scheduling processing of at least two different types of jobs;
and instructing the first type resource scheduler to execute the data processing of the first job.
2. The method according to claim 1, wherein the method further comprises:
Updating the state of a permission lock, wherein the permission lock is used for indicating the scheduling permission of the first-type resource scheduler for the resource managed by the first-type resource scheduler, and the managed resource comprises at least one of a computing resource, a network resource and a storage resource.
3. The method according to claim 1 or 2, wherein before the instructing the first-type resource scheduler to perform data processing of the first job, the method further comprises:
acquiring the state of resources managed by each resource scheduler in the scheduler set;
and determining a resource state table according to the state of the resource, wherein the resource state table is used for indicating the use condition of the resource managed by each resource scheduler in the scheduler set.
4. A method according to claim 3, wherein after the instructing the first type resource scheduler to perform data processing of the first job, the method further comprises:
When a resource allocation result of the first job sent by the first type resource scheduler is received, updating the resource state table;
Wherein the resource allocation result is used for indicating: the first-type resource scheduler allocates callable resources for the first job.
5. The method according to claim 4, wherein the method further comprises:
and instructing other resource schedulers in the scheduler set to synchronize the updated resource state table, and determining resource allocation results for other jobs according to the updated resource state table.
6. A job scheduling device, the device comprising:
the acquisition module is used for acquiring a job scheduling command, wherein the job scheduling command is used for executing scheduling processing of a first job;
a selection module, configured to select a first type of resource scheduler matching the first job from a scheduler set according to the type of the first job, where the scheduler set includes resource schedulers that can execute scheduling processing of at least two different types of jobs;
And the indication module is used for indicating the first type resource scheduler to execute the data processing of the first job.
7. The apparatus of claim 6, wherein the apparatus further comprises:
A status updating module, configured to update the state of a permission lock, wherein the permission lock is used for indicating the scheduling permission of the first-type resource scheduler for the resources managed by the first-type resource scheduler, and the managed resources comprise at least one of computing resources, network resources and storage resources.
8. The apparatus according to claim 6 or 7, characterized in that the apparatus further comprises:
A table determining module, configured to obtain a state of resources managed by each resource scheduler in the scheduler set; and determining a resource state table according to the state of the resource, wherein the resource state table is used for indicating the use condition of the resource managed by each resource scheduler in the scheduler set.
9. The apparatus of claim 8, wherein the apparatus comprises:
A table updating module, configured to update the resource state table when a resource allocation result of the first job sent by the first-type resource scheduler is received; wherein the resource allocation result is used for indicating: the first-type resource scheduler allocates callable resources for the first job.
10. The apparatus according to claim 9, characterized in that the apparatus comprises:
An indication synchronization module, configured to instruct other resource schedulers in the scheduler set to synchronize the updated resource state table and to determine resource allocation results for other jobs according to the updated resource state table.
11. A chip comprising control circuitry and interface circuitry, the interface circuitry to obtain a job scheduling command, the control circuitry to perform the method of any one of claims 1 to 5 in accordance with the job scheduling command.
CN202211338117.1A 2022-10-28 2022-10-28 Job scheduling method, device and chip Pending CN117950816A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211338117.1A CN117950816A (en) 2022-10-28 2022-10-28 Job scheduling method, device and chip
PCT/CN2023/101052 WO2024087663A1 (en) 2022-10-28 2023-06-19 Job scheduling method and apparatus, and chip

Publications (1)

Publication Number Publication Date
CN117950816A true CN117950816A (en) 2024-04-30

Family

ID=90800635

Country Status (2)

Country Link
CN (1) CN117950816A (en)
WO (1) WO2024087663A1 (en)

Also Published As

Publication number Publication date
WO2024087663A1 (en) 2024-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination