WO2022166480A1 - Task scheduling method, apparatus and system

Info

Publication number
WO2022166480A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
task
computing node
scheduling
target task
Application number
PCT/CN2021/142532
Other languages
French (fr)
Chinese (zh)
Inventor
Su Lei (苏磊)
Sun Hongwei (孙宏伟)
He Bo (贺波)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022166480A1

Classifications

    • G: Physics; G06: Computing, Calculating or Counting; G06F: Electric digital data processing
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular, to a task scheduling method, device and system.
  • chip architectures are also called processor architectures.
  • common processors of different chip architectures include: a central processing unit (CPU) that supports general-purpose computing, a graphics processing unit (GPU) that supports image rendering and high-performance computing, and a neural-network processing unit (NPU) that supports neural-network computation.
  • the chip architecture of the CPU can be further divided into the X86 architecture and the advanced RISC machine (ARM) architecture.
  • a heterogeneous cluster refers to a cluster composed of computing nodes with different chip architectures.
  • the processors of some computing nodes in a heterogeneous cluster are CPUs, and the processors of other computing nodes are GPUs or NPUs. Since a processor in a computing node can only run executable code of its own chip architecture, when scheduling a task the scheduler in the heterogeneous cluster must, based on the architecture of the task's executable code, schedule the task to a computing node whose chip architecture matches the architecture of the executable code.
  • the above task scheduling method may lead to an unbalanced load across the computing nodes in the heterogeneous cluster and low resource utilization of the heterogeneous cluster.
  • the present application provides a task scheduling method, apparatus and system, which can solve the technical problem of low resource utilization of heterogeneous clusters.
  • the technical solutions are as follows:
  • a task scheduling method is provided, which is applied to a target computing node in a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The method includes: receiving a scheduling instruction for a target task sent by the scheduler; obtaining an intermediate representation of the target task and a runtime plug-in of the target task; based on the scheduling instruction, compiling the intermediate representation into executable code of a target chip architecture through the runtime plug-in; and running the executable code on a processor of the target chip architecture through the runtime plug-in. The intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task, and the target computing node includes a processor of the target chip architecture.
  • the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is therefore not limited by the architecture of any pre-compiled executable code of the target task; instead, it can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of the computing nodes remains relatively balanced, and the resource utilization of the heterogeneous cluster is effectively improved.
  • the process of the target computing node acquiring the intermediate representation and the runtime plug-in of the target task may include: based on the scheduling instruction, acquiring the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster; or, receiving the intermediate representation and the runtime plug-in of the target task sent by the scheduler.
  • the intermediate representation and the runtime plug-in of the target task can be stored by the file manager in the heterogeneous cluster, thereby reducing the storage performance requirements of the scheduler. Also, since the scheduler does not need to forward the intermediate representation and the runtime plug-in, any impact on its scheduling performance is avoided.
  • the intermediate representation and the runtime plug-in can also be directly forwarded by the scheduler, so that there is no need to set up an additional file manager in the heterogeneous cluster, so as to simplify the structure of the heterogeneous cluster and reduce the deployment cost of the heterogeneous cluster.
  • the method may further include: receiving the architecture identifier of the target chip architecture sent by the scheduler; correspondingly, the process of compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in may include: based on the architecture identifier of the target chip architecture sent by the scheduler, compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
  • the scheduler may also send the architecture identifier of the target chip architecture to the target computing node, so that the target computing node can determine the architecture of the executable code into which the intermediate representation needs to be compiled.
  • the method may further include: acquiring input data of the target task. The process of running the executable code on the processor of the target chip architecture through the runtime plug-in may include: through the runtime plug-in, taking the input data as the input of the executable code, running the executable code on the processor of the target chip architecture, and obtaining the running result of the executable code. The method may further include: sending the running result to the scheduler.
  • the scheduler can then send the running result to the host providing the target task, so that the host can perform subsequent processing on the running result.
  • the host can perform reduction processing on the running results provided by multiple computing nodes.
  • the process of the target computing node acquiring the input data of the target task may include: based on the scheduling instruction, acquiring the input data of the target task from the file manager of the heterogeneous cluster; or, receiving the input data of the target task sent by the scheduler.
  • the input data can be stored by the file manager, thereby reducing the storage performance requirements of the scheduler.
  • the input data can also be directly forwarded by the scheduler, so that there is no need to set up an additional file manager in the heterogeneous cluster, so as to simplify the structure of the heterogeneous cluster and reduce the deployment cost of the heterogeneous cluster.
  • a task scheduling method is provided, which is applied to a scheduler in a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes and at least two of the multiple computing nodes have different chip architectures. The method includes: receiving scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task; determining a target computing node from the multiple computing nodes based on the scheduling requirement information, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures; and sending a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task and to run the executable code on the processor of the target chip architecture. The intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task.
  • the scheduling requirement information may further include priorities of the at least two chip architectures. Correspondingly, the process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: in descending order of the priorities of the at least two chip architectures, sequentially detecting whether the amount of idle resources of the processors of the corresponding chip architecture in the multiple computing nodes meets the resource requirement; and if it is detected that the amount of idle resources of the processors of the target chip architecture satisfies the resource requirement, determining a computing node that includes a processor of the target chip architecture as the target computing node.
  • the priorities of the at least two chip architectures can be defined in the scheduling requirement information, where a chip architecture with a higher priority is more suitable for processing the target task. Therefore, the scheduler determining the target chip architecture in descending order of priority can effectively ensure the execution efficiency of the target task.
  • the method may further include: sending the architecture identifier of the target chip architecture to the target computing node.
  • the method may further include: receiving the intermediate representation of the target task and the runtime plug-in of the target task, and sending the intermediate representation and the runtime plug-in to the target computing node.
  • the target task is a parallel task among multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks.
  • the process of determining the target computing node may include: if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirement of the target task alone. The synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously; the ideal parallel mode means that they do not.
  • the scheduler may determine the target computing node in different ways based on the parallel scheduling modes of multiple parallel tasks to ensure that the multiple parallel tasks can be reliably executed according to the required scheduling mode.
  • a task scheduling method is provided, which can be applied to a host. The method includes: compiling the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, where the intermediate representation is code independent of the chip architecture; sending the intermediate representation and the runtime plug-in; and sending the scheduling requirement information of the target task to the scheduler in a heterogeneous cluster, where the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task. The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures. The scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the multiple computing nodes, where the amount of idle resources of the processors of the target chip architecture in the target computing node meets the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures. The runtime plug-in is used for the target computing node to compile the intermediate representation into executable code of the target chip architecture and to run the executable code.
  • the process of sending the intermediate representation and the runtime plug-in may include: sending the intermediate representation and the runtime plug-in to the scheduler; or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
  • a target computing node is provided, which is applied to a heterogeneous cluster. The heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The target computing node includes a processor of the target chip architecture, and further includes at least one module, where the at least one module is used to implement the task scheduling method applied to the target computing node provided by the above aspects.
  • a scheduler is provided, applied to a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes and at least two of the multiple computing nodes have different chip architectures; the scheduler includes at least one module, where the at least one module is used to implement the task scheduling method applied to the scheduler provided by the above aspects.
  • in another aspect, a host is provided. The host includes at least one module, and the at least one module is used to implement the task scheduling method applied to the host provided by the above aspects.
  • a computer device is provided, comprising: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the task scheduling method provided by the above aspects when executing the computer program.
  • a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to the target computing node provided by the above aspect, or the task scheduling method applied to the scheduler provided by the above aspect, or the task scheduling method applied to the host provided by the above aspect.
  • a computer program product is provided, which, when run on a computer, causes the computer to execute the task scheduling method applied to the target computing node provided by the above aspect, or the task scheduling method applied to the scheduler provided by the above aspect, or the task scheduling method applied to the host provided by the above aspects.
  • a task scheduling system is provided, which includes: the host provided in the above aspect, the scheduler provided in the above aspect, and multiple computing nodes; at least one computing node among the multiple computing nodes is a target computing node as provided in the above aspect.
  • a task scheduling system includes: a host, a scheduler, and multiple computing nodes, where at least two of the multiple computing nodes have different chip architectures;
  • the host is used for compiling the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task, sending the intermediate representation and the runtime plug-in, and sending the scheduling requirement information of the target task to the scheduler, where the intermediate representation is chip-architecture-independent code, and the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task;
  • the scheduler is configured to determine a target computing node from the multiple computing nodes based on the scheduling requirement information, and send a scheduling instruction for the target task to the target computing node, where the amount of idle resources of the processor of the target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures;
  • the target computing node is configured to compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the scheduling instruction, and to run the executable code on the processor of the target chip architecture through the runtime plug-in.
  • the present application provides a task scheduling method, device and system, in which a target computing node can obtain an intermediate representation and a runtime plug-in of a target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on a processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any pre-compiled executable code of the target task; instead, it can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization of the heterogeneous cluster can be effectively improved.
  • FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a task scheduling method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a task scheduling process provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a method for determining a target computing node provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a target computing node provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a scheduler provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a host provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of the present application.
  • the application scenario includes a heterogeneous cluster, and the heterogeneous cluster includes: a management node 01 and a plurality of computing nodes 02 connected to the management node 01 .
  • At least two computing nodes 02 of the plurality of computing nodes 02 use processors of different chip architectures. For example, among the plurality of computing nodes 02, some computing nodes 02 use a CPU, some use a GPU, and others use an NPU.
  • a scheduler 011 is deployed in the management node 01 , and the application scenario may further include a host 03 .
  • the host 03 may send the target task to be scheduled to the scheduler 011, and the scheduler 011 may then schedule the target task to at least one computing node 02 for execution.
  • FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • an acceleration library 031 is deployed in the host 03
  • the acceleration library 031 is a software collection for optimizing the performance of the processor.
  • the acceleration library 031 can be used to send the target task to be scheduled to the scheduler 011 .
  • any computing node 02 in the heterogeneous cluster can also serve as a host to send tasks to be scheduled to the scheduler 011 .
  • the application scenario may also not include the host 03 independent of the heterogeneous cluster.
  • management node 01 may also have the function of a computing node, that is, the management node 01 can not only schedule tasks, but also execute tasks.
  • in the related art, the scheduler in the heterogeneous cluster needs to record the chip architecture type of each computing node in advance.
  • when the host submits a task to the scheduler, it also needs to mark, in the submitted task, the architecture of the executable code used by the task.
  • the scheduler can schedule the task to be executed in a computing node matching the chip architecture according to the architecture of the executable code marked in the task.
  • if the architectures of the executable code adopted by the tasks are unevenly distributed, then under the above task scheduling method the load of the computing nodes in the heterogeneous cluster will be unbalanced.
  • for example, a task carrying executable code of the X86 architecture can only be scheduled to run on a computing node whose processor architecture is X86. Suppose the computing nodes whose processor architecture is X86 have no available idle resources, while the computing nodes whose processor architecture is ARM do have idle resources. In this scenario, the idle resources of the heterogeneous cluster cannot be used to process the task, resulting in low resource utilization.
  • a host-submitted task may include multiple executables of different architectures.
  • the scheduler can determine a target computing node for executing the task according to the load of each computing node 02, and schedule the task to the target computing node. Since the task includes executable codes of multiple different architectures, the target computing node can execute executable codes whose architecture is the same as the chip architecture of its processor. However, this scheduling method requires the host to implement executable codes of various architectures, resulting in high cost.
  • the embodiment of the present application provides a distributed middleware for implementing adaptive task scheduling in a heterogeneous cluster.
  • the adaptive task scheduling refers to scheduling tasks that are adaptive based on resource usage of heterogeneous clusters.
  • the heterogeneous cluster may also be referred to as an adaptive cluster
  • the distributed middleware may also be referred to as adaptive middleware. As shown in FIG. 1 and FIG. 2, the distributed middleware may include: a middleware programming interface 032 deployed in the host 03, a scheduler 011 deployed in the management node 01, and a cluster agent 021 deployed in the computing node 02.
  • the middleware programming interface 032 is used to provide the acceleration library 031 with the ability to access heterogeneous clusters, that is, the acceleration library 031 can exchange data with components in the heterogeneous cluster by calling the middleware programming interface 032 .
  • the acceleration library 031 may send the scheduling requirement information of the target task, the intermediate representation of the target task, and the runtime plug-in of the target task to the scheduler 011 through the middleware programming interface 032 .
  • the intermediate representation may also be called intermediate language or intermediate code, which is an equivalent internal representation code of the source code.
  • the intermediate representation is independent of the chip architecture of the processor, that is, the intermediate representation can be compiled into executable codes (also referred to as object codes) of different architectures.
  • Runtime is the runtime environment of a programming language, which is a virtual environment that can provide software services for running programs.
  • the runtime plugin refers to a component capable of providing the runtime environment of the intermediate representation. Since the runtime plug-in provided by the embodiment of the present application supports the application to run in a heterogeneous device environment, it may also be called a heterogeneous runtime plug-in.
  • the runtime plug-in can provide a plug-in interface to be called by the cluster agent 021 in the distributed middleware, so that the cluster agent 021 can initialize the runtime plug-in, deinitialize it, run it, and clean it up on exit.
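  • as a concrete illustration, the following is a minimal C++ sketch of such a plug-in interface; the class name, method names, and types are assumptions made for illustration, not the patent's actual interface:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface a cluster agent could use to drive a runtime
// plug-in: initialize, run (compile the IR online and execute it), and
// deinitialize/clean up on exit, as described above.
class HeterogeneousRuntimePlugin {
public:
    virtual ~HeterogeneousRuntimePlugin() = default;

    // Called by the cluster agent when the task service instance starts.
    virtual bool Init() = 0;

    // Compile the chip-architecture-independent intermediate representation
    // for the given target architecture (e.g. "x86", "arm", "gpu", "npu")
    // and run the resulting executable code with the given input data,
    // returning the serialized running result.
    virtual std::vector<std::byte> Run(
        const std::vector<std::byte>& intermediateRepresentation,
        const std::string& targetArch,
        const std::vector<std::byte>& inputData) = 0;

    // Called before the task service instance exits (deinitialize and
    // clean up the plug-in).
    virtual void Deinit() = 0;
};
```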
  • the scheduler 011 is used to schedule tasks based on the usage of heterogeneous resources in the heterogeneous cluster. As shown in FIG. 2, the scheduler 011 mainly includes a task management and scheduler 0111 and a resource management and scheduler 0112.
  • the resource management and scheduler 0112 is used to manage and schedule the resources of each computing node 02 in the heterogeneous cluster, and the resources include at least processor resources, and may also include memory resources and the like.
  • the task management and scheduler 0111 is configured to send a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task sent by the acceleration library 031 .
  • the resource management and scheduler 0112 can allocate resources to the target task based on the resource scheduling request.
  • the resource management and scheduler 0112 allocates resources based on the resource usage of each computing node 02; the resources allocated for the target task include processor resources of the target chip architecture in the target computing node 02. The task management and scheduler 0111 can then distribute the target task to the target computing node 02 based on the allocated resources.
  • the cluster agent 021 is mainly used to start task service instances and manage runtime plug-ins. As shown in FIG. 2 , the cluster agent 021 includes a resource-level agent 0211 and a task-level agent 0212 . Among them, the resource layer agent 0211 is used to collect the resource information of the computing node 02 and report it to the resource management and scheduler 0112, so that the computing node 02 joins the heterogeneous cluster.
  • the task layer agent 0212 is used to start a task service instance based on the resources provided by the resource layer agent 0211, and the task service instance runs a runtime plug-in of the target task, or it can be understood that the task service instance includes a runtime plug-in instance.
  • the task layer agent 0212 can also be used to send, to the runtime plug-in, the intermediate representation of the target task and the target chip architecture determined by the resource management and scheduler 0112.
  • the runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture, and run the executable code in the processor of the target chip architecture, thereby realizing the running of the target task.
  • the scheduler 011 in the management node 01 can schedule the target task based on the resource usage of each computing node 02, without considering the architecture of the target task's executable code. In this way, the resource utilization of the heterogeneous cluster can be effectively improved without requiring the host 03 to provide executable code in multiple architectures.
  • in addition, since the scheduler 011 does not need to determine which computing nodes' chip architectures match the architecture of the executable code marked in a task, the complexity of resource management and scheduling is effectively reduced, thereby improving task scheduling efficiency.
  • acceleration libraries in different fields can be combined with the above-mentioned distributed middleware to realize adaptive scheduling of target tasks in heterogeneous clusters.
  • the embodiment of the present application provides a task scheduling method, and the method can be applied to the application scenarios provided by the foregoing embodiments.
  • the task scheduling method provided by the embodiment of the present application includes:
  • Step 101 The host compiles the source code of the target task to obtain an intermediate representation and a runtime plug-in of the target task.
  • the acceleration library 031 in the host 03 can compile the source code of the target task to obtain the intermediate representation and runtime plug-in of the target task.
  • FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of the present application.
  • the acceleration library 031 may be an acceleration library in different fields.
  • the acceleration library 031 may be a parallel programming acceleration library, such as an open multi-processing (OpenMP) acceleration library.
  • the acceleration library 031 may also be another type of acceleration library, such as a numerical computation acceleration library, a graph computation acceleration library, a data frame acceleration library, or a machine learning acceleration library.
  • the numerical computation acceleration library may include a numerical Python (NumPy) acceleration library
  • the data frame acceleration library may be a pandas acceleration library
  • the machine learning acceleration library may include a Scikit-learn acceleration library.
  • pandas is a data analysis package for Python.
  • the target task can be a task of applications in different fields, such as computer vision (CV) applications, natural language processing (NLP) applications, or machine learning prediction applications.
  • the acceleration library 031 can compile the source code of the target task into an intermediate representation independent of the chip architecture through a compiler, and obtain a runtime plug-in related to the compiler.
  • the intermediate representation may be a standard portable intermediate representation (SPIR-V), a WebAssembly (WASM) intermediate representation, or the like.
  • the runtime plug-in may be a tensor virtual machine (TVM) runtime plug-in, a SPIR-V runtime plug-in, a WASM runtime plug-in, or the like.
  • the programming framework of the compiler adopted by the acceleration library 031 may include any one of the following: Python, Java, Go, a numerical domain-specific language (DSL), a table-structure DSL, a distributed parallel DSL, a C++ heterogeneous programming framework, and the like.
  • Python, Java and Go are the names of computer programming languages.
  • the source code of the target task may be a code fragment in a certain program; the code fragment may also be referred to as a code fragment to be accelerated, or an acceleration kernel code fragment.
  • the developer can mark the code fragment in advance by using a device directive, and the acceleration library 031 can compile the marked code fragment to obtain the intermediate representation and the runtime plug-in of the target task.
  • the program running in the OpenMP acceleration library for implementing the matrix multiplication (matmul) operation is as follows:
  • float represents a floating-point data type
  • int represents an integer type
  • A and B represent two input matrices
  • C represents an output matrix, that is, matrix C is equal to the product of matrix A and matrix B.
  • “#pragma omp parallel for” is a directive in OpenMP, indicating that the for loop that follows will be executed by multiple threads.
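  • the patent's original listing is not reproduced in text form here, so the following is an illustrative C++/OpenMP reconstruction consistent with the identifiers described above (float element type, input matrices A and B, output matrix C equal to their product):

```cpp
// Illustrative reconstruction: C = A * B for n x n matrices; the outer
// loop iterations are distributed across threads by OpenMP.
void matmul(const float* A, const float* B, float* C, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```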
  • the for loop in the above-mentioned program may be executed in parallel by the computing nodes in the heterogeneous cluster. That is, the code fragment of the for loop can be offloaded to the heterogeneous cluster for execution; correspondingly, the source code of the target task is the for loop in the program.
  • the developer can add a device directive to the above-mentioned matrix multiplication program to mark the for loop.
  • the program with the device directive added is:
  • “#pragma omp target device(ADAPTIVE_CLUSTER)” is the added device directive, which means to offload the subsequent code fragment from the host to the target device for execution.
  • the target device is a heterogeneous cluster.
  • "ADAPTIVE_CLUSTER” is the name of the target device defined in this embodiment of the application.
  • the OpenMP acceleration library can compile the code fragment (the for loop) marked by the device directive into an intermediate representation. Then, while running the executable code of the above program, when the OpenMP acceleration library detects that the intermediate representation of the marked code fragment is to be executed, it can offload the intermediate representation to the heterogeneous cluster for execution.
  • the acceleration library 031 may obtain a fat binary file by compiling the source code of the target task through a low-level virtual machine (LLVM) compiler.
  • the fat binary file contains host code (such as the main function) and an intermediate representation independent of the chip architecture.
  • the file format of the fat binary file may be the executable and linkable format (ELF).
  • FIG. 7 is a schematic diagram of a programming framework provided by an embodiment of the present application.
  • the acceleration library 031 can use a DSL compiler to compile the source code of the target task.
  • the compilation process may include steps such as algorithm abstraction, computation graph optimization, data graph optimization, communication graph optimization, and abstract syntax tree generation. The above steps can be automatically scheduled by the acceleration library 031, or scheduled by the user.
  • Step 102 The host sends the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager in the heterogeneous cluster.
  • the heterogeneous cluster may further include a file manager 04, and the acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager 04 by calling the middleware programming interface 032.
  • the input data may include an input matrix A and an input matrix B.
  • the heterogeneous cluster may further include a gateway 05 , and the gateway 05 is respectively connected with the scheduler 011 and the file manager 04 .
  • the acceleration library 031 can send the intermediate representation of the target task, the runtime plug-in, and the input data to the gateway 05 by calling the software development kit (SDK) interface provided by the gateway 05.
  • the gateway 05 can in turn forward the received data to the file manager 04 .
  • the main component of the SDK interface is the middleware programming interface 032 .
  • the file manager 04 may include one or more storage devices with file storage functions.
  • Each computing node 02 in the heterogeneous cluster has established a communication connection with the file manager 04 and can obtain data from the file manager 04 .
  • Step 103 The host sends the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster.
  • the acceleration library 031 in the host 03 can send the scheduling requirement information of the target task to the scheduler 011 in the management node 01 by calling the middleware programming interface 032 .
  • the acceleration library 031 may send the scheduling requirement information to the task management and scheduler 0111 in the scheduler 011.
  • the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the scheduling requirement information can be configured by the acceleration library 031 .
  • the acceleration library 031 can, by calling the middleware programming interface 032, send to the gateway 05 the resource requirement of the target task (an amount X of processor resources) and the three supported chip architectures: X86, ARM, and GPU.
  • the gateway 05 can then forward the received scheduling requirement information to the scheduler 011 .
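  • a sketch of what this scheduling requirement information might carry, with illustrative field names (the patent does not define a wire format, so this is an assumption for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation of the scheduling requirement information
// sent by the acceleration library to the scheduler.
enum class ParallelMode { kSynchronous, kIdeal };

struct SchedulingRequirement {
    std::string taskId;                       // identifier of the target task
    std::uint64_t processorResourceAmount;    // resource requirement, e.g. the amount X
    std::vector<std::string> supportedArchs;  // priority-ordered, e.g. {"x86", "arm", "gpu"}
    ParallelMode parallelMode;                // parallel scheduling mode, if the task
    std::uint32_t parallelTaskCount;          // is one of multiple parallel tasks
};
```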
  • Step 104 The scheduler determines a target computing node from a plurality of computing nodes based on the scheduling requirement information.
  • after the scheduler 011 receives the scheduling requirement information of the target task to be scheduled sent by the acceleration library 031, it can, based on the resource usage of each computing node 02 in the heterogeneous cluster, determine from the multiple computing nodes a target computing node that satisfies the execution conditions of the target task.
  • the amount of idle resources of the processor of the target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures.
  • the scheduling requirement information of the target task sent by the acceleration library 031 may further include: priorities of the at least two chip architectures.
  • the scheduler 011 can sequentially detect the amount of idle resources of the processors of each chip architecture in the heterogeneous cluster in descending order of priority, and determine the target computing node from the plurality of computing nodes 02.
  • processors of different chip architectures are good at different types of tasks, for example, CPUs are good at scalar operations, GPUs are good at vector operations, and NPUs are good at matrix operations. Therefore, in the solution provided by the present application, the priorities of the at least two chip architectures may be defined in the scheduling requirement information, wherein the chip architecture with a higher priority is more suitable for processing the target task. Therefore, the scheduler determines the target chip architecture based on the order of the priority from high to low, which can ensure the execution efficiency of the target task as much as possible while improving the resource utilization of the heterogeneous cluster.
  • the acceleration library 031 in the host 03 can split the task to be executed into multiple parallel tasks, so that each computing node 02 in the heterogeneous cluster can execute the multiple parallel tasks in parallel.
  • the target task is a parallel task among the multiple parallel tasks
  • the scheduling requirement information may further include: parallel scheduling modes of the multiple parallel tasks.
  • the parallel scheduling mode may include a synchronous parallel mode and an ideal parallel mode.
  • the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously. Therefore, when scheduling the multiple parallel tasks, it is necessary to ensure that the multiple parallel tasks are scheduled to be executed in processors of the same chip architecture.
  • the ideal parallel mode means that the multiple parallel tasks do not require synchronous execution; that is, the multiple parallel tasks can be executed synchronously, or some parallel tasks can be executed first and the remaining ones later. Therefore, when scheduling the multiple parallel tasks, they can be scheduled to processors of different chip architectures for execution. The ideal parallel mode may also be called embarrassingly parallel.
  • this step 104 may include:
  • Step 1041 Determine the parallel scheduling mode of multiple parallel tasks.
  • the scheduler 011 may determine the parallel scheduling mode of the multiple parallel tasks based on the received scheduling requirement information. If the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, the scheduler 011 may execute the following steps 1042a and 1043a; if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, the scheduler 011 may execute Steps 1042b and 1043b are described below.
  • Step 1042a in order of priority from high to low, sequentially detect whether the amount of idle resources of processors corresponding to the chip architecture in the multiple computing nodes meets the resource requirements of the target task.
  • the scheduler 011 may directly determine the target computing node based on the resource requirement of the target task. That is, when the scheduler 011 schedules the target task, it only needs to ensure that the idle resources of processors of a certain chip architecture in the heterogeneous cluster can meet the resource requirement of the target task, without ensuring that the sum of idle resources of processors of that chip architecture satisfies the sum of the resource requirements of the multiple parallel tasks.
  • the scheduler 011 can sequentially detect whether the amount of idle resources of processors of each chip architecture in the multiple computing nodes meets the resource requirements of the target task in the order of X86, ARM and GPU.
  • the scheduler 011 may first detect whether the amount of idle resources of the processors of the X86 architecture in the heterogeneous cluster meets the resource requirement of the target task. If the amount of idle resources of the processors of the X86 architecture meets the resource requirement, the scheduler 011 may execute the following step 1043a. If not, the scheduler 011 may continue to detect whether the amount of idle resources of the processors of the ARM architecture in the heterogeneous cluster meets the resource requirement. If the amount of idle resources of the processors of the ARM architecture meets the resource requirement, the scheduler 011 may execute the following step 1043a. If not, the scheduler 011 may continue to detect whether the amount of idle resources of the processors of the GPU architecture in the heterogeneous cluster meets the resource requirement.
  • Step 1043a If it is detected that the amount of idle resources of the processor of the target chip architecture meets the resource requirement, a computing node including the processor of the target chip architecture is determined as the target computing node.
  • proceeding in descending order of priority, if the scheduler 011 detects that the amount of idle resources of the processors of the target chip architecture meets the resource requirement, it can determine a computing node that includes a processor of the target chip architecture as the target computing node. For example, if the scheduler 011 detects that the amount of idle resources of the processors of the X86 architecture in the heterogeneous cluster meets the resource requirement of the target task, a computing node including a processor of the X86 architecture may be determined as the target computing node. The amount of idle resources of the processor of the X86 architecture in the target computing node satisfies the resource requirement.
  • Step 1042b in order of priority from high to low, sequentially detect whether the sum of idle resources of processors corresponding to the chip architecture in the multiple computing nodes meets the sum of resource requirements of the multiple parallel tasks.
  • the scheduler 011 may determine that the multiple parallel tasks need to be executed synchronously. Therefore, when scheduling the target task, the scheduler 011 needs to ensure that the sum of the idle resources of the processors of a certain chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks. That is, the scheduler 011 needs to determine the target computing node for executing the target task from the plurality of computing nodes based on the sum of the resource requirements of the multiple parallel tasks.
  • the scheduler 011 can sequentially detect whether the sum of idle resources of processors of each chip architecture in the multiple computing nodes satisfies the sum of resource requirements of the multiple parallel tasks in the order of X86, ARM and GPU.
  • Step 1043b If it is detected that the sum of the idle resources of the processors of the target chip architecture satisfies the sum of the resource requirements, a computing node including the processors of the target chip architecture is determined as the target computing node.
  • if the scheduler 011 detects that the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster meets the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node that includes a processor of the target chip architecture as the target computing node.
  • for example, if the scheduler 011 detects that the sum of the idle resources of the processors of the ARM architecture in the multiple computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node including processors of the ARM architecture as the target computing node. The amount of idle resources of the ARM processors in the target computing node satisfies the resource requirement of the target task.
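  • a compact sketch of the two selection paths of step 104, under assumed data structures for per-node idle resources (illustrative only; the patent does not prescribe these structures):

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct NodeInfo {
    std::string nodeId;
    // Idle resource amount per chip architecture present on this node.
    std::map<std::string, std::uint64_t> idleByArch;
};

// Walk the supported architectures in descending priority order and pick
// a node, applying the per-mode checks of steps 1042a/1043a and 1042b/1043b.
std::optional<std::string> PickTargetNode(
        const std::vector<std::string>& archsByPriority,
        const std::vector<NodeInfo>& nodes,
        std::uint64_t perTaskDemand,  // resource requirement of the target task
        std::uint64_t totalDemand,    // sum over all parallel tasks (synchronous mode)
        bool synchronousMode) {
    for (const auto& arch : archsByPriority) {
        std::uint64_t clusterIdle = 0;
        for (const auto& n : nodes) {
            auto it = n.idleByArch.find(arch);
            if (it != n.idleByArch.end()) clusterIdle += it->second;
        }
        // Synchronous parallel mode: the cluster-wide idle resources of this
        // architecture must cover the sum of all parallel task requirements.
        if (synchronousMode && clusterIdle < totalDemand) continue;
        // Either mode: the chosen node must cover the target task itself.
        for (const auto& n : nodes) {
            auto it = n.idleByArch.find(arch);
            if (it != n.idleByArch.end() && it->second >= perTaskDemand) {
                return n.nodeId;
            }
        }
    }
    return std::nullopt;  // no supported architecture satisfies the requirement
}
```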
  • if at least two candidate computing nodes satisfy the execution condition of the target task, the scheduler 011 may randomly select one of the at least two candidate computing nodes as the target computing node.
  • the scheduler 011 may select one of the at least two candidate computing nodes as the target computing node based on a preconfigured resource scheduling policy.
  • satisfying the execution condition of the target task means that the computing node includes a processor of the target chip architecture, and the amount of idle resources of the processor meets the resource requirement of the target task.
  • the scheduler 011 may include a task management and scheduler 0111 and a resource management and scheduler 0112 .
  • the task management and scheduler 0111 can send a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task.
  • the resource management and scheduler 0112 can further allocate resources for the target task based on a preconfigured resource scheduling policy, that is, determine a target computing node from a plurality of computing nodes 02 .
  • the resource scheduling strategy may include: heterogeneous awareness, priority preemption, affinity and anti-affinity, bin packing algorithm or accelerator sharing, and the like.
  • the task management and scheduler 0111 can perform task scheduling on the multiple tasks based on a preconfigured task scheduling policy.
  • the task scheduling strategy may include: directed acyclic graph (DAG) scheduling, priority scheduling, and the like.
  • the chip architecture of each computing node in the heterogeneous cluster includes GPU, NPU, and CPU, and the acceleration ratio of the processors of the three chip architectures is 2:2:1.
  • the parallel scheduling mode of the 100 parallel tasks is the ideal parallel mode, and the resources of 10 GPUs, 10 NPUs, and 100 CPUs in the current heterogeneous cluster are idle, among which 50 CPUs of the X86 architecture are idle in computing node A, and 50 CPUs of the ARM architecture are idle in computing node B.
  • the scheduler 011 can schedule 20 parallel tasks to the computing nodes containing the GPUs, 20 parallel tasks to the computing nodes containing the NPUs, 30 parallel tasks to computing node A, and 30 parallel tasks to computing node B for execution.
  • each GPU and each NPU are used to execute 2 parallel tasks
  • each CPU of X86 architecture and each CPU of ARM architecture are used to execute 1 parallel task.
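  • reading this allocation as arithmetic (one interpretation, assuming the 2:2:1 acceleration ratio means one GPU or NPU handles two such tasks in the time one CPU handles one), the 100 tasks are covered as:

$10 \times 2 \;(\text{GPUs}) + 10 \times 2 \;(\text{NPUs}) + 30 \times 1 \;(\text{X86 CPUs}) + 30 \times 1 \;(\text{ARM CPUs}) = 20 + 20 + 30 + 30 = 100$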
  • Step 105 The scheduler sends a scheduling instruction for the target task to the target computing node.
  • the scheduler 011 may send a scheduling instruction for the target task to the target computing node 02 .
  • the scheduling instruction may carry the identifier of the target task.
  • the scheduling instruction is used to instruct the target computing node 02 to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code on the processor of the target chip architecture.
  • the scheduler 011 may respectively send scheduling instructions to the N computing nodes 02 for executing the N parallel tasks.
  • the task management and scheduler 0111 in the scheduler 011 may send the scheduling instruction to the task layer agent 0212 in the computing node 02 .
  • Step 106 The scheduler sends the architecture identifier of the target chip architecture to the target computing node.
  • one or more computing nodes in a heterogeneous cluster may include processors of various chip architectures, for example, may include NPU and CPU of X86 architecture, or may include GPU and CPU of X86 architecture. Therefore, in order to facilitate the target computing node to determine the chip architecture of the processor used for running the target task, the scheduler may also send the architecture identifier of the target chip architecture to the target computing node.
  • the resource management and scheduler 0112 in the scheduler 011 determines the target chip architecture, it can send the architecture identifier of the target chip architecture to the resource layer agent 0211 in the target computing node 02 .
  • the resource layer agent 0211 can then send the architecture identifier of the target chip architecture to the task layer agent 0212.
  • alternatively, the architecture identifier of the target chip architecture may be sent to the task management and scheduler 0111, and the task management and scheduler 0111 can then send the architecture identifier of the target chip architecture to the task layer agent 0212 in the target computing node 02.
  • step 106 can also be performed before step 105 .
  • step 106 may be performed synchronously with step 105, for example, the scheduling instruction sent by the scheduler may carry the architecture identifier of the target chip architecture.
  • Step 107 Based on the scheduling instruction, the target computing node obtains the intermediate representation of the target task, the runtime plug-in and the input data from the file manager of the heterogeneous cluster.
  • the target computing node 02 can obtain the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager 04 based on the identifier of the target task carried in the scheduling instruction.
  • the intermediate representation and the runtime plug-in can be stored by the file manager in the heterogeneous cluster, which can reduce the storage performance requirements on the scheduler. Also, since the scheduler does not need to forward intermediate representations, runtime plug-ins, and input data, any impact on its scheduling performance is avoided.
  • the acceleration library 031 in the host 03 may also directly send at least one of the intermediate representation of the target task, the runtime plug-in and the input data to the scheduler 011.
  • the scheduler 011 can send the above at least one kind of data to the target computing node 02 , that is, the target computing node 02 can receive the above at least one kind of data sent by the scheduler 011 .
  • the heterogeneous cluster may also not include the file manager 04.
  • the acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in and the input data to the scheduler 011.
  • the target computing node 02 can receive the intermediate representation of the target task, the runtime plug-in and the input data sent by the scheduler 011. Since there is no need to additionally set a file manager in the heterogeneous cluster, the structure of the heterogeneous cluster can be simplified, and the deployment cost of the heterogeneous cluster can be reduced.
  • Step 108: Based on the scheduling instruction, the target computing node compiles the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
  • after the target computing node 02 obtains the runtime plug-in, it can run the runtime plug-in.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code for the target chip architecture. That is, the runtime plugin can compile the intermediate representation online.
  • the runtime plug-in supports compiling the intermediate representation into executable code of various chip architectures; for example, the runtime plug-in can compile the intermediate representation into executable code of the NPU, GPU, X86, or ARM architecture.
  • for example, if the target computing node is computing node A and the target chip architecture is the X86 architecture, computing node A can run the runtime plug-in on a processor of the X86 architecture.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code of the X86 architecture.
  • if the target computing node is computing node B and the target chip architecture is the NPU architecture, computing node B can run the runtime plug-in in the NPU.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code of the NPU architecture.
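  • The following Python sketch illustrates the runtime plug-in's role in step 108: compiling the architecture-independent intermediate representation online for whichever architecture the scheduler selected. The per-architecture backends and the interface are hypothetical placeholders, not the embodiments' actual API.

```python
class RuntimePlugin:
    """Illustrative stand-in for the runtime plug-in."""

    def __init__(self, backends):
        # backends: mapping from architecture identifier to a compile function
        self.backends = backends

    def compile(self, ir, target_arch):
        """Compile the intermediate representation online for `target_arch`."""
        if target_arch not in self.backends:
            raise ValueError(f"unsupported architecture: {target_arch}")
        return self.backends[target_arch](ir)

# Hypothetical usage on computing node B (NPU selected by the scheduler):
plugin = RuntimePlugin({"NPU": lambda ir: b"npu code", "X86": lambda ir: b"x86 code"})
executable = plugin.compile(b"ir bytes", "NPU")
```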
  • the task layer agent 0212 in the target computing node 02 can first start a task service instance through the runtime plug-in manager, and the task service instance runs the runtime plug-in. The runtime plug-in in the running state can then compile the intermediate representation into executable code of the target chip architecture.
  • the runtime plug-in in the running state can obtain the intermediate representation of the target task from the file manager 04, and compile the intermediate representation to obtain executable code.
  • the intermediate representation can also be obtained by the task layer agent 0212 from the file manager 04 and sent to the runtime plug-in.
  • Step 109: The target computing node uses the input data as the input of the executable code, runs the executable code in the processor of the target chip architecture, and obtains the running result of the executable code.
  • the input data can be provided to the runtime plug-in.
  • the runtime plug-in can further use the input data as the input of the executable code, run the executable code in the processor of the target chip architecture, and obtain the running result of the executable code.
  • the task layer agent 0212 in the target computing node 02 may provide input data to the runtime plug-in.
  • for example, the input data may be input matrices A and B; after the runtime plug-in runs the map function in the for loop, the obtained running result is the result of the matrix multiplication operation.
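  • As a minimal, purely illustrative sketch of this example, the plain-Python function below stands in for the compiled executable code: it multiplies input matrices A and B using a map function inside a for loop.

```python
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    result = []
    for i in range(rows):                          # the for loop
        result.append(list(map(                    # the map function
            lambda j: sum(a[i][k] * b[k][j] for k in range(inner)),
            range(cols),
        )))
    return result

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```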
  • the runtime plug-in may further cache the executable code of the target task. Therefore, when another target task of the same type is to be executed subsequently, there is no need to compile the intermediate representation of that task online again, thereby avoiding the extra overhead introduced by online compilation.
  • if, after receiving the scheduling instruction for the target task, the target computing node detects that the executable code of the target task has been cached locally, and that the chip architecture of the executable code is the same as the target chip architecture sent by the scheduler, then the target computing node can directly use the runtime plug-in to take the input data of the target task as the input of the executable code and run the executable code in the processor of the target chip architecture.
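  • A sketch of the caching behavior described above is shown below; keying the cache by task type and target architecture is an assumption made for illustration.

```python
class ExecutableCache:
    """Caches compiled executable code so repeated tasks of the same
    type skip online compilation (and its extra overhead)."""

    def __init__(self):
        self._cache = {}

    def get_or_compile(self, task_type, target_arch, ir, plugin):
        key = (task_type, target_arch)
        if key not in self._cache:
            # Cache miss: compile the intermediate representation online.
            self._cache[key] = plugin.compile(ir, target_arch)
        # Cache hit: reuse the previously compiled executable code.
        return self._cache[key]
```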
  • the task service instance started by the task layer agent 0212 also has the function of communicating with the task layer agents 0212 in other computing nodes 02, thereby facilitating the acquisition of necessary data from other computing nodes 02 during task execution.
  • Step 110: The target computing node sends the running result to the scheduler.
  • the target computing node 02 may send the running result to the scheduler 011 .
  • Step 111: The scheduler sends the running result to the host.
  • the scheduler 011 can send the running result to the acceleration library 031 in the host through the gateway 05 so that the acceleration library 031 can further process the running result.
  • if the target task is one of N parallel tasks, the N computing nodes 02 used to execute the N parallel tasks can each send their running results to the scheduler 011.
  • the scheduler 011 can then send the N running results to the acceleration library 031 through the gateway 05 .
  • the acceleration library 031 can further perform reduction processing on the received N running results.
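  • The snippet below sketches that reduction: the acceleration library combines the N partial results returned by the N computing nodes. Summation is only one possible reduction operator, chosen here for illustration.

```python
from functools import reduce

partial_results = [10, 20, 30]  # hypothetical running results from N = 3 nodes
total = reduce(lambda x, y: x + y, partial_results)
print(total)  # 60
```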
  • the management node 01 in the heterogeneous cluster may further include a historical information collection module 012, and the historical information collection module 012 may be used to collect and store scheduling information and execution information of historical tasks.
  • FIG. 10 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of the present application.
  • the host 03 may include a CPU, and the CPU is used for running the acceleration library 031 to compile the source code of the target task to obtain a fat binary file.
  • the fat binary file includes host code and the intermediate representation, where the host code may be CPU host code.
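  • An illustrative sketch of such a fat binary is given below: a single container holding the CPU host code alongside the architecture-independent intermediate representation. The field names are assumptions, not the embodiments' actual file format.

```python
from dataclasses import dataclass

@dataclass
class FatBinary:
    host_code: bytes          # CPU host code, run on the host 03
    intermediate_repr: bytes  # IR, compiled online on the target computing node
```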
  • the host 03 may also run a target-independent device plugin framework and an adaptive cluster plugin.
  • the target agnostic device plug-in framework may be a target agnostic wrapper.
  • the target-independent device plug-in framework is used to interface with the adaptive cluster plug-in, and the adaptive cluster plug-in is used to interact with the distributed middleware.
  • the adaptive cluster plug-in can send data to the scheduler 011 in the heterogeneous cluster by invoking the middleware programming interface, so as to offload the target task to the heterogeneous cluster for execution.
  • the adaptive cluster plug-in may also be referred to as an offload plug-in.
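  • The sketch below illustrates the offload path: the adaptive cluster plug-in packages the task's artifacts and hands them to the distributed middleware for forwarding to the scheduler 011. The `submit` interface is a hypothetical middleware programming interface, not one defined by the embodiments.

```python
class AdaptiveClusterPlugin:
    """Illustrative offload plug-in."""

    def __init__(self, middleware):
        self.middleware = middleware

    def offload(self, task_id, ir, runtime_plugin, input_data):
        # Send the task's artifacts to the heterogeneous cluster for execution.
        return self.middleware.submit(
            task_id=task_id,
            intermediate_repr=ir,
            runtime_plugin=runtime_plugin,
            input_data=input_data,
        )
```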
  • the steps of the task scheduling method provided by the embodiments of the present application may be added or deleted according to circumstances.
  • the above-mentioned step 103 may be performed before the step 102 .
  • the step 102 and the step 103 may also be performed synchronously.
  • if the target computing node only includes processors of one chip architecture, the above step 106 may also be deleted according to the situation.
  • the above steps 1041, 1042b and 1043b may also be deleted according to the situation.
  • multiple parallel tasks (tasks) received by the scheduler may also be referred to as one job (job), and the method provided in this embodiment of the present application can implement not only task scheduling at the single-task level but also task scheduling at the job level.
  • the embodiment of the present application provides a task scheduling method, in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • since the scheduler does not need to determine a computing node whose chip architecture matches the architecture of the executable code marked in the task, the complexity of resource management and scheduling can be effectively reduced, thereby improving the efficiency of task scheduling.
  • since the host does not need to provide executable code for various architectures, the operation and maintenance cost and development cost on the host side can be effectively reduced.
  • Embodiments of the present application further provide a target computing node, which can be applied to the heterogeneous cluster provided by the foregoing embodiments, and can be used to implement the steps performed by the target computing node in the foregoing method embodiments.
  • the heterogeneous cluster includes a scheduler 011 and a plurality of computing nodes 02, the chip architectures of at least two computing nodes 02 among the plurality of computing nodes 02 are different, and the target computing node belongs to the plurality of computing nodes 02.
  • the target computing node may also include:
  • the receiving module 201 is configured to receive the scheduling instruction for the target task sent by the scheduler.
  • For the functional implementation of the receiving module 201, reference may be made to the relevant description of step 105 in the foregoing method embodiments.
  • the obtaining module 202 is configured to obtain the intermediate representation of the target task and the runtime plug-in of the target task.
  • For the functional implementation of the obtaining module 202, reference may be made to the relevant description of step 107 in the foregoing method embodiments.
  • a processing module 203, configured to compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the scheduling instruction, and run the executable code in the processor of the target chip architecture; the target computing node includes a processor of the target chip architecture.
  • For the functional implementation of the processing module 203, reference may be made to the relevant description of step 108 in the foregoing method embodiments.
  • the obtaining module 202 may be configured to: receive the intermediate representation and the runtime plug-in of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster.
  • the receiving module 201 may be further configured to: receive the architecture identifier of the target chip architecture sent by the scheduler.
  • For the functional implementation of the receiving module 201 in this case, reference may be made to the relevant description of step 106 in the foregoing method embodiments.
  • the processing module 203 may compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the architecture identifier of the target chip architecture.
  • the obtaining module 202 can also be used to obtain the input data of the target task.
  • the processing module 203 is configured to use the input data as the input of the executable code, run the executable code in the processor of the target chip architecture, and obtain a running result of the executable code.
  • For the functional implementation of the processing module 203 in this case, reference may also be made to the relevant description of step 109 in the foregoing method embodiments.
  • the target computing node further includes:
  • the sending module 204 is configured to send the running result to the scheduler after the processing module 203 obtains the running result of the executable code.
  • For the functional implementation of the sending module 204, reference may also be made to the relevant description of step 110 in the foregoing method embodiments.
  • the obtaining module 202 may be configured to: receive the input data of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the input data of the target task from the file manager of the heterogeneous cluster.
  • the embodiment of the present application provides a target computing node, and the target computing node can obtain the intermediate representation and the runtime plug-in of a target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • the embodiment of the present application provides a scheduler, and the scheduler can be applied to the heterogeneous cluster provided by the foregoing embodiment, for example, to the management node 01 in the heterogeneous cluster. Moreover, the scheduler may be used to implement the steps performed by the scheduler in the above method embodiments. Referring to FIG. 1, FIG. 2, FIG. 6, and FIG. 9, the heterogeneous cluster further includes multiple computing nodes 02, and at least two computing nodes 02 of the multiple computing nodes 02 have different chip architectures. As shown in FIG. 12, the scheduler may include:
  • the receiving module 301 is configured to receive scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the determining module 302 is configured to determine, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, where the idle resources of the processors of the target chip architecture in the target computing node meet the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures.
  • For the functional implementation of the determining module 302, reference may be made to the relevant description of step 104 in the foregoing method embodiments.
  • the sending module 303 is configured to send a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code in the processor of the target chip architecture.
  • the scheduling requirement information may further include: priorities of the at least two chip architectures. In this case, the determining module 302 may be used for: checking, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of the processors of the corresponding chip architecture in the plurality of computing nodes meets the resource requirements;
  • if it is detected that the amount of idle resources of the processors of the target chip architecture meets the resource requirements, a computing node including a processor of the target chip architecture is determined as the target computing node.
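  • A minimal sketch of this priority-ordered selection is shown below; the node and resource records are hypothetical, and real implementations would also account for scoring and tie-breaking.

```python
def pick_target(nodes, archs_by_priority, demand):
    """Walk the supported architectures from highest to lowest priority and
    return the first (node, architecture) pair with enough idle resources."""
    for arch in archs_by_priority:
        for node in nodes:
            if node["idle"].get(arch, 0) >= demand:
                return node["name"], arch
    return None  # no computing node can host the task

nodes = [
    {"name": "A", "idle": {"X86": 8}},
    {"name": "B", "idle": {"NPU": 4}},
]
print(pick_target(nodes, ["NPU", "X86"], demand=4))  # ('B', 'NPU')
```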
  • the sending module 303 may be further configured to send the architecture identifier of the target chip architecture to the target computing node.
  • For the functional implementation of the sending module 303 in this case, reference may also be made to the relevant description of step 106 in the foregoing method embodiments.
  • the receiving module 301 may also be configured to receive the intermediate representation of the target task and the runtime plug-in of the target task.
  • the sending module 303 may also be configured to send the intermediate representation of the target task and the runtime plug-in to the target computing node.
  • optionally, the target task is a parallel task among multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks. In this case, the determining module 302 may be used for:
  • if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks;
  • if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirements of the target task.
  • the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, and the ideal parallel mode means that the multiple parallel tasks do not need to be executed synchronously.
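  • The following sketch contrasts the two modes' admission checks; it assumes, for illustration only, that each parallel task has the same resource demand.

```python
def can_schedule(mode, nodes, arch, per_task_demand, n_tasks):
    idle_total = sum(node["idle"].get(arch, 0) for node in nodes)
    if mode == "synchronous":
        # All N tasks must be placeable at once.
        return idle_total >= per_task_demand * n_tasks
    if mode == "ideal":
        # Tasks need not run simultaneously; room for one task suffices.
        return idle_total >= per_task_demand
    raise ValueError(f"unknown parallel scheduling mode: {mode}")

nodes = [{"idle": {"NPU": 4}}, {"idle": {"NPU": 2}}]
print(can_schedule("synchronous", nodes, "NPU", per_task_demand=2, n_tasks=3))  # True
```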
  • For the functional implementation of the determining module 302 in this case, reference may also be made to the relevant descriptions of step 1041, step 1042b, and step 1043b in the above method embodiments.
  • the embodiment of the present application provides a scheduler. Since the target computing node can obtain the intermediate representation and the runtime plug-in of the target task, and the intermediate representation is code independent of the chip architecture of the processor, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • An embodiment of the present application further provides a host, which can be applied to the task scheduling system provided in the foregoing embodiment, and can be used to implement the steps performed by the host in the foregoing method embodiment.
  • the host may include:
  • the compiling module 401 is used for compiling the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task.
  • For the functional implementation of the compiling module 401, reference may be made to the relevant description of step 101 in the foregoing method embodiments.
  • the first sending module 402 is configured to send the intermediate representation and the runtime plug-in.
  • For the functional implementation of the first sending module 402, reference may be made to the relevant description of step 102 in the foregoing method embodiments.
  • the second sending module 403 is configured to send scheduling requirement information of the target task to the scheduler in the heterogeneous cluster, where the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the heterogeneous cluster further includes multiple computing nodes, at least two computing nodes in the multiple computing nodes have different chip architectures, and the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes.
  • For the functional implementation of the second sending module 403, reference may be made to the relevant description of step 103 in the foregoing method embodiments.
  • the first sending module 402 can be used to:
  • the intermediate representation and the runtime plugin are sent to the scheduler; alternatively, the intermediate representation and the runtime plugin are sent to a file manager in the heterogeneous cluster.
  • the embodiments of the present application provide a host, which can provide the intermediate representation and the runtime plug-in of a target task to a target computing node. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • the above-mentioned target computing nodes, schedulers, and hosts provided in the embodiments of the present application may all be implemented by application-specific integrated circuits (ASICs) or programmable logic devices (PLDs).
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the task scheduling method provided by the above method embodiments may also be implemented by software.
  • the target computing node, the scheduler, and the host may each include software modules for implementing the above method.
  • An embodiment of the present application further provides a computer device, and the computer device can be applied to the task scheduling system provided by the above embodiment.
  • the computer device may be the target computing node, scheduler or host provided in the above embodiment.
  • the computer device may include: a processor 501 , a memory 502 , a network interface 503 and a bus 504 .
  • the bus 504 is used for connecting the processor 501 , the memory 502 and the network interface 503 .
  • the communication connection with other devices can be realized through the network interface 503 (which may be wired or wireless).
  • the memory 502 stores a computer program 5021 for realizing various application functions.
  • the processor 501 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a GPU or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • Memory 502 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • bus 504 may also include a power bus, a control bus, a status signal bus, and the like. However, for the sake of clarity, the various buses are labeled as bus 504 in the figure.
  • the processor 501 is configured to execute the computer program stored in the memory 502, and the processor 501 implements the task scheduling method shown in the above method embodiments by executing the computer program 5021.
  • If the computer device is a target computing node, the processor 501 may implement the steps performed by the target computing node in the above method embodiments by executing the computer program 5021. If the computer device is a scheduler, the processor 501 may implement the steps performed by the scheduler in the above method embodiments by executing the computer program 5021. If the computer device is a host, the processor 501 may implement the steps performed by the host in the above method embodiments by executing the computer program 5021.
  • Embodiments of the present application further provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to a target computing node in the foregoing method embodiments, or The task scheduling method applied to the scheduler in the above method embodiment is implemented, or the task scheduling method applied to the host in the above method embodiment is implemented.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, it enables the computer to implement the task scheduling method applied to the target computing node in the above method embodiments, or the task scheduling method applied to the scheduler in the above method embodiments, or the task scheduling method applied to the host in the above method embodiments.
  • the embodiment of the present application also provides a task scheduling system; as shown in FIG. 1, FIG. 2, and FIG. 10, the system may include: a host 03, a scheduler 011, and multiple computing nodes 02, where the chip architectures of at least two computing nodes 02 among the multiple computing nodes 02 are different.
  • At least one computing node among the plurality of computing nodes 02 is the target computing node provided in the foregoing embodiment, for example, may be the target computing node shown in FIG. 11 or FIG. 14 .
  • the scheduler 011 is the scheduler provided in the foregoing embodiment, and may be, for example, the scheduler shown in FIG. 12 or FIG. 14 .
  • the host 03 is the host provided in the above embodiment, for example, the host shown in FIG. 13 or FIG. 14 .
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) manner.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present application provides a task scheduling method, apparatus and system, relating to the technical field of computers. In the solution provided by the present application, a target computing node can obtain the intermediate representation and runtime plugin of a target task. Because the intermediate representation is a code independent of the chip architecture of a processor, the target computing node can compile the intermediate representation into an executable code of a target chip architecture by means of the runtime plugin, and run the executable code in the processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in a heterogeneous cluster will not be limited by the architecture of the compiled executable code in the target task, but can flexibly determine, on the basis of the resource usage of each computing node in the heterogeneous cluster, the computing node for executing the target task. Thus, it can be ensured that the loads of all computing nodes are relatively balanced, thereby effectively increasing the resource utilization rate of the heterogeneous cluster.

Description

Task scheduling method, device and system

Technical Field

The present application relates to the field of computer technology, and in particular, to a task scheduling method, device and system.

Background

With the rapid development of chip technology, the types of chip architectures (also called processor architectures) are becoming more and more abundant. For example, common processors of different chip architectures include: the central processing unit (CPU) that supports general-purpose computing, the graphics processing unit (GPU) that supports image rendering and high-performance computing, and the neural-network processing unit (NPU) that supports neural network computing. Among them, the chip architecture of the CPU can be further divided into the X86 architecture and the advanced RISC machine (ARM) architecture.
A heterogeneous cluster refers to a cluster composed of computing nodes with different chip architectures; for example, the processors of some computing nodes in a heterogeneous cluster are CPUs, while the processors of other computing nodes are GPUs or NPUs. Since a processor in a computing node can only run executable code of the same type as its chip architecture, when scheduling a task, the scheduler in the heterogeneous cluster needs to schedule the task, based on the architecture of the task's executable code, to a computing node whose processor chip architecture matches the architecture of that executable code.

However, since the architectures of the executable code used by the large number of tasks received by the heterogeneous cluster may be unbalanced, the above task scheduling method may lead to an unbalanced load across the computing nodes in the heterogeneous cluster, resulting in low resource utilization of the heterogeneous cluster.
Summary of the Invention

The present application provides a resource scheduling method, device and system, which can solve the technical problem of low resource utilization of heterogeneous clusters. The technical solutions are as follows:
In one aspect, a task scheduling method is provided, applied to a target computing node in a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The method includes: receiving a scheduling instruction for a target task sent by the scheduler, obtaining an intermediate representation of the target task and a runtime plug-in of the target task, compiling, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, and running the executable code in a processor of the target chip architecture through the runtime plug-in; wherein the intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task, and the target computing node includes a processor of the target chip architecture.

Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture. Correspondingly, when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, it can be ensured that the load of each computing node is relatively balanced, effectively improving the resource utilization of the heterogeneous cluster.
Optionally, the process in which the target computing node obtains the intermediate representation and the runtime plug-in of the target task may include: obtaining, based on the scheduling instruction, the intermediate representation and the runtime plug-in of the target task from a file manager of the heterogeneous cluster; or, receiving the intermediate representation and the runtime plug-in of the target task sent by the scheduler.

Since the data volume of the intermediate representation and the runtime plug-in of the target task is relatively large, the intermediate representation and the runtime plug-in can be stored by the file manager in the heterogeneous cluster, thereby reducing the storage performance requirements on the scheduler. Moreover, since the scheduler does not need to forward the intermediate representation and the runtime plug-in, any impact on its scheduling performance is avoided.

Alternatively, the intermediate representation and the runtime plug-in can be forwarded directly by the scheduler, so that no additional file manager needs to be deployed in the heterogeneous cluster, which simplifies the structure of the heterogeneous cluster and reduces its deployment cost.
Optionally, the method may further include: receiving the architecture identifier of the target chip architecture sent by the scheduler. Correspondingly, the process of compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in may include: compiling, based on the architecture identifier of the target chip architecture sent by the scheduler, the intermediate representation into executable code of the target chip architecture through the runtime plug-in.

Since the target computing node may include processors of multiple different chip architectures, the scheduler may also send the architecture identifier of the target chip architecture to the target computing node, so that the target computing node can determine the architecture of the executable code into which the intermediate representation needs to be compiled.
Optionally, the method may further include: obtaining input data of the target task. The process of running the executable code in the processor of the target chip architecture through the runtime plug-in may include: using, through the runtime plug-in, the input data as the input of the executable code, running the executable code in the processor of the target chip architecture, and obtaining the running result of the executable code. The method may further include: sending the running result to the scheduler.

The scheduler can then send the running result to the host that provided the target task, so that the host can perform subsequent processing on the running result. For example, the host can perform reduction processing on the running results provided by multiple computing nodes.
Optionally, the process in which the target computing node obtains the input data of the target task may include: obtaining, based on the scheduling instruction, the input data of the target task from the file manager of the heterogeneous cluster; or, receiving the input data of the target task sent by the scheduler.

Since the data volume of the input data is relatively large, the input data can be stored by the file manager, thereby reducing the storage performance requirements on the scheduler. Alternatively, the input data can be forwarded directly by the scheduler, so that no additional file manager needs to be deployed in the heterogeneous cluster, which simplifies the structure of the heterogeneous cluster and reduces its deployment cost.
In another aspect, a task scheduling method is provided, applied to a scheduler in a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes, and at least two of the multiple computing nodes have different chip architectures. The method includes: receiving scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task; determining, based on the scheduling requirement information, a target computing node from the multiple computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures; and sending a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code in the processor of the target chip architecture, where the intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task.
Optionally, the scheduling requirement information may further include: priorities of the at least two chip architectures. Correspondingly, the process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: checking, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of the processors of the corresponding chip architecture in the multiple computing nodes meets the resource requirements; and if it is detected that the amount of idle resources of the processors of the target chip architecture meets the resource requirements, determining a computing node including a processor of the target chip architecture as the target computing node.

Since processors of different chip architectures are good at processing different types of tasks, the priorities of the at least two chip architectures can be defined in the scheduling requirement information, where a chip architecture with a higher priority is more suitable for processing the target task. Thus, the scheduler determines the target chip architecture in descending order of priority, which can effectively ensure the execution efficiency of the target task.
Optionally, the method may further include: sending the architecture identifier of the target chip architecture to the target computing node.

Optionally, the method may further include: receiving the intermediate representation of the target task and the runtime plug-in of the target task, and sending the intermediate representation and the runtime plug-in to the target computing node.
Optionally, the target task is one of multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks. The process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirements of the target task. The synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, and the ideal parallel mode means that the multiple parallel tasks do not need to be executed synchronously.

In the solution provided by the present application, the scheduler can determine the target computing node in different ways based on the parallel scheduling mode of the multiple parallel tasks, so as to ensure that the multiple parallel tasks can be reliably executed according to the required scheduling mode.
In yet another aspect, a task scheduling method is provided, which can be applied to a host. The method includes: compiling the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, where the intermediate representation is chip-architecture-independent code; sending the intermediate representation and the runtime plug-in; and sending scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, where the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task. The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures; the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures; and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.

Optionally, the process of sending the intermediate representation and the runtime plug-in may include: sending the intermediate representation and the runtime plug-in to the scheduler; or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
In still another aspect, a target computing node is provided, applied to a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The target computing node includes a processor of the target chip architecture, and further includes at least one module, where the at least one module is used to implement the task scheduling method applied to the target computing node provided in the above aspect.

In still another aspect, a scheduler is provided, applied to a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes, and at least two of the multiple computing nodes have different chip architectures. The scheduler includes at least one module, where the at least one module is used to implement the task scheduling method applied to the scheduler provided in the above aspect.

In still another aspect, a host is provided, where the host includes at least one module, and the at least one module is used to implement the task scheduling method applied to the host provided in the above aspect.
In still another aspect, a computer device is provided, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where, when executing the computer program, the processor implements the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.

In still another aspect, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.

In still another aspect, a computer program product is provided, which, when run on a computer, causes the computer to execute the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.
In still another aspect, a task scheduling system is provided, including: the host provided in the above aspect, the scheduler provided in the above aspect, and multiple computing nodes, where at least one of the multiple computing nodes is the target computing node provided in the above aspect.

In still another aspect, a task scheduling system is provided, including: a host, a scheduler, and multiple computing nodes, where at least two of the multiple computing nodes have different chip architectures.

The host is configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, to send the intermediate representation and the runtime plug-in, and to send scheduling requirement information of the target task to the scheduler, where the intermediate representation is chip-architecture-independent code, and the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task.

The scheduler is configured to determine, based on the scheduling requirement information, a target computing node from the multiple computing nodes, and to send a scheduling instruction for the target task to the target computing node, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures.

The target computing node is configured to compile, based on the scheduling instruction, the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and to run the executable code in the processor of the target chip architecture through the runtime plug-in.
The solution provided by the present application has at least the following beneficial effects:

The present application provides a task scheduling method, device and system, in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture. Correspondingly, when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of this application;
FIG. 3 is a flowchart of a task scheduling method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of this application;
FIG. 8 is a schematic diagram of a task scheduling process provided by an embodiment of this application;
FIG. 9 is a flowchart of a method for determining a target computing node provided by an embodiment of this application;
FIG. 10 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a target computing node provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a scheduler provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of a host provided by an embodiment of this application;
FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed Description of Embodiments
The task scheduling method, apparatus, and system provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of this application. As shown in FIG. 1, the application scenario includes a heterogeneous cluster, and the heterogeneous cluster includes a management node 01 and a plurality of computing nodes 02 connected to the management node 01. At least two of the computing nodes 02 use processors with different chip architectures. For example, among the plurality of computing nodes 02, some computing nodes 02 use CPUs, some use GPUs, and the remaining computing nodes 02 use NPUs.
Referring to FIG. 1, a scheduler 011 is deployed in the management node 01, and the application scenario may further include a host 03. The host 03 can send a target task to be scheduled to the scheduler 011, and the scheduler 011 can then schedule the target task to at least one computing node 02 for execution.
FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of this application. As shown in FIG. 2, an acceleration library 031 is deployed in the host 03; the acceleration library 031 is a collection of software used to optimize processor performance. The acceleration library 031 can be used to send the target task to be scheduled to the scheduler 011.
It can be understood that any computing node 02 in the heterogeneous cluster can also act as a host and send tasks to be scheduled to the scheduler 011. Accordingly, the application scenario may also omit the host 03 that is independent of the heterogeneous cluster.
It can also be understood that the management node 01 may additionally have the function of a computing node; that is, the management node 01 can not only schedule tasks but also execute tasks.
In the related art, the scheduler in a heterogeneous cluster needs to record the chip architecture type of each computing node in advance. When submitting a task to the scheduler, the host also needs to mark, in the submitted task, the architecture of the executable code used by the task. After receiving the task, the scheduler schedules the task, according to the architecture marked in it, to a computing node with a matching chip architecture for execution. However, if the architectures of the executable code used by the many tasks submitted by the host are unevenly distributed, this task scheduling approach leads to an unbalanced load across the computing nodes of the heterogeneous cluster.
For example, if the executable code in a task submitted by the host uses the X86 architecture, the task can only be scheduled to run on a computing node whose processor architecture is X86. Suppose the computing nodes with X86 processors in the heterogeneous cluster have no idle resources available, while the computing nodes with ARM processors do. In this scenario, the idle resources of the heterogeneous cluster cannot be used to process the task, resulting in low resource utilization.
Alternatively, a task submitted by the host may include executable code for multiple different architectures. After receiving the task, the scheduler can determine, according to the load of each computing node 02, a target computing node for executing the task, and schedule the task to that node. Because the task includes executable code for multiple architectures, the target computing node can execute the executable code whose architecture matches the chip architecture of its processor. However, this scheduling approach requires the host to implement executable code for multiple different architectures, resulting in high cost.
An embodiment of this application provides distributed middleware for implementing adaptive task scheduling in a heterogeneous cluster, where adaptive task scheduling refers to scheduling tasks adaptively based on the resource usage of the heterogeneous cluster. Accordingly, the heterogeneous cluster may also be called an adaptive cluster, and the distributed middleware may also be called adaptive middleware. With reference to FIG. 1 and FIG. 2, the distributed middleware may include: a middleware programming interface 032 deployed in the host 03, the scheduler 011 deployed in the management node 01, and a cluster agent 021 deployed in each computing node 02.
The middleware programming interface 032 provides the acceleration library 031 with the ability to access the heterogeneous cluster; that is, the acceleration library 031 can exchange data with components in the heterogeneous cluster by calling the middleware programming interface 032. For example, the acceleration library 031 can send the scheduling requirement information of the target task, the intermediate representation of the target task, and the runtime plug-in of the target task to the scheduler 011 through the middleware programming interface 032.
The intermediate representation, which may also be called intermediate language or intermediate code, is an equivalent internal representation of source code. Moreover, the intermediate representation is independent of the processor's chip architecture; that is, it can be compiled into executable code (also called object code) for different architectures.
A runtime is the execution environment of a programming language, a virtual environment that provides software services to running programs. The runtime plug-in is a component that provides the execution environment of the intermediate representation. Because the runtime plug-in provided by this embodiment supports running applications in a heterogeneous device environment, it may also be called a heterogeneous runtime plug-in. The runtime plug-in provides a plug-in interface to be called by the cluster agent 021 of the distributed middleware, so that the cluster agent 021 can initialize the runtime plug-in, deinitialize it, run it, and clean it up on exit.
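For illustration only, a minimal sketch of such a plug-in interface is given below in C++. The class and method names mirror the lifecycle just described (initialize, run, deinitialize/exit cleanup) plus the online compilation of the intermediate representation; they are assumptions introduced here and are not prescribed by this embodiment.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical interface of a heterogeneous runtime plug-in.
    class RuntimePlugin {
    public:
        virtual ~RuntimePlugin() = default;
        // Called by the cluster agent when the task service instance starts.
        virtual void Initialize() = 0;
        // Compile the architecture-independent intermediate representation
        // into executable code for the given chip architecture.
        virtual std::vector<std::uint8_t> Compile(
            const std::vector<std::uint8_t>& ir,
            const std::string& targetArch) = 0;
        // Run compiled code on a processor of the target architecture,
        // feeding it the task's input data and returning the running result.
        virtual std::vector<std::uint8_t> Run(
            const std::vector<std::uint8_t>& executable,
            const std::vector<std::uint8_t>& input) = 0;
        // Called for exit cleanup before the task service instance terminates.
        virtual void Deinitialize() = 0;
    };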
The scheduler 011 schedules tasks based on the usage of the heterogeneous resources in the heterogeneous cluster. As shown in FIG. 2, the scheduler 011 mainly includes a task management and scheduler 0111 and a resource management and scheduler 0112. The resource management and scheduler 0112 manages and schedules the resources of each computing node 02 in the heterogeneous cluster; these resources include at least processor resources and may also include memory resources and the like. The task management and scheduler 0111 sends a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task sent by the acceleration library 031, and the resource management and scheduler 0112 can then allocate resources to the target task based on that request. Assuming that, based on the resource usage of each computing node 02, the resources allocated to the target task by the resource management and scheduler 0112 include processor resources of a target chip architecture in a target computing node 02, the task management and scheduler 0111 can dispatch the target task to that target computing node 02 based on the allocated resources.
The cluster agent 021 is mainly used to start task service instances and to manage runtime plug-ins. As shown in FIG. 2, the cluster agent 021 includes a resource layer agent 0211 and a task layer agent 0212. The resource layer agent 0211 collects the resource information of the computing node 02 and reports it to the resource management and scheduler 0112, so that the computing node 02 joins the heterogeneous cluster. The task layer agent 0212 starts a task service instance based on the resources provided by the resource layer agent 0211; the runtime plug-in of the target task runs in this task service instance, or, put differently, the task service instance includes a runtime plug-in instance. The task layer agent 0212 also sends the intermediate representation of the target task, together with the target chip architecture determined by the resource management and scheduler 0112, to the runtime plug-in. The runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture and run that executable code on a processor of the target chip architecture, thereby running the target task.
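For illustration, the interaction between the task layer agent and the runtime plug-in might follow the flow sketched below, reusing the hypothetical RuntimePlugin interface above; the helper name and parameters are likewise assumptions.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical agent-side handling of a dispatched task: the task service
    // instance initializes the plug-in, has it compile the intermediate
    // representation for the architecture chosen by the scheduler, runs the
    // result on the input data, and cleans up on exit.
    std::vector<std::uint8_t> HandleTask(RuntimePlugin& plugin,
                                         const std::string& targetArch,
                                         const std::vector<std::uint8_t>& ir,
                                         const std::vector<std::uint8_t>& input) {
        plugin.Initialize();
        const std::vector<std::uint8_t> exec = plugin.Compile(ir, targetArch);
        std::vector<std::uint8_t> result = plugin.Run(exec, input);
        plugin.Deinitialize();
        return result;
    }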
Because the intermediate representation of the target task provided by the acceleration library 031 is code independent of any chip architecture, the scheduler 011 in the management node 01 can schedule the target task based on the resource usage of each computing node 02 without considering the architecture of the target task's executable code. As a result, the resource utilization of the heterogeneous cluster can be effectively improved without requiring the host 03 to provide executable code for multiple different architectures. Furthermore, because the scheduler 011 does not need to find a computing node whose chip architecture matches an architecture marked in a task, the complexity of resource management and scheduling is effectively reduced, which in turn improves the efficiency of task scheduling.
It can be understood that, in the embodiments of this application, acceleration libraries in different fields can all be combined with the above distributed middleware to achieve adaptive scheduling of target tasks in heterogeneous clusters.
An embodiment of this application provides a task scheduling method, which can be applied to the application scenarios provided by the foregoing embodiments. Referring to FIG. 3, the task scheduling method provided by this embodiment includes the following steps.
Step 101: The host compiles the source code of the target task to obtain the intermediate representation and the runtime plug-in of the target task.
In this embodiment of the application, as shown in FIG. 1 and FIG. 2, the acceleration library 031 in the host 03 can compile the source code of the target task to obtain the intermediate representation and the runtime plug-in of the target task.
FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of this application. As shown in FIG. 4, the acceleration library 031 may be an acceleration library in any of various fields. For example, it may be a parallel programming acceleration library, such as an open multi-processing (OpenMP) acceleration library. Alternatively, it may be another type of acceleration library, such as a numerical computation acceleration library, a graph computation acceleration library, a data frame acceleration library, or a machine learning acceleration library. Optionally, the numerical computation acceleration library may include a numerical Python (NumPy) acceleration library, the data frame acceleration library may be a pandas acceleration library, and the machine learning acceleration library may include a Scikit-learn acceleration library, where pandas is a data analysis package for Python.
Referring to FIG. 4, the target task may be a task of an application in any of various fields, such as a computer vision (CV) application, a natural language processing (NLP) application, or a machine learning prediction application. The acceleration library 031 can compile the source code of the target task into an intermediate representation independent of any chip architecture through a compiler, and obtain a runtime plug-in associated with that compiler. The intermediate representation may be a standard portable intermediate representation (SPIR-V), a WebAssembly (WASM) intermediate representation, or the like. The runtime plug-in may be a tensor virtual machine (TVM) runtime plug-in, a SPIR-V runtime plug-in, a WASM runtime plug-in, or the like.
Continuing to refer to FIG. 4, the programming framework of the compiler used by the acceleration library 031 may be any of the following: Python, Java, Go, a numerical domain-specific language (DSL), a table-structure DSL, a distributed-parallel DSL, a C++ heterogeneous programming framework, and so on, where Python, Java, and Go are names of computer programming languages.
Optionally, as shown in FIG. 5, the source code of the target task may be a code fragment in a program; this fragment may also be called a code fragment to be accelerated, or an accelerated kernel code fragment. A developer can annotate the code fragment in advance with a device directive, and the acceleration library 031 can compile the annotated code fragment to obtain the intermediate representation and the runtime plug-in of the target task.
For example, assuming that the acceleration library is an OpenMP acceleration library, the program run in the OpenMP acceleration library to implement a matrix multiplication (matmul) operation is as follows:
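In the original publication this listing appears only as an image. Based on the description that follows (float matrices A and B multiplied into an output matrix C, with the for loop parallelized by the "pragma omp parallel for" directive), a plausible reconstruction is sketched here; the function signature and the matrix dimensions M, N, and K are assumptions introduced for illustration.

    // Reconstructed sketch of the matmul program; A is an M x K matrix,
    // B is a K x N matrix, and C = A * B is an M x N matrix (the dimensions
    // and memory layout are assumptions, not taken from the original listing).
    void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
        // The following for loop is executed by multiple threads.
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                C[i * N + j] = sum;
            }
        }
    }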
In the above program, float denotes a floating-point data type, int denotes an integer type, A and B denote the two input matrices, and C denotes the output matrix; that is, matrix C equals the product of matrix A and matrix B. "pragma omp parallel for" is an OpenMP directive indicating that the for loop that follows will be executed by multiple threads.
In this embodiment of the application, if the two input matrices A and B carry a large amount of data, then, to improve the computational efficiency of the matrix multiplication, the for loop in the above program can be executed in parallel by the computing nodes of the heterogeneous cluster. That is, the code fragment of the for loop can be offloaded to the heterogeneous cluster for execution; accordingly, the source code of the target task is the for loop in the program.
For example, a developer can add a device directive to the above matrix multiplication program to mark the for loop. The program with the device directive added is:
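As with the previous listing, the annotated program appears only as an image in the original publication; a plausible reconstruction, differing from the sketch above only in the added device directive, is given here. ADAPTIVE_CLUSTER is assumed to be a device identifier made available to the program by the distributed middleware, and data-mapping clauses are omitted for brevity.

    // Reconstructed sketch of the annotated matmul program.
    void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
        // Offload the following code fragment from the host to the target
        // device, here the heterogeneous (adaptive) cluster.
        #pragma omp target device(ADAPTIVE_CLUSTER)
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                C[i * N + j] = sum;
            }
        }
    }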
In the above program, "pragma omp target device(ADAPTIVE_CLUSTER)" is the added device directive, which indicates that the subsequent code fragment is to be offloaded from the host to the target device for execution. In this embodiment of the application, the target device is the heterogeneous cluster, and "ADAPTIVE_CLUSTER" is the name of the target device defined in this embodiment. While compiling the above program, the OpenMP acceleration library can compile the code fragment annotated by the device directive (the for loop) into an intermediate representation. Later, while running the executable code of the program, when the OpenMP acceleration library detects that the intermediate representation of the annotated code fragment is about to be executed, it can offload that intermediate representation to the heterogeneous cluster for execution.
Optionally, as shown in FIG. 6, the acceleration library 031 can compile the source code of the target task through a low level virtual machine (LLVM) to obtain a fat binary file. The fat binary file contains host code (for example, the main function) and the intermediate representation that is independent of any chip architecture. The file format of the fat binary file may be the executable and linkable format (ELF).
FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of this application. As shown in FIG. 7, the acceleration library 031 can use a DSL compiler to compile the source code of the target task. The compilation process may include steps such as algorithm abstraction, computation graph optimization, data graph optimization, communication graph optimization, and abstract syntax tree generation. Each of these steps can be scheduled automatically by the acceleration library 031 or scheduled in a user-defined manner.
Step 102: The host sends the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to a file manager in the heterogeneous cluster.
As shown in FIG. 5 and FIG. 8, the heterogeneous cluster may further include a file manager 04. The acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager 04 by calling the middleware programming interface 032. For example, if the source code of the target task is the for loop of the matrix multiplication, the input data may include the input matrices A and B.
Optionally, referring to FIG. 5, the heterogeneous cluster may further include a gateway 05, which is connected to the scheduler 011 and to the file manager 04. As shown in step S1 in FIG. 5, the acceleration library 031 can send the intermediate representation, the runtime plug-in, and the input data of the target task to the gateway 05 by calling a software development kit (SDK) interface provided by the gateway 05, and the gateway 05 can then forward the received data to the file manager 04. The main component of the SDK interface is the middleware programming interface 032.
It can be understood that the file manager 04 may include one or more storage devices with a file storage function. Each computing node 02 in the heterogeneous cluster establishes a communication connection with the file manager 04 and can obtain data from it.
Step 103: The host sends the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster.
The acceleration library 031 in the host 03 can send the scheduling requirement information of the target task to the scheduler 011 in the management node 01 by calling the middleware programming interface 032. For example, referring to FIG. 8, the acceleration library 031 can send the scheduling requirement information to the task management and scheduler 0111 in the scheduler 011. The scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task, and it can be configured by the acceleration library 031.
For example, suppose the amount of processor resources required to execute the target task is x, and the chip architectures supported by the target task are X86, ARM, and GPU. Then, as shown in step S2 in FIG. 5, the acceleration library 031 can, by calling the middleware programming interface 032, send to the gateway 05 the resource requirement of the target task (the processor resource amount x) and the three chip architectures (X86, ARM, and GPU). The gateway 05 can then forward the received scheduling requirement information to the scheduler 011.
Step 104: The scheduler determines a target computing node from the plurality of computing nodes based on the scheduling requirement information.
In this embodiment of the application, after receiving the scheduling requirement information of the target task from the acceleration library 031, the scheduler 011 can determine, from the plurality of computing nodes 02 and based on the resource usage of each computing node 02 in the heterogeneous cluster, a target computing node that satisfies the execution conditions of the target task. In the target computing node, the amount of idle resources of the processor of the target chip architecture satisfies the resource requirement of the target task, and the target chip architecture is one of the at least two supported chip architectures.
Optionally, the scheduling requirement information sent by the acceleration library 031 may further include the priorities of the at least two chip architectures. Accordingly, the scheduler 011 can check, in descending order of priority, the amount of idle resources of the processors of each chip architecture in the heterogeneous cluster, and determine the target computing node from the plurality of computing nodes 02.
Processors of different chip architectures excel at different types of tasks: for example, CPUs excel at scalar operations, GPUs at vector operations, and NPUs at matrix operations. Therefore, in the solution provided by this application, the priorities of the at least two chip architectures can be specified in the scheduling requirement information, where a chip architecture with a higher priority is better suited to processing the target task. By determining the target chip architecture in descending order of priority, the scheduler can improve the resource utilization of the heterogeneous cluster while preserving the execution efficiency of the target task as far as possible.
Optionally, to improve task execution efficiency, the acceleration library 031 in the host 03 can split a task to be executed into multiple parallel tasks, so that the computing nodes 02 of the heterogeneous cluster can execute them in parallel. In that case, the target task is one of the multiple parallel tasks, and the scheduling requirement information may further include the parallel scheduling mode of the multiple parallel tasks. The parallel scheduling mode may be a synchronous parallel mode or an ideal parallel mode.
The synchronous parallel mode means that the multiple parallel tasks must be executed synchronously; when scheduling them, it must therefore be ensured that they are all scheduled onto processors of the same chip architecture. The ideal parallel mode means that the multiple parallel tasks do not require synchronous execution: they may run at the same time, or some of them may be executed first and the remaining ones afterwards. When scheduling them, the multiple parallel tasks can therefore be dispatched to processors of different chip architectures. The ideal parallel mode may also be called embarrassingly parallel.
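For illustration only, the scheduling requirement information described above might be carried in a structure along the following lines; all type and field names are assumptions introduced here and are not prescribed by this embodiment.

    #include <string>
    #include <vector>

    // Hypothetical layout of the scheduling requirement information.
    enum class ParallelMode {
        kSynchronous,      // all parallel tasks must run on the same chip architecture
        kEmbarrassingly    // parallel tasks may be spread across architectures
    };

    struct SchedulingRequirement {
        double resourceDemand;                    // processor resources needed per task
        std::vector<std::string> supportedArchs;  // e.g. {"X86", "ARM", "GPU"}, ordered
                                                  // from highest to lowest priority
        ParallelMode mode;                        // parallel scheduling mode of the group
        int numParallelTasks;                     // number of tasks in the parallel group
    };

Keeping the supported architectures in descending priority order matches the way the scheduler walks through them in the steps below.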
The implementation of step 104 is described below, taking as an example scheduling requirement information that includes the priorities of the at least two chip architectures and the parallel scheduling mode of the multiple parallel tasks. As shown in FIG. 9, step 104 may include the following steps.
Step 1041: Determine the parallel scheduling mode of the multiple parallel tasks.
The scheduler 011 can determine the parallel scheduling mode of the multiple parallel tasks based on the received scheduling requirement information. If the parallel scheduling mode is the ideal parallel mode, the scheduler 011 performs steps 1042a and 1043a below; if it is the synchronous parallel mode, the scheduler 011 performs steps 1042b and 1043b below.
Step 1042a: Check, in descending order of priority, whether the amount of idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the resource requirement of the target task.
If the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, the scheduler 011 can determine the target computing node directly based on the resource requirement of the target task. That is, when scheduling the target task, the scheduler 011 only needs to ensure that the amount of idle resources of the processors of some chip architecture in the heterogeneous cluster satisfies the resource requirement of that single task; it does not need to ensure that the total idle resources of that architecture satisfy the sum of the resource requirements of all the parallel tasks.
For example, suppose the priorities of the three chip architectures supported by the target task satisfy X86 > ARM > GPU. The scheduler 011 can then check, in the order X86, ARM, GPU, whether the amount of idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the resource requirement of the target task.
For example, the scheduler 011 can first check whether the amount of idle resources of the X86 processors in the heterogeneous cluster satisfies the resource requirement of the target task. If it does, the scheduler 011 performs step 1043a below. If it does not, the scheduler 011 continues by checking whether the amount of idle resources of the ARM processors satisfies the resource requirement. If it does, the scheduler 011 performs step 1043a below; otherwise, the scheduler 011 continues by checking whether the amount of idle resources of the GPU processors in the heterogeneous cluster satisfies the resource requirement.
Step 1043a: If it is detected that the amount of idle resources of the processors of a target chip architecture satisfies the resource requirement, determine a computing node containing a processor of that target chip architecture as the target computing node.
Proceeding in descending order of priority, once the scheduler 011 detects that the amount of idle resources of the processors of a target chip architecture satisfies the resource requirement, it can determine a computing node containing a processor of that architecture as the target computing node. For example, if the scheduler 011 detects that the amount of idle resources of the X86 processors in the heterogeneous cluster satisfies the resource requirement of the target task, it can determine a computing node containing an X86 processor as the target computing node, where the amount of idle resources of the X86 processor in that node satisfies the resource requirement.
Step 1042b: Check, in descending order of priority, whether the sum of the idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks.
If the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, the scheduler 011 can determine that the tasks must be executed synchronously. When scheduling the target task, the scheduler 011 must therefore ensure that the sum of the idle resources of the processors of some chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks. That is, the scheduler 011 determines the target computing node for executing the target task from the plurality of computing nodes based on the sum of the resource requirements of the multiple parallel tasks.
For example, suppose the priorities of the three chip architectures supported by the target task satisfy X86 > ARM > GPU. The scheduler 011 can then check, in the order X86, ARM, GPU, whether the sum of the idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks.
Step 1043b: If it is detected that the sum of the idle resources of the processors of a target chip architecture satisfies the sum of the resource requirements, determine a computing node containing a processor of that target chip architecture as the target computing node.
Proceeding in descending order of priority, once the scheduler 011 detects that the sum of the idle resources of the processors of a target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node containing a processor of that architecture as the target computing node.
For example, if the scheduler 011 detects that the sum of the idle resources of the ARM processors in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node containing an ARM processor as the target computing node, where the amount of idle resources of the ARM processor in that node satisfies the resource requirement of the target task.
It can be understood that, in steps 1043a and 1043b above, if the scheduler 011 detects that the plurality of computing nodes contains at least two candidate computing nodes that satisfy the execution conditions of the target task, the scheduler 011 can randomly select one of them as the target computing node, or select one of them based on a preconfigured resource scheduling policy. Here, satisfying the execution conditions of the target task means that the computing node contains a processor of the target chip architecture and the amount of idle resources of that processor satisfies the resource requirement of the target task.
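Putting steps 1041 through 1043b together, a minimal sketch of the selection logic might look as follows, reusing the hypothetical SchedulingRequirement and ParallelMode types sketched earlier. The Node abstraction and the tie-breaking rule (first qualifying node) are likewise assumptions; a real scheduler would apply the preconfigured resource scheduling policy instead.

    #include <optional>
    #include <string>
    #include <vector>

    struct Node {
        std::string arch;      // chip architecture of this node's processors
        double freeResources;  // current amount of idle processor resources
    };

    // Walk the supported architectures from highest to lowest priority and
    // return a node of the first architecture with sufficient idle resources.
    std::optional<Node> SelectTargetNode(const std::vector<Node>& nodes,
                                         const SchedulingRequirement& req) {
        // In synchronous parallel mode the whole task group must fit on one
        // architecture, so the check is against the sum of the group's demands.
        const double demand = (req.mode == ParallelMode::kSynchronous)
                                  ? req.resourceDemand * req.numParallelTasks
                                  : req.resourceDemand;
        for (const std::string& arch : req.supportedArchs) {
            double clusterFree = 0.0;
            for (const Node& n : nodes) {
                if (n.arch == arch) clusterFree += n.freeResources;
            }
            if (clusterFree < demand) continue;  // try the next-priority architecture
            for (const Node& n : nodes) {
                // One task is placed per node; pick the first node that can
                // hold it (a real scheduler would apply its scheduling policy).
                if (n.arch == arch && n.freeResources >= req.resourceDemand) {
                    return n;
                }
            }
        }
        return std::nullopt;  // no supported architecture can satisfy the demand
    }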
For example, referring to FIG. 2, the scheduler 011 may include the task management and scheduler 0111 and the resource management and scheduler 0112. Referring to FIG. 8, after receiving the scheduling requirement information of the target task, the task management and scheduler 0111 can send a resource invocation request to the resource management and scheduler 0112 based on the resource requirement of the target task. The resource management and scheduler 0112 can then allocate resources to the target task based on a preconfigured resource scheduling policy, that is, determine the target computing node from the plurality of computing nodes 02. As shown in FIG. 2 and FIG. 4, the resource scheduling policy may include heterogeneity awareness, priority preemption, affinity and anti-affinity, a bin packing algorithm, accelerator sharing, and the like.
It can also be understood that, if the task management and scheduler 0111 receives multiple tasks including the target task, then after the resource management and scheduler 0112 completes resource scheduling for these tasks, the task management and scheduler 0111 can schedule them based on a preconfigured task scheduling policy. For example, referring to FIG. 2 and FIG. 4, the task scheduling policy may include directed acyclic graph (DAG) scheduling, priority scheduling, and the like.
For example, suppose the chip architectures of the computing nodes in the heterogeneous cluster include GPU, NPU, and CPU, and the speedup ratio of processors of the three architectures is 2:2:1. Suppose the task management and scheduler 0111 receives 100 parallel tasks whose parallel scheduling mode is the ideal parallel mode, and the currently idle resources of the heterogeneous cluster comprise 10 GPUs, 10 NPUs, and 100 CPUs, where 50 X86 CPUs are idle in computing node A and 50 ARM CPUs are idle in computing node B. The scheduler 011 can then schedule 20 parallel tasks to the computing nodes containing GPUs, 20 parallel tasks to the computing nodes containing NPUs, 30 parallel tasks to computing node A, and 30 parallel tasks to computing node B, for a total of 20 + 20 + 30 + 30 = 100 tasks. Each GPU and each NPU executes 2 parallel tasks, and each X86 CPU and each ARM CPU executes 1 parallel task.
Step 105: The scheduler sends a scheduling instruction for the target task to the target computing node.
After determining the target computing node 02 for executing the target task, the scheduler 011 can send a scheduling instruction for the target task to the target computing node 02. The scheduling instruction may carry the identifier of the target task, and it instructs the target computing node 02 to compile, through the runtime plug-in of the target task, the intermediate representation of the target task into executable code of the target chip architecture, and to run that executable code on a processor of the target chip architecture.
For example, suppose the scheduler 011 receives N parallel tasks (N being an integer greater than 1). After determining the computing node 02 for executing each parallel task, as shown in step S3 in FIG. 5, the scheduler 011 can send a scheduling instruction to each of the N computing nodes 02 used to execute the N parallel tasks. For example, referring to FIG. 8, the task management and scheduler 0111 in the scheduler 011 can send the scheduling instruction to the task layer agent 0212 in the computing node 02.
Step 106: The scheduler sends the architecture identifier of the target chip architecture to the target computing node.
In this embodiment of the application, one or more computing nodes in the heterogeneous cluster may include processors of several chip architectures, for example an NPU together with an X86 CPU, or a GPU together with an X86 CPU. Therefore, to help the target computing node determine the chip architecture of the processor on which the target task is to run, the scheduler can also send the architecture identifier of the target chip architecture to the target computing node.
Optionally, as shown in FIG. 8, after determining the target chip architecture, the resource management and scheduler 0112 in the scheduler 011 can send the architecture identifier of the target chip architecture to the resource layer agent 0211 in the target computing node 02, which can then pass it on to the task layer agent 0212.
Alternatively, after determining the target chip architecture, the resource management and scheduler 0112 in the scheduler 011 can send the architecture identifier to the task management and scheduler 0111, which can then send it to the task layer agent 0212 in the target computing node 02.
It can be understood that step 106 may also be performed before step 105, or synchronously with step 105; for example, the scheduling instruction sent by the scheduler may itself carry the architecture identifier of the target chip architecture.
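For illustration, when the architecture identifier travels in the same message as the scheduling instruction (the last variant above), the payload delivered to the task layer agent might look like the following; the type and field names are assumptions.

    #include <string>

    // Hypothetical payload of a scheduling instruction that also carries
    // the architecture identifier of the target chip architecture.
    struct SchedulingInstruction {
        std::string taskId;      // identifier of the target task
        std::string targetArch;  // architecture identifier, e.g. "X86" or "NPU"
    };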
Step 107: Based on the scheduling instruction, the target computing node obtains the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager of the heterogeneous cluster.
Referring to step S4 in FIG. 5, after receiving the scheduling instruction for the target task from the scheduler 011, the target computing node 02 can obtain the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager 04 based on the identifier of the target task carried in the scheduling instruction.
Because the intermediate representation, the runtime plug-in, and the input data of the target task are relatively large, they can be stored by the file manager in the heterogeneous cluster, which lowers the storage performance requirements placed on the scheduler. Moreover, because the scheduler does not need to forward the intermediate representation, the runtime plug-in, or the input data, any impact on its scheduling performance is avoided.
Optionally, in step 102 above, the acceleration library 031 in the host 03 may instead send at least one of the intermediate representation, the runtime plug-in, and the input data of the target task directly to the scheduler 011. Accordingly, in step 107, the scheduler 011 sends that data to the target computing node 02; that is, the target computing node 02 receives it from the scheduler 011.
For example, referring to FIG. 6, the heterogeneous cluster may also omit the file manager 04. In that case, in step 102 the acceleration library 031 in the host 03 sends the intermediate representation, the runtime plug-in, and the input data of the target task to the scheduler 011, and in step 107 the target computing node 02 receives them from the scheduler 011. Because no additional file manager needs to be deployed in the heterogeneous cluster, the structure of the cluster is simplified and its deployment cost is reduced.
Step 108: Based on the scheduling instruction, the target computing node compiles the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
In this embodiment of the application, after obtaining the runtime plug-in, the target computing node 02 can run it, and the runtime plug-in can then compile the intermediate representation of the target task into executable code of the target chip architecture. That is, the runtime plug-in compiles the intermediate representation online. As can be seen from FIG. 7, the runtime plug-in supports compiling the intermediate representation into executable code for a variety of chip architectures, for example NPU, GPU, X86, or ARM.
For example, referring to FIG. 6, suppose the target computing node is computing node A and the target chip architecture is X86; computing node A can then run the runtime plug-in on an X86 processor, and the runtime plug-in can compile the intermediate representation of the target task into X86 executable code. Alternatively, if the target computing node is computing node B and the target chip architecture is the NPU architecture, computing node B can run the runtime plug-in on the NPU, and the runtime plug-in can compile the intermediate representation into executable code for the NPU architecture.
Optionally, as shown in FIG. 2 and FIG. 8, after receiving the scheduling instruction, the task layer agent 0212 in the target computing node 02 can first start a task service instance through the runtime plug-in manager; the runtime plug-in runs in this task service instance. The running runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture. For example, referring to FIG. 8, the running runtime plug-in can obtain the intermediate representation of the target task from the file manager 04 and compile it to obtain the executable code; alternatively, the intermediate representation can be obtained from the file manager 04 by the task layer agent 0212 and sent to the runtime plug-in.
Step 109: The target computing node takes the input data as the input of the executable code and runs the executable code on a processor of the target chip architecture to obtain the running result of the executable code.
After compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in, the target computing node can provide the input data to the runtime plug-in. The runtime plug-in can then take the input data as the input of the executable code and run the executable code on a processor of the target chip architecture to obtain the running result.
For example, as shown in FIG. 8, after starting the task service instance and running the runtime plug-in, the task layer agent 0212 in the target computing node 02 can provide the input data to the runtime plug-in. For example, the input data may be the input matrices A and B; after the runtime plug-in runs the code of the for loop, the obtained running result is the result of the matrix multiplication operation.
Optionally, in this embodiment of the application, after compiling the intermediate representation of the target task into executable code, the runtime plug-in can also cache the executable code of the target task. Thus, when another target task of the same type is to be executed later, its intermediate representation does not need to be compiled online again, avoiding the extra overhead introduced by online compilation.
For example, after receiving the scheduling instruction for the target task, if the target computing node detects that the executable code of the target task is already cached locally and that the chip architecture of the cached executable code matches the target chip architecture sent by the scheduler, the target computing node can directly, through the runtime plug-in, take the input data of the target task as the input of the cached executable code and run it on a processor of the target chip architecture.
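A minimal sketch of such a cache lookup is given below, reusing the hypothetical RuntimePlugin interface sketched earlier; the map-based cache and its (task identifier, architecture) key are assumptions.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Cache of compiled executables, keyed by (task identifier, architecture).
    using ExecCache = std::map<std::pair<std::string, std::string>,
                               std::vector<std::uint8_t>>;

    // Return the cached executable if this task was already compiled for the
    // target architecture; otherwise compile the IR once and cache the result,
    // avoiding the overhead of repeated online compilation.
    const std::vector<std::uint8_t>& GetExecutable(ExecCache& cache,
                                                   RuntimePlugin& plugin,
                                                   const std::string& taskId,
                                                   const std::string& arch,
                                                   const std::vector<std::uint8_t>& ir) {
        const auto key = std::make_pair(taskId, arch);
        auto it = cache.find(key);
        if (it == cache.end()) {
            it = cache.emplace(key, plugin.Compile(ir, arch)).first;
        }
        return it->second;
    }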
It can be understood that tasks executed on different computing nodes 02 may have dependencies on one another, that is, tasks assigned to different computing nodes 02 may need to exchange data during execution. Therefore, the task service instance started by the task layer agent 0212 also has the function of communicating with the task layer agents 0212 in other computing nodes 02, which makes it easy to obtain the necessary data from other computing nodes 02 during task execution.
Step 110: the target computing node sends the running result to the scheduler.
To make it easy for the host to further process the running result, referring to step S5 in FIG. 5, the target computing node 02 may send the running result to the scheduler 011.
Step 111: the scheduler sends the running result to the host.
Continuing to refer to step S5 in FIG. 5, after receiving the running result, the scheduler 011 can send it through the gateway 05 to the acceleration library 031 in the host, so that the acceleration library 031 can process it further.
For example, referring to FIG. 5, assuming that the target task is one of N parallel tasks, after the N computing nodes 02 executing the N parallel tasks have computed their running results, they may each send their result to the scheduler 011. The scheduler 011 can then send the N running results to the acceleration library 031 through the gateway 05, and the acceleration library 031 can perform reduction processing on the N received results.
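As a toy illustration of this reduction step (the actual reduction applied by the acceleration library is not specified here, and the values are invented), combining N partial results might look like:

```python
from functools import reduce
import operator

partial_sums = [10, 20, 30]  # hypothetical results from N = 3 computing nodes
print(reduce(operator.add, partial_sums))  # 60
```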
Optionally, as shown in FIG. 6, the management node 01 in the heterogeneous cluster may further include a historical information collection module 012, which can be used to collect and store the scheduling information and execution information of historical tasks.
FIG. 10 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application. As shown in FIG. 10, the host 03 may include a CPU, which is used to run the acceleration library 031 to compile the source code of the target task into a fat binary file. Referring to FIG. 10, the fat binary file includes host code and an intermediate representation, where the host code may be CPU host code.
Continuing to refer to FIG. 10, the host 03 may also run a target-agnostic device plug-in framework and an adaptive cluster plugin. For example, if the acceleration library 031 is an OpenMP acceleration library, the target-agnostic device plug-in framework may be a target agnostic wrapper. The target-agnostic device plug-in framework is used to interface with the adaptive cluster plugin, and the adaptive cluster plugin is used to interact with the distributed middleware. For example, the adaptive cluster plugin can send data to the scheduler 011 in the heterogeneous cluster by calling the middleware programming interface, so as to offload the target task to the heterogeneous cluster for execution. Accordingly, the adaptive cluster plugin may also be called an offload plugin.
It can be understood that steps of the task scheduling method provided by the embodiments of the present application may be added or removed as appropriate. For example, step 103 may be performed before step 102. Alternatively, if the recipient of the data in step 102 is the scheduler, step 102 and step 103 may be performed simultaneously. Alternatively, if the target computing node includes processors of only one chip architecture, step 106 may be omitted as appropriate. Or, if the target task is not a parallel task, steps 1041, 1042b and 1043b may also be omitted as appropriate.
It can also be understood that the multiple parallel tasks received by the scheduler may also be referred to as one job; the method provided by the embodiments of the present application can therefore implement task scheduling not only at the single-task level but also at the job level.
In summary, the embodiments of the present application provide a task scheduling method in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
Moreover, since the scheduler does not need to find a computing node whose chip architecture matches the architecture of executable code marked in the task, the complexity of resource management and scheduling can be effectively reduced, thereby improving the efficiency of task scheduling. In addition, since the host does not need to provide executable code for multiple different architectures, the operation, maintenance and development costs on the host side can be effectively reduced.
An embodiment of the present application further provides a target computing node, which can be applied to the heterogeneous cluster provided by the foregoing embodiments and can be used to implement the steps performed by the target computing node in the foregoing method embodiments. As shown in FIG. 1, FIG. 2, FIG. 6 and FIG. 9, the heterogeneous cluster includes a scheduler 011 and multiple computing nodes 02, at least two of which have different chip architectures; the target computing node belongs to the multiple computing nodes 02. Referring to FIG. 11, the target computing node may further include:
A receiving module 201, configured to receive the scheduling instruction for the target task sent by the scheduler. For the functional implementation of the receiving module 201, reference may be made to the description of step 105 in the foregoing method embodiments.
An obtaining module 202, configured to obtain the intermediate representation of the target task and the runtime plug-in of the target task. For the functional implementation of the obtaining module 202, reference may be made to the description of step 107 in the foregoing method embodiments.
A processing module 203, configured to compile, based on the scheduling instruction, the intermediate representation into executable code for the target chip architecture through the runtime plug-in, and to run the executable code on a processor of the target chip architecture, where the target computing node includes a processor of the target chip architecture. For the functional implementation of the processing module 203, reference may be made to the description of step 108 in the foregoing method embodiments.
Optionally, the obtaining module 202 may be configured to: receive the intermediate representation and the runtime plug-in of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster.
Optionally, the receiving module 201 may further be configured to receive the architecture identifier of the target chip architecture sent by the scheduler. For the functional implementation of the receiving module 201, reference may be made to the description of step 106 in the foregoing method embodiments.
Accordingly, the processing module 203 may compile, based on the architecture identifier of the target chip architecture, the intermediate representation into executable code for the target chip architecture through the runtime plug-in.
Optionally, the obtaining module 202 may further be configured to obtain the input data of the target task.
The processing module 203 is configured to take the input data as the input of the executable code, run the executable code on a processor of the target chip architecture, and obtain the running result of the executable code. For the functional implementation of the processing module 203, reference may also be made to the description of step 109 in the foregoing method embodiments.
Optionally, as shown in FIG. 11, the target computing node further includes:
A sending module 204, configured to send the running result to the scheduler after the processing module 203 obtains the running result of the executable code. For the functional implementation of the sending module 204, reference may also be made to the description of step 110 in the foregoing method embodiments.
Optionally, the obtaining module 202 may be configured to: receive the input data of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the input data of the target task from the file manager of the heterogeneous cluster.
In summary, the embodiments of the present application provide a target computing node that can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
An embodiment of the present application provides a scheduler that can be applied to the heterogeneous cluster provided by the foregoing embodiments, for example, to the management node 01 in the heterogeneous cluster. The scheduler can be used to implement the steps performed by the scheduler in the foregoing method embodiments. Referring to FIG. 1, FIG. 2, FIG. 6 and FIG. 9, the heterogeneous cluster further includes multiple computing nodes 02, at least two of which have different chip architectures. As shown in FIG. 12, the scheduler may include:
A receiving module 301, configured to receive the scheduling requirement information of the target task to be scheduled, where the scheduling requirement information includes the resource requirement of the target task and the at least two chip architectures supported by the target task. For the functional implementation of the receiving module 301, reference may be made to the description of step 103 in the foregoing method embodiments.
A determining module 302, configured to determine, based on the scheduling requirement information, a target computing node from the multiple computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures. For the functional implementation of the determining module 302, reference may be made to the description of step 104 in the foregoing method embodiments.
A sending module 303, configured to send a scheduling instruction for the target task to the target computing node, where the scheduling instruction instructs the target computing node to compile, through the runtime plug-in of the target task, the intermediate representation of the target task into executable code for the target chip architecture, and to run the executable code on a processor of the target chip architecture. For the functional implementation of the sending module 303, reference may be made to the description of step 105 in the foregoing method embodiments.
Optionally, the scheduling requirement information may further include the priorities of the at least two chip architectures, and the determining module 302 may be configured to:
check, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of processors of the corresponding chip architecture among the multiple computing nodes satisfies the resource requirement; and
if it is detected that the amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determine a computing node that includes a processor of the target chip architecture as the target computing node.
For the functional implementation of the determining module 302, reference may be made to the descriptions of steps 1042a and 1043a in the foregoing method embodiments.
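A minimal sketch of this priority-ordered selection, assuming each computing node reports its idle resource amount per chip architecture (the data shapes and names are invented for illustration):

```python
def pick_target_node(nodes, arch_priority, resource_demand):
    # Walk the task's supported architectures from highest to lowest priority;
    # the first node whose idle amount covers the demand becomes the target.
    for arch in arch_priority:
        for node in nodes:
            if node["idle"].get(arch, 0) >= resource_demand:
                return node["name"], arch   # target node and target architecture
    return None                             # no node currently satisfies the demand

nodes = [
    {"name": "node-a", "idle": {"gpu": 2, "cpu": 8}},
    {"name": "node-b", "idle": {"npu": 4}},
]
print(pick_target_node(nodes, ["npu", "gpu", "cpu"], 3))  # ('node-b', 'npu')
```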
Optionally, the sending module 303 may further be configured to send the architecture identifier of the target chip architecture to the target computing node. For the functional implementation of the sending module 303, reference may also be made to the description of step 106 in the foregoing method embodiments.
Optionally, the receiving module 301 may further be configured to receive the intermediate representation and the runtime plug-in of the target task. Accordingly, the sending module 303 may further be configured to send the intermediate representation and the runtime plug-in to the target computing node.
Optionally, the target task is one of multiple parallel tasks, and the scheduling requirement information further includes the parallel scheduling mode of the multiple parallel tasks. The determining module 302 may be configured to:
if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determine the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; and
if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determine the target computing node from the multiple computing nodes based on the resource requirement of the target task alone.
Here, the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, while the ideal parallel mode means that they do not. For the functional implementation of the determining module 302, reference may also be made to the descriptions of steps 1041, 1042b and 1043b in the foregoing method embodiments.
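Under the same assumed data shapes, the two branches might be sketched as follows (reusing pick_target_node from the earlier sketch; the greedy placement is an illustrative simplification, not the patent's algorithm):

```python
def schedule_synchronous(nodes, arch, demands):
    # Synchronous parallel mode: the N tasks must run together, so the
    # cluster-wide sum of idle resources of the target architecture must
    # cover the sum of the N tasks' resource requirements.
    idle = {n["name"]: n["idle"].get(arch, 0) for n in nodes}
    if sum(idle.values()) < sum(demands):
        return None                               # cannot co-schedule right now
    placements = []
    for demand in sorted(demands, reverse=True):  # simple greedy fit
        name = max(idle, key=idle.get)
        if idle[name] < demand:
            return None                           # no single node fits this task
        idle[name] -= demand
        placements.append((name, demand))
    return placements

def schedule_ideal(nodes, arch, demand):
    # Ideal parallel mode: tasks need not run simultaneously, so each
    # target task is placed on its own, against only its own requirement.
    return pick_target_node(nodes, [arch], demand)
```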
In summary, the embodiments of the present application provide a scheduler. Since the target computing node can obtain the intermediate representation and the runtime plug-in of the target task, and the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
An embodiment of the present application further provides a host, which can be applied to the task scheduling system provided by the foregoing embodiments and can be used to implement the steps performed by the host in the foregoing method embodiments. Referring to FIG. 13, the host may include:
A compiling module 401, configured to compile the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task. For the functional implementation of the compiling module 401, reference may be made to the description of step 101 in the foregoing method embodiments.
A first sending module 402, configured to send the intermediate representation and the runtime plug-in. For the functional implementation of the first sending module 402, reference may be made to the description of step 102 in the foregoing method embodiments.
A second sending module 403, configured to send the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster, where the scheduling requirement information includes the resource requirement of the target task and the at least two chip architectures supported by the target task.
The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures. The scheduling requirement information instructs the scheduler to schedule the target task to a target computing node among the at least two computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures. The runtime plug-in is used by the target computing node to compile the intermediate representation into executable code for the target chip architecture.
For the functional implementation of the second sending module 403, reference may be made to the description of step 103 in the foregoing method embodiments.
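As an illustration of what the scheduling requirement information might carry, a sketch is shown below; the field names and values are invented, and the patent does not prescribe any particular wire format:

```python
import json

scheduling_requirement = {
    "task_id": "matmul-0001",                          # hypothetical identifier
    "resource_demand": {"cores": 4, "memory_gb": 8},   # resource requirement
    "supported_architectures": ["npu", "gpu", "cpu"],  # at least two, by priority
    "parallel_mode": "synchronous",                    # optional, cf. claim 10
}
payload = json.dumps(scheduling_requirement)
# The adaptive cluster plugin would hand such a payload to the middleware
# programming interface, which forwards it to the scheduler 011.
```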
Optionally, the first sending module 402 may be configured to: send the intermediate representation and the runtime plug-in to the scheduler; or send the intermediate representation and the runtime plug-in to the file manager in the heterogeneous cluster.
In summary, the embodiments of the present application provide a host that can provide the intermediate representation and the runtime plug-in of the target task to the target computing node. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the modules in the target computing node, the scheduler and the host described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
It should be understood that the target computing node, the scheduler and the host provided by the embodiments of the present application may each be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Of course, the task scheduling method provided by the foregoing method embodiments may also be implemented by software; in that case, the target computing node, the scheduler and the host may each include software modules for implementing the method.
An embodiment of the present application further provides a computer device that can be applied to the task scheduling system provided by the foregoing embodiments. The computer device may be the target computing node, the scheduler or the host provided by the foregoing embodiments. Referring to FIG. 14, the computer device may include a processor 501, a memory 502, a network interface 503 and a bus 504, where the bus 504 connects the processor 501, the memory 502 and the network interface 503. Communication connections with other devices can be established through the network interface 503 (which may be wired or wireless). The memory 502 stores a computer program 5021 used to implement various application functions.
It should be understood that, in the embodiments of the present application, the processor 501 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a GPU or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
In addition to a data bus, the bus 504 may also include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus 504 in the figure.
The processor 501 is configured to execute the computer program 5021 stored in the memory 502, and implements the task scheduling method shown in the foregoing method embodiments by executing it.
For example, if the computer device is the target computing node, the processor 501 can implement the steps performed by the target computing node in the foregoing method embodiments by executing the computer program 5021. If the computer device is the scheduler, the processor 501 can implement the steps performed by the scheduler by executing the computer program 5021. If the computer device is the host, the processor 501 can implement the steps performed by the host by executing the computer program 5021.
An embodiment of the present application further provides a computer-readable storage medium storing instructions that are executed by a processor to implement the task scheduling method applied to the target computing node, the task scheduling method applied to the scheduler, or the task scheduling method applied to the host in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is caused to implement the task scheduling method applied to the target computing node, the task scheduling method applied to the scheduler, or the task scheduling method applied to the host in the foregoing method embodiments.
An embodiment of the present application further provides a task scheduling system. As shown in FIG. 1, FIG. 2 and FIG. 10, the system may include a host 03, a scheduler 011 and multiple computing nodes 02, at least two of which have different chip architectures.
At least one of the multiple computing nodes 02 is the target computing node provided by the foregoing embodiments, for example, the target computing node shown in FIG. 11 or FIG. 14.
The scheduler 011 is the scheduler provided by the foregoing embodiments, for example, the scheduler shown in FIG. 12 or FIG. 14.
The host 03 is the host provided by the foregoing embodiments, for example, the host shown in FIG. 13 or FIG. 14.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, by coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium; the semiconductor medium may be a solid state drive (SSD).
In this application, the terms "first", "second" and the like are used to distinguish identical or similar items having substantially the same effects and functions. It should be understood that there is no logical or temporal dependency among "first", "second" and "nth", and that neither the quantity nor the execution order is limited.
In this application, the term "at least one" means one or more, and the term "multiple" means two or more. The terms "system" and "network" are often used interchangeably herein.
The foregoing descriptions are merely optional implementations of this application, but the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (28)

  1. A task scheduling method, applied to a target computing node in a heterogeneous cluster, wherein the heterogeneous cluster comprises a scheduler and a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, and the target computing node belongs to the plurality of computing nodes; the method comprises:
    receiving a scheduling instruction for a target task sent by the scheduler;
    obtaining an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task;
    compiling, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, wherein the target computing node comprises a processor of the target chip architecture; and
    running the executable code on the processor of the target chip architecture through the runtime plug-in.
  2. The method according to claim 1, wherein the obtaining an intermediate representation of the target task and a runtime plug-in of the target task comprises:
    obtaining, based on the scheduling instruction, the intermediate representation of the target task and the runtime plug-in of the target task from a file manager of the heterogeneous cluster;
    or, receiving the intermediate representation of the target task and the runtime plug-in of the target task sent by the scheduler.
  3. The method according to claim 1 or 2, wherein the method further comprises: receiving an architecture identifier of the target chip architecture sent by the scheduler; and
    the compiling the intermediate representation into executable code of a target chip architecture through the runtime plug-in comprises:
    compiling, based on the architecture identifier of the target chip architecture, the intermediate representation into the executable code of the target chip architecture through the runtime plug-in.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises: obtaining input data of the target task;
    the running the executable code on the processor of the target chip architecture through the runtime plug-in comprises: taking, through the runtime plug-in, the input data as input of the executable code, and running the executable code on the processor of the target chip architecture to obtain a running result of the executable code; and
    the method further comprises: sending the running result to the scheduler.
  5. The method according to claim 4, wherein the obtaining input data of the target task comprises:
    obtaining, based on the scheduling instruction, the input data of the target task from a file manager of the heterogeneous cluster;
    or, receiving the input data of the target task sent by the scheduler.
  6. A task scheduling method, applied to a scheduler in a heterogeneous cluster, wherein the heterogeneous cluster further comprises a plurality of computing nodes, and at least two of the plurality of computing nodes have different chip architectures; the method comprises:
    receiving scheduling requirement information of a target task to be scheduled, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, wherein an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures; and
    sending a scheduling instruction for the target task to the target computing node, wherein the scheduling instruction instructs the target computing node to compile, through a runtime plug-in of the target task, an intermediate representation of the target task into executable code of the target chip architecture, and to run the executable code on the processor of the target chip architecture, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task.
  7. The method according to claim 6, wherein the scheduling requirement information further comprises priorities of the at least two chip architectures, and the determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes comprises:
    checking, in descending order of the priorities of the at least two chip architectures, whether an amount of idle resources of processors of the corresponding chip architecture among the plurality of computing nodes satisfies the resource requirement; and
    if it is detected that an amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determining a computing node comprising a processor of the target chip architecture as the target computing node.
  8. The method according to claim 6 or 7, wherein the method further comprises:
    receiving the intermediate representation of the target task and the runtime plug-in of the target task; and
    sending the intermediate representation and the runtime plug-in to the target computing node.
  9. The method according to any one of claims 6 to 8, wherein the method further comprises:
    sending an architecture identifier of the target chip architecture to the target computing node.
  10. The method according to any one of claims 6 to 9, wherein the target task is one parallel task among a plurality of parallel tasks, and the scheduling requirement information further comprises a parallel scheduling mode of the plurality of parallel tasks;
    the determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes comprises:
    if the parallel scheduling mode of the plurality of parallel tasks is a synchronous parallel mode, determining the target computing node from the plurality of computing nodes based on a sum of resource requirements of the plurality of parallel tasks, wherein a sum of amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the plurality of parallel tasks; and
    if the parallel scheduling mode of the plurality of parallel tasks is an ideal parallel mode, determining the target computing node from the plurality of computing nodes based on the resource requirement of the target task;
    wherein the synchronous parallel mode means that the plurality of parallel tasks need to be executed synchronously, and the ideal parallel mode means that the plurality of parallel tasks do not need to be executed synchronously.
  11. A task scheduling method, wherein the method comprises:
    compiling source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code;
    sending the intermediate representation and the runtime plug-in; and
    sending scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    wherein the heterogeneous cluster further comprises a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, the scheduling requirement information instructs the scheduler to schedule the target task to a target computing node among the at least two computing nodes, an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, the target chip architecture belongs to the at least two chip architectures, and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.
  12. The method according to claim 11, wherein the sending the intermediate representation and the runtime plug-in comprises:
    sending the intermediate representation and the runtime plug-in to the scheduler;
    or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
  13. A target computing node, applied to a heterogeneous cluster, wherein the heterogeneous cluster comprises a scheduler and a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, and the target computing node belongs to the plurality of computing nodes; the target computing node further comprises:
    a receiving module, configured to receive a scheduling instruction for a target task sent by the scheduler;
    an obtaining module, configured to obtain an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task; and
    a processing module, configured to compile, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, and to run the executable code on a processor of the target chip architecture, wherein the target computing node comprises a processor of the target chip architecture.
  14. The target computing node according to claim 13, wherein the obtaining module is configured to:
    obtain, based on the scheduling instruction, the intermediate representation of the target task and the runtime plug-in of the target task from a file manager of the heterogeneous cluster;
    or, receive the intermediate representation of the target task and the runtime plug-in of the target task sent by the scheduler.
  15. The target computing node according to claim 13 or 14, wherein the receiving module is further configured to receive an architecture identifier of the target chip architecture sent by the scheduler; and
    the processing module is configured to compile, based on the architecture identifier of the target chip architecture, the intermediate representation into the executable code of the target chip architecture through the runtime plug-in.
  16. The target computing node according to any one of claims 13 to 15, wherein the obtaining module is further configured to obtain input data of the target task;
    the processing module is configured to take, through the runtime plug-in, the input data as input of the executable code, and run the executable code on the processor of the target chip architecture to obtain a running result of the executable code; and
    the target computing node further comprises:
    a sending module, configured to send the running result to the scheduler after the processing module obtains the running result of the executable code.
  17. The target computing node according to claim 16, wherein the obtaining module is configured to:
    obtain, based on the scheduling instruction, the input data of the target task from a file manager of the heterogeneous cluster;
    or, receive the input data of the target task sent by the scheduler.
  18. A scheduler, applied to a heterogeneous cluster, wherein the heterogeneous cluster further comprises a plurality of computing nodes, and at least two of the plurality of computing nodes have different chip architectures; the scheduler comprises:
    a receiving module, configured to receive scheduling requirement information of a target task to be scheduled, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    a determining module, configured to determine, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, wherein an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures; and
    a sending module, configured to send a scheduling instruction for the target task to the target computing node, wherein the scheduling instruction instructs the target computing node to compile, through a runtime plug-in of the target task, an intermediate representation of the target task into executable code of the target chip architecture, and to run the executable code on the processor of the target chip architecture, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task.
  19. The scheduler according to claim 18, wherein the scheduling requirement information further comprises priorities of the at least two chip architectures, and the determining module is configured to:
    check, in descending order of the priorities of the at least two chip architectures, whether an amount of idle resources of processors of the corresponding chip architecture among the plurality of computing nodes satisfies the resource requirement; and
    if it is detected that an amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determine a computing node comprising a processor of the target chip architecture as the target computing node.
  20. The scheduler according to claim 18 or 19, wherein the receiving module is further configured to receive the intermediate representation of the target task and the runtime plug-in of the target task; and
    the sending module is further configured to send the intermediate representation and the runtime plug-in to the target computing node.
  21. The scheduler according to any one of claims 18 to 20, wherein the sending module is further configured to send an architecture identifier of the target chip architecture to the target computing node.
  22. The scheduler according to any one of claims 18 to 21, wherein the target task is one of a plurality of parallel tasks, and the scheduling requirement information further comprises a parallel scheduling mode of the plurality of parallel tasks;
    the determining module is configured to:
    if the parallel scheduling mode of the plurality of parallel tasks is a synchronous parallel mode, determine the target computing node from the plurality of computing nodes based on the sum of the resource requirements of the plurality of parallel tasks, wherein the sum of the amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster meets the sum of the resource requirements of the plurality of parallel tasks;
    if the parallel scheduling mode of the plurality of parallel tasks is an ideal parallel mode, determine the target computing node from the plurality of computing nodes based on the resource requirement of the target task;
    wherein the synchronous parallel mode means that the plurality of parallel tasks need to be executed synchronously, and the ideal parallel mode means that the plurality of parallel tasks do not need to be executed synchronously.
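For illustration only: a short Python sketch contrasting the two parallel scheduling modes of claim 22. The flat task and cluster model is an assumption of this example; only the branching mirrors the claim, in that synchronous mode gates admission on the summed demand of all parallel tasks while ideal mode considers only the target task's own demand.

    # Hypothetical admission check for one task of a parallel group.
    from typing import Optional

    def schedule_parallel(mode: str,
                          task_demands: list,
                          target_index: int,
                          cluster_idle_total: int,
                          nodes: list) -> Optional[str]:
        if mode == "synchronous":
            # All parallel tasks must run at the same time, so the
            # cluster-wide idle resources of the target architecture
            # must cover the sum of all the tasks' demands.
            if cluster_idle_total < sum(task_demands):
                return None
        elif mode != "ideal":
            raise ValueError(f"unknown parallel scheduling mode: {mode}")
        # In both modes the chosen node only needs to fit this task.
        demand = task_demands[target_index]
        for name, idle in nodes:
            if idle >= demand:
                return name
        return None

    # Example: three tasks of demand 4 each, cluster has 10 idle units.
    print(schedule_parallel("synchronous", [4, 4, 4], 0, 10, [("n1", 8)]))  # None
    print(schedule_parallel("ideal", [4, 4, 4], 0, 10, [("n1", 8)]))        # n1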
  23. A host, wherein the host comprises:
    a compiling module, configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, the intermediate representation being chip-architecture-independent code;
    a first sending module, configured to send the intermediate representation and the runtime plug-in;
    a second sending module, configured to send scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    wherein the heterogeneous cluster further comprises a plurality of computing nodes, at least two of the plurality of computing nodes having different chip architectures; the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes, the amount of idle resources of a processor of a target chip architecture in the target computing node meets the resource requirement of the target task, the target chip architecture belongs to the at least two chip architectures, and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.
  24. The host according to claim 23, wherein the first sending module is configured to:
    send the intermediate representation and the runtime plug-in to the scheduler;
    or send the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
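For illustration only: a Python sketch of the host-side flow in claims 23 and 24. The stub compiler, the Scheduler class, and the message layout are assumptions of this example; the claims fix only what is produced and where it may be sent, not any concrete API.

    # Hypothetical host: compile once to an architecture-independent IR
    # plus a per-task runtime plug-in, publish both, then send the
    # scheduling requirement to the cluster scheduler.
    from typing import Tuple

    def compile_source(source: str) -> Tuple[bytes, bytes]:
        """Stub compiler returning (intermediate representation, runtime plug-in)."""
        return source.encode(), b"runtime-plugin"

    class Scheduler:
        def store(self, ir: bytes, plugin: bytes) -> None:
            print("scheduler received IR and runtime plug-in")

        def submit(self, requirement: dict) -> None:
            print("scheduling requirement:", requirement)

    def submit_task(source: str, demand: int, supported_archs: list,
                    scheduler: Scheduler, file_manager=None) -> None:
        ir, plugin = compile_source(source)
        # Claim 24: the IR and plug-in go either to the scheduler or,
        # alternatively, to a file manager in the cluster.
        sink = file_manager if file_manager is not None else scheduler
        sink.store(ir, plugin)
        # Claim 23: the requirement names the resource demand and the
        # chip architectures the task supports.
        scheduler.submit({"resource_demand": demand,
                          "supported_archs": supported_archs})

    submit_task("fn main() {}", demand=4,
                supported_archs=["gpu", "cpu"], scheduler=Scheduler())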
  25. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and the instructions are executed by a processor to implement the task scheduling method according to any one of claims 1 to 12.
  26. A computer device, wherein the computer device comprises a memory, a processor, and a computer program that is stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the task scheduling method according to any one of claims 1 to 12.
  27. A task scheduling system, wherein the task scheduling system comprises: the host according to claim 23 or 24, the scheduler according to any one of claims 18 to 22, and a plurality of computing nodes;
    at least one of the plurality of computing nodes is the target computing node according to any one of claims 13 to 17.
  28. A task scheduling system, wherein the task scheduling system comprises a host, a scheduler, and a plurality of computing nodes, at least two of the plurality of computing nodes having different chip architectures;
    the host is configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, to send the intermediate representation and the runtime plug-in, and to send scheduling requirement information of the target task to the scheduler, wherein the intermediate representation is chip-architecture-independent code, and the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    the scheduler is configured to determine a target computing node from the plurality of computing nodes based on the scheduling requirement information, and to send a scheduling instruction for the target task to the target computing node, wherein the amount of idle resources of a processor of a target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures;
    the target computing node is configured to, based on the scheduling instruction, compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on a processor of the target chip architecture through the runtime plug-in.
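For illustration only: an end-to-end Python walk-through of the claim 28 pipeline with every component reduced to an in-memory stub. All names and payloads are assumptions of this example; the claim itself is neutral as to language, transport, and representation.

    # Hypothetical pipeline: the host compiles to IR, the scheduler picks
    # a node whose idle resources of a supported architecture cover the
    # demand, and the node compiles and runs the IR via the runtime plug-in.
    def demo() -> None:
        # Host side.
        ir = b"architecture-independent IR"
        requirement = {"demand": 4, "supported_archs": ["gpu", "cpu"]}

        def plugin_compile(ir: bytes, arch: str) -> bytes:
            return ir + b" -> " + arch.encode()   # stand-in for real codegen

        # Scheduler side: the first node of a supported architecture with
        # enough idle resources becomes the target computing node.
        nodes = {"n1": ("cpu", 8), "n2": ("gpu", 2)}
        target = next(name for name, (arch, idle) in nodes.items()
                      if arch in requirement["supported_archs"]
                      and idle >= requirement["demand"])

        # Target computing node side: act on the scheduling instruction.
        arch = nodes[target][0]
        executable = plugin_compile(ir, arch)
        print(f"running on {target} ({arch}):", executable.decode())

    demo()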
PCT/CN2021/142532 2021-02-07 2021-12-29 Task scheduling method, apparatus and system WO2022166480A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110167884.X 2021-02-07
CN202110167884.XA CN114911586A (en) 2021-02-07 2021-02-07 Task scheduling method, device and system

Publications (1)

Publication Number Publication Date
WO2022166480A1 (en)

Family

ID=82740836

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142532 WO2022166480A1 (en) 2021-02-07 2021-12-29 Task scheduling method, apparatus and system

Country Status (2)

Country Link
CN (1) CN114911586A (en)
WO (1) WO2022166480A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971498B (en) * 2024-03-28 2024-05-31 Kylin Software Co., Ltd. Scheduling method for GPU resources in computing cluster, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339419A (en) * 1990-06-25 1994-08-16 Hewlett-Packard Company ANDF compiler using the HPcode-plus compiler intermediate language
US20160371081A1 (en) * 2015-06-16 2016-12-22 Architecture Technology Corporation Dynamic computational acceleration using a heterogeneous hardware infrastructure
CN106415496A (en) * 2014-05-30 2017-02-15 Apple Inc. Unified intermediate representation
CN107111505A (en) * 2015-01-19 2017-08-29 Huawei Technologies Co., Ltd. System and method for performing algorithm on Heterogeneous Parallel Systems
CN110865814A (en) * 2019-10-30 2020-03-06 Nanjing Iluvatar CoreX Technology Co., Ltd. Compiler implementation method and system supporting heterogeneous computing core architecture
CN111045795A (en) * 2018-10-11 2020-04-21 Zhejiang Uniview Technologies Co., Ltd. Resource scheduling method and device
CN112148294A (en) * 2019-06-27 2020-12-29 Intel Corporation Method and apparatus for intentional programming for heterogeneous systems


Also Published As

Publication number Publication date
CN114911586A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
US11941400B2 (en) Methods and apparatus for intentional programming for heterogeneous systems
US10768989B2 (en) Virtual vector processing
EP2281236B1 (en) Just-ahead-of-time compilation
US8281311B2 (en) Executing a distributed software application on a plurality of compute nodes according to a compilation history
US11630798B1 (en) Virtualized multicore systems with extended instruction heterogeneity
US20120036514A1 (en) Method and apparatus for a compiler and related components for stream-based computations for a general-purpose, multiple-core system
US20060136878A1 (en) Method and apparatus for enabling compiler and run-time optimizations for data flow applications in multi-core architectures
US20230333913A1 (en) Methods and apparatus to configure heterogenous components in an accelerator
WO2023124543A1 (en) Data processing method and data processing apparatus for big data
Bezirgiannis et al. ABS: A high-level modeling language for cloud-aware programming
CN109542464A (en) Development deployment system, method and the storage medium of IoT equipment shell script
CN112860396A (en) GPU (graphics processing Unit) scheduling method and system based on distributed deep learning
US8918767B2 (en) Pattern-based compilation of asynchronous consumption
WO2022166480A1 (en) Task scheduling method, apparatus and system
US20220114469A1 (en) Methods and apparatus for parallel quantum computing
US11435989B2 (en) Thread-local return structure for asynchronous state machine
US9442782B2 (en) Systems and methods of interface description language (IDL) compilers
US11513841B2 (en) Method and system for scheduling tasks in a computing system
Plauth et al. CloudCL: distributed heterogeneous computing on cloud scale
Plauth et al. CloudCL: single-paradigm distributed heterogeneous computing for cloud infrastructures
WO2024060256A1 (en) Self-evolving and multi-versioning code
Samman et al. Architecture, on-chip network and programming interface concept for multiprocessor system-on-chip
JP2023533802A (en) shared data structure
EP4363966A1 (en) Compilation system and method
CN116136787A (en) Method and device for improving execution efficiency based on Golang protocol Cheng Qianru PHP script

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21924473

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21924473

Country of ref document: EP

Kind code of ref document: A1