WO2022166480A1 - Task scheduling method, apparatus and system

Info

Publication number
WO2022166480A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
task
computing node
scheduling
target task
Application number
PCT/CN2021/142532
Other languages
French (fr)
Chinese (zh)
Inventor
Su Lei (苏磊)
Sun Hongwei (孙宏伟)
He Bo (贺波)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022166480A1

Classifications

    • G: Physics; G06: Computing, Calculating or Counting; G06F: Electric digital data processing
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, and in particular, to a task scheduling method, device and system.
  • chip architectures are also called processor architectures.
  • common processors of different chip architectures include: a central processing unit (CPU) that supports general-purpose computing, a graphics processing unit (GPU) that supports image rendering and high-performance computing, and a neural-network processing unit (NPU) that supports neural-network computation.
  • the chip architecture of the CPU can be further divided into the X86 architecture and the advanced RISC machine (ARM) architecture.
  • a heterogeneous cluster refers to a cluster composed of computing nodes with different chip architectures.
  • the processors of some computing nodes in a heterogeneous cluster are CPUs, and the processors of other computing nodes are GPUs or NPUs. Since a processor in a computing node can only run executable code of its own chip architecture, when scheduling a task the scheduler in the heterogeneous cluster must, based on the architecture of the task's executable code, schedule the task to a computing node whose chip architecture matches the architecture of the executable code.
  • the above task scheduling method may lead to an unbalanced load across the computing nodes in the heterogeneous cluster and low resource utilization of the heterogeneous cluster.
  • the present application provides a task scheduling method, apparatus and system, which can solve the technical problem of low resource utilization of heterogeneous clusters.
  • the technical solutions are as follows:
  • a task scheduling method is provided, which is applied to a target computing node in a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The method includes: receiving a scheduling instruction for a target task sent by the scheduler; obtaining an intermediate representation of the target task and a runtime plug-in of the target task; based on the scheduling instruction, compiling the intermediate representation into executable code of a target chip architecture through the runtime plug-in; and running the executable code on a processor of the target chip architecture through the runtime plug-in. The intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task, and the target computing node includes a processor of the target chip architecture.
  • the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is therefore not limited by the architecture of any pre-compiled executable code of the target task; instead, it can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of the computing nodes remains relatively balanced, and the resource utilization of the heterogeneous cluster is effectively improved.
  • the process of the target computing node acquiring the intermediate representation and the runtime plug-in of the target task may include: based on the scheduling instruction, acquiring the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster; or, receiving the intermediate representation and the runtime plug-in of the target task sent by the scheduler.
  • the intermediate representation and the runtime plug-in of the target task can be stored by the file manager in the heterogeneous cluster, thereby reducing the storage performance requirements of the scheduler. Also, since the scheduler does not need to forward the intermediate representation and the runtime plug-in, any impact on its scheduling performance is avoided.
  • the intermediate representation and the runtime plug-in can also be directly forwarded by the scheduler, so that there is no need to set up an additional file manager in the heterogeneous cluster, so as to simplify the structure of the heterogeneous cluster and reduce the deployment cost of the heterogeneous cluster.
  • the method may further include: receiving the architecture identifier of the target chip architecture sent by the scheduler; correspondingly, the process of compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in may include: based on the architecture identifier of the target chip architecture sent by the scheduler, compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
  • the scheduler may also send the architecture identifier of the target chip architecture to the target computing node, so that the target computing node can determine the architecture of the executable code into which the intermediate representation needs to be compiled.
  • the method may further include: acquiring input data of the target task. The process of running the executable code on the processor of the target chip architecture through the runtime plug-in may include: through the runtime plug-in, taking the input data as the input of the executable code, running the executable code on the processor of the target chip architecture, and obtaining the running result of the executable code. The method may further include: sending the running result to the scheduler.
  • the scheduler can then send the running result to the host providing the target task, so that the host can perform subsequent processing on the running result.
  • the host can perform reduction processing on the running results provided by multiple computing nodes.
  • the process of the target computing node acquiring the input data of the target task may include: based on the scheduling instruction, acquiring the input data of the target task from the file manager of the heterogeneous cluster; or, receiving the input data of the target task sent by the scheduler.
  • the input data can be stored by the file manager, thereby reducing the storage performance requirements of the scheduler.
  • the input data can also be directly forwarded by the scheduler, so that there is no need to set up an additional file manager in the heterogeneous cluster, so as to simplify the structure of the heterogeneous cluster and reduce the deployment cost of the heterogeneous cluster.
  • a task scheduling method is provided, which is applied to a scheduler in a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes and at least two of the multiple computing nodes have different chip architectures. The method includes: receiving scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task; determining a target computing node from the multiple computing nodes based on the scheduling requirement information, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures; and sending a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task and to run the executable code on the processor of the target chip architecture. The intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task.
  • the scheduling requirement information may further include priorities of the at least two chip architectures. Correspondingly, the process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: in descending order of the priorities of the at least two chip architectures, sequentially detecting whether the amount of idle resources of the processors of the corresponding chip architecture in the multiple computing nodes meets the resource requirement; and if it is detected that the amount of idle resources of the processors of the target chip architecture satisfies the resource requirement, determining a computing node that includes a processor of the target chip architecture as the target computing node.
  • the priorities of the at least two chip architectures can be defined in the scheduling requirement information, where a chip architecture with a higher priority is more suitable for processing the target task. Therefore, the scheduler determining the target chip architecture in descending order of priority can effectively ensure the execution efficiency of the target task.
  • the method may further include: sending the architecture identifier of the target chip architecture to the target computing node.
  • the method may further include: receiving the intermediate representation of the target task and the runtime plug-in of the target task, and sending the intermediate representation and the runtime plug-in to the target computing node.
  • the target task is a parallel task among multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks.
  • the process of determining the target computing node may include: if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirement of the target task alone. The synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously; the ideal parallel mode means that they do not.
  • the scheduler may determine the target computing node in different ways based on the parallel scheduling modes of multiple parallel tasks to ensure that the multiple parallel tasks can be reliably executed according to the required scheduling mode.
  • a task scheduling method is provided, which can be applied to a host. The method includes: compiling the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, where the intermediate representation is code independent of the chip architecture; sending the intermediate representation and the runtime plug-in; and sending the scheduling requirement information of the target task to the scheduler in a heterogeneous cluster, where the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task. The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures. The scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the multiple computing nodes, where the amount of idle resources of the processors of the target chip architecture in the target computing node meets the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures. The runtime plug-in is used for the target computing node to compile the intermediate representation into executable code of the target chip architecture and to run the executable code.
  • the process of sending the intermediate representation and the runtime plug-in may include: sending the intermediate representation and the runtime plug-in to the scheduler; or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
  • a target computing node is provided, which is applied to a heterogeneous cluster. The heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The target computing node includes a processor of the target chip architecture, and further includes at least one module, where the at least one module is used to implement the task scheduling method applied to the target computing node provided by the above aspects.
  • a scheduler is provided, applied to a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes and at least two of the multiple computing nodes have different chip architectures; the scheduler includes at least one module, where the at least one module is used to implement the task scheduling method applied to the scheduler provided by the above aspects.
  • in another aspect, a host is provided. The host includes at least one module, and the at least one module is used to implement the task scheduling method applied to the host provided by the above aspects.
  • a computer device is provided, comprising: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the task scheduling method provided by the above aspects when executing the computer program.
  • a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to the target computing node provided by the above aspect, or the task scheduling method applied to the scheduler provided by the above aspect, or the task scheduling method applied to the host provided by the above aspect.
  • a computer program product is provided, which, when run on a computer, causes the computer to execute the task scheduling method applied to the target computing node provided by the above aspect, or the task scheduling method applied to the scheduler provided by the above aspect, or the task scheduling method applied to the host provided by the above aspects.
  • a task scheduling system is provided, which includes: the host provided in the above aspect, the scheduler provided in the above aspect, and multiple computing nodes; at least one computing node among the multiple computing nodes is a target computing node as provided in the above aspect.
  • a task scheduling system includes: a host, a scheduler, and multiple computing nodes, where at least two of the multiple computing nodes have different chip architectures;
  • the host is used for compiling the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task, sending the intermediate representation and the runtime plug-in, and sending the scheduling requirement information of the target task to the scheduler, where the intermediate representation is chip-architecture-independent code, and the scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task;
  • the scheduler is configured to determine a target computing node from the multiple computing nodes based on the scheduling requirement information, and send a scheduling instruction for the target task to the target computing node, where the amount of idle resources of the processor of the target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures;
  • the target computing node is configured to compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the scheduling instruction, and to run the executable code on the processor of the target chip architecture through the runtime plug-in.
  • the present application provides a task scheduling method, device and system, in which a target computing node can obtain an intermediate representation and a runtime plug-in of a target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on a processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any pre-compiled executable code of the target task; instead, it can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization of the heterogeneous cluster can be effectively improved.
  • FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a task scheduling method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a task scheduling process provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a method for determining a target computing node provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a target computing node provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a scheduler provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a host provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of the present application.
  • the application scenario includes a heterogeneous cluster, and the heterogeneous cluster includes: a management node 01 and a plurality of computing nodes 02 connected to the management node 01 .
  • At least two computing nodes 02 of the plurality of computing nodes 02 use processors of different chip architectures. For example, among the plurality of computing nodes 02, some computing nodes 02 use a CPU, some use a GPU, and others use an NPU.
  • a scheduler 011 is deployed in the management node 01 , and the application scenario may further include a host 03 .
  • the host 03 may send the target task to be scheduled to the scheduler 011, and the scheduler 011 may then schedule the target task to at least one computing node 02 for execution.
  • FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of the present application.
  • an acceleration library 031 is deployed in the host 03
  • the acceleration library 031 is a software collection for optimizing the performance of the processor.
  • the acceleration library 031 can be used to send the target task to be scheduled to the scheduler 011 .
  • any computing node 02 in the heterogeneous cluster can also serve as a host to send tasks to be scheduled to the scheduler 011 .
  • the application scenario may also not include the host 03 independent of the heterogeneous cluster.
  • management node 01 may also have the function of a computing node, that is, the management node 01 can not only schedule tasks, but also execute tasks.
  • in the related art, the scheduler in the heterogeneous cluster needs to record the chip architecture type of each computing node in advance.
  • when the host submits a task to the scheduler, it also needs to mark, in the submitted task, the architecture of the executable code used by the task.
  • the scheduler can schedule the task to be executed in a computing node matching the chip architecture according to the architecture of the executable code marked in the task.
  • if the architectures of the executable code adopted by the tasks are unevenly distributed, then under the above task scheduling method the load of the computing nodes in the heterogeneous cluster will be unbalanced.
  • for example, a task carrying executable code of the X86 architecture can only be scheduled to run on a computing node whose processor architecture is X86. Suppose the computing nodes whose processor architecture is X86 have no available idle resources, while the computing nodes whose processor architecture is ARM do have idle resources. In this scenario, the idle resources of the heterogeneous cluster cannot be used to process the task, resulting in low resource utilization.
  • a host-submitted task may include multiple executables of different architectures.
  • the scheduler can determine a target computing node for executing the task according to the load of each computing node 02, and schedule the task to the target computing node. Since the task includes executable codes of multiple different architectures, the target computing node can execute executable codes whose architecture is the same as the chip architecture of its processor. However, this scheduling method requires the host to implement executable codes of various architectures, resulting in high cost.
  • the embodiment of the present application provides a distributed middleware for implementing adaptive task scheduling in a heterogeneous cluster.
  • the adaptive task scheduling refers to scheduling tasks that are adaptive based on resource usage of heterogeneous clusters.
  • the heterogeneous cluster may also be referred to as an adaptive cluster
  • the distributed middleware may also be referred to as adaptive middleware. As shown in FIG. 1 and FIG. 2, the distributed middleware may include: a middleware programming interface 032 deployed in the host 03, a scheduler 011 deployed in the management node 01, and a cluster agent 021 deployed in the computing node 02.
  • the middleware programming interface 032 is used to provide the acceleration library 031 with the ability to access heterogeneous clusters, that is, the acceleration library 031 can exchange data with components in the heterogeneous cluster by calling the middleware programming interface 032 .
  • the acceleration library 031 may send the scheduling requirement information of the target task, the intermediate representation of the target task, and the runtime plug-in of the target task to the scheduler 011 through the middleware programming interface 032 .
  • the intermediate representation may also be called intermediate language or intermediate code, which is an equivalent internal representation code of the source code.
  • the intermediate representation is independent of the chip architecture of the processor, that is, the intermediate representation can be compiled into executable codes (also referred to as object codes) of different architectures.
  • Runtime is the runtime environment of a programming language, which is a virtual environment that can provide software services for running programs.
  • the runtime plugin refers to a component capable of providing the runtime environment of the intermediate representation. Since the runtime plug-in provided by the embodiment of the present application supports the application to run in a heterogeneous device environment, it may also be called a heterogeneous runtime plug-in.
  • the runtime plug-in can provide a plug-in interface to be called by the cluster agent 021 in the distributed middleware, so that the cluster agent 021 can initialize the runtime plug-in, deinitialize it, run it, and clean it up on exit.
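  • as a concrete illustration, the following is a minimal C++ sketch of such a plug-in interface; the class name, method names, and types are assumptions made for illustration, not the patent's actual interface:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface a cluster agent could use to drive a runtime
// plug-in: initialize, run (compile the IR online and execute it), and
// deinitialize/clean up on exit, as described above.
class HeterogeneousRuntimePlugin {
public:
    virtual ~HeterogeneousRuntimePlugin() = default;

    // Called by the cluster agent when the task service instance starts.
    virtual bool Init() = 0;

    // Compile the chip-architecture-independent intermediate representation
    // for the given target architecture (e.g. "x86", "arm", "gpu", "npu")
    // and run the resulting executable code with the given input data,
    // returning the serialized running result.
    virtual std::vector<std::byte> Run(
        const std::vector<std::byte>& intermediateRepresentation,
        const std::string& targetArch,
        const std::vector<std::byte>& inputData) = 0;

    // Called before the task service instance exits (deinitialize and
    // clean up the plug-in).
    virtual void Deinit() = 0;
};
```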
  • the scheduler 011 is used to schedule tasks based on the usage of heterogeneous resources in the heterogeneous cluster. As shown in FIG. 2, the scheduler 011 mainly includes a task management and scheduler 0111 and a resource management and scheduler 0112.
  • the resource management and scheduler 0112 is used to manage and schedule the resources of each computing node 02 in the heterogeneous cluster, and the resources include at least processor resources, and may also include memory resources and the like.
  • the task management and scheduler 0111 is configured to send a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task sent by the acceleration library 031 .
  • the resource management and scheduler 0112 can allocate resources to the target task based on the resource scheduling request.
  • the resource management and scheduler 0112 allocates resources based on the resource usage of each computing node 02; the resources allocated for the target task include processor resources of the target chip architecture in the target computing node 02. The task management and scheduler 0111 can then distribute the target task to the target computing node 02 based on the allocated resources.
  • the cluster agent 021 is mainly used to start task service instances and manage runtime plug-ins. As shown in FIG. 2 , the cluster agent 021 includes a resource-level agent 0211 and a task-level agent 0212 . Among them, the resource layer agent 0211 is used to collect the resource information of the computing node 02 and report it to the resource management and scheduler 0112, so that the computing node 02 joins the heterogeneous cluster.
  • the task layer agent 0212 is used to start a task service instance based on the resources provided by the resource layer agent 0211, and the task service instance runs a runtime plug-in of the target task, or it can be understood that the task service instance includes a runtime plug-in instance.
  • the task layer agent 0212 can also be used to send, to the runtime plug-in, the intermediate representation of the target task and the target chip architecture determined by the resource management and scheduler 0112.
  • the runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture, and run the executable code in the processor of the target chip architecture, thereby realizing the running of the target task.
  • the scheduler 011 in the management node 01 can schedule the target task based on the resource usage of each computing node 02, without considering the architecture of the target task's executable code. In this way, the resource utilization of the heterogeneous cluster can be effectively improved without requiring the host 03 to provide executable code in multiple architectures.
  • in addition, since the scheduler 011 does not need to determine which computing nodes' chip architectures match the architecture of the executable code marked in a task, the complexity of resource management and scheduling is effectively reduced, thereby improving task scheduling efficiency.
  • acceleration libraries in different fields can be combined with the above-mentioned distributed middleware to realize adaptive scheduling of target tasks in heterogeneous clusters.
  • the embodiment of the present application provides a task scheduling method, and the method can be applied to the application scenarios provided by the foregoing embodiments.
  • the task scheduling method provided by the embodiment of the present application includes:
  • Step 101 The host compiles the source code of the target task to obtain an intermediate representation and a runtime plug-in of the target task.
  • the acceleration library 031 in the host 03 can compile the source code of the target task to obtain the intermediate representation and runtime plug-in of the target task.
  • FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of the present application.
  • the acceleration library 031 may be an acceleration library in different fields.
  • the acceleration library 031 may be a parallel programming acceleration library, such as an open multi-processing (OpenMP) acceleration library.
  • the acceleration library 031 may also be another type of acceleration library, such as a numerical computation acceleration library, a graph computation acceleration library, a data frame acceleration library, or a machine learning acceleration library.
  • the numerical computation acceleration library may include a numerical Python (NumPy) acceleration library
  • the data frame acceleration library may be a pandas acceleration library
  • the machine learning acceleration library may include a Scikit-learn acceleration library.
  • pandas is a data analysis package for Python.
  • the target task can be a task of applications in different fields, such as computer vision (CV) applications, natural language processing (NLP) applications, or machine learning prediction applications.
  • the acceleration library 031 can compile the source code of the target task into an intermediate representation independent of the chip architecture through a compiler, and obtain a runtime plug-in related to the compiler.
  • the intermediate representation may be a standard portable intermediate representation (SPIR-V), a WebAssembly (WASM) intermediate representation, or the like.
  • the runtime plug-in may be a tensor virtual machine (TVM) runtime plug-in, a SPIR-V runtime plug-in, a WASM runtime plug-in, or the like.
  • the programming framework of the compiler adopted by the acceleration library 031 may include any one of the following: Python, Java, Go, a numerical domain-specific language (DSL), a table-structure DSL, a distributed parallel DSL, a C++ heterogeneous programming framework, and the like.
  • Python, Java and Go are the names of computer programming languages.
  • the source code of the target task may be a code fragment in a certain program; the code fragment may also be referred to as a code fragment to be accelerated, or an acceleration kernel code fragment.
  • the developer can mark the code fragment in advance by using a device directive, and the acceleration library 031 can compile the marked code fragment to obtain the intermediate representation and the runtime plug-in of the target task.
  • the program running in the OpenMP acceleration library for implementing the matrix multiplication (matmul) operation is as follows:
  • float represents a floating-point data type
  • int represents an integer type
  • A and B represent two input matrices
  • C represents an output matrix, that is, matrix C is equal to the product of matrix A and matrix B.
  • “#pragma omp parallel for” is a directive in OpenMP, indicating that the for loop that follows will be executed by multiple threads.
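  • the patent's original listing is not reproduced in text form here, so the following is an illustrative C++/OpenMP reconstruction consistent with the identifiers described above (float element type, input matrices A and B, output matrix C equal to their product):

```cpp
// Illustrative reconstruction: C = A * B for n x n matrices; the outer
// loop iterations are distributed across threads by OpenMP.
void matmul(const float* A, const float* B, float* C, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```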
  • the for loop in the above-mentioned program may be executed in parallel by the computing nodes in the heterogeneous cluster. That is, the code fragment of the for loop can be offloaded to the heterogeneous cluster for execution; correspondingly, the source code of the target task is the for loop in the program.
  • the developer can add a device directive to the above-mentioned matrix multiplication program to mark the for loop.
  • the program with the device directive added is:
  • “#pragma omp target device(ADAPTIVE_CLUSTER)” is the added device directive, which means to offload the subsequent code fragment from the host to the target device for execution.
  • the target device is a heterogeneous cluster.
  • "ADAPTIVE_CLUSTER” is the name of the target device defined in this embodiment of the application.
  • the OpenMP acceleration library can compile the code fragment (the for loop) marked by the device directive into an intermediate representation. Then, while running the executable code of the above program, when the OpenMP acceleration library detects that the intermediate representation of the marked code fragment is to be executed, it can offload the intermediate representation to the heterogeneous cluster for execution.
  • the acceleration library 031 may obtain a fat binary file by compiling the source code of the target task through a low-level virtual machine (LLVM) compiler.
  • the fat binary file contains host code (such as the main function) and an intermediate representation independent of the chip architecture.
  • the file format of the fat binary file may be the executable and linkable format (ELF).
  • FIG. 7 is a schematic diagram of a programming framework provided by an embodiment of the present application.
  • the acceleration library 031 can use a DSL compiler to compile the source code of the target task.
  • the compilation process may include steps such as algorithm abstraction, computation graph optimization, data graph optimization, communication graph optimization, and abstract syntax tree generation. The above steps can be automatically scheduled by the acceleration library 031, or scheduled by the user.
  • Step 102 The host sends the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager in the heterogeneous cluster.
  • the heterogeneous cluster may further include a file manager 04, and the acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager 04 by calling the middleware programming interface 032.
  • the input data may include an input matrix A and an input matrix B.
  • the heterogeneous cluster may further include a gateway 05 , and the gateway 05 is respectively connected with the scheduler 011 and the file manager 04 .
  • the acceleration library 031 can send the intermediate representation of the target task, the runtime plug-in, and the input data to the gateway 05 by calling the software development kit (SDK) interface provided by the gateway 05.
  • the gateway 05 can in turn forward the received data to the file manager 04 .
  • the main component of the SDK interface is the middleware programming interface 032 .
  • the file manager 04 may include one or more storage devices with file storage functions.
  • Each computing node 02 in the heterogeneous cluster has established a communication connection with the file manager 04 and can obtain data from the file manager 04 .
  • Step 103 The host sends the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster.
  • the acceleration library 031 in the host 03 can send the scheduling requirement information of the target task to the scheduler 011 in the management node 01 by calling the middleware programming interface 032 .
  • the acceleration library 031 may send the scheduling requirement information to the task management and scheduler 0111 in the scheduler 011.
  • the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the scheduling requirement information can be configured by the acceleration library 031 .
  • the acceleration library 031 can, by calling the middleware programming interface 032, send to the gateway 05 the resource requirement of the target task (an amount X of processor resources) and the three supported chip architectures: X86, ARM, and GPU.
  • the gateway 05 can then forward the received scheduling requirement information to the scheduler 011 .
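  • a sketch of what this scheduling requirement information might carry, with illustrative field names (the patent does not define a wire format, so this is an assumption for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation of the scheduling requirement information
// sent by the acceleration library to the scheduler.
enum class ParallelMode { kSynchronous, kIdeal };

struct SchedulingRequirement {
    std::string taskId;                       // identifier of the target task
    std::uint64_t processorResourceAmount;    // resource requirement, e.g. the amount X
    std::vector<std::string> supportedArchs;  // priority-ordered, e.g. {"x86", "arm", "gpu"}
    ParallelMode parallelMode;                // parallel scheduling mode, if the task
    std::uint32_t parallelTaskCount;          // is one of multiple parallel tasks
};
```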
  • Step 104 The scheduler determines a target computing node from a plurality of computing nodes based on the scheduling requirement information.
  • after the scheduler 011 receives the scheduling requirement information of the target task to be scheduled sent by the acceleration library 031, it can, based on the resource usage of each computing node 02 in the heterogeneous cluster, determine from the multiple computing nodes a target computing node that satisfies the execution conditions of the target task.
  • the amount of idle resources of the processor of the target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures.
  • the scheduling requirement information of the target task sent by the acceleration library 031 may further include: priorities of the at least two chip architectures.
  • the scheduler 011 can sequentially detect the amount of idle resources of the processors of each chip architecture in the heterogeneous cluster in descending order of priority, and determine the target computing node from the plurality of computing nodes 02.
  • processors of different chip architectures are good at different types of tasks, for example, CPUs are good at scalar operations, GPUs are good at vector operations, and NPUs are good at matrix operations. Therefore, in the solution provided by the present application, the priorities of the at least two chip architectures may be defined in the scheduling requirement information, wherein the chip architecture with a higher priority is more suitable for processing the target task. Therefore, the scheduler determines the target chip architecture based on the order of the priority from high to low, which can ensure the execution efficiency of the target task as much as possible while improving the resource utilization of the heterogeneous cluster.
  • the acceleration library 031 in the host 03 can split the task to be executed into multiple parallel tasks, so that each computing node 02 in the heterogeneous cluster can execute the multiple parallel tasks in parallel.
  • the target task is a parallel task among the multiple parallel tasks
  • the scheduling requirement information may further include: parallel scheduling modes of the multiple parallel tasks.
  • the parallel scheduling mode may include a synchronous parallel mode and an ideal parallel mode.
  • the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously. Therefore, when scheduling the multiple parallel tasks, it is necessary to ensure that the multiple parallel tasks are scheduled to be executed in processors of the same chip architecture.
  • the ideal parallel mode means that the multiple parallel tasks do not require synchronous execution; that is, the multiple parallel tasks can be executed synchronously, or some parallel tasks can be executed first and the remaining ones later. Therefore, when scheduling the multiple parallel tasks, they can be scheduled to processors of different chip architectures for execution. The ideal parallel mode may also be called embarrassingly parallel.
  • this step 104 may include:
  • Step 1041 Determine the parallel scheduling mode of multiple parallel tasks.
  • the scheduler 011 may determine the parallel scheduling mode of the multiple parallel tasks based on the received scheduling requirement information. If the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, the scheduler 011 may execute the following steps 1042a and 1043a; if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, the scheduler 011 may execute Steps 1042b and 1043b are described below.
  • Step 1042a in order of priority from high to low, sequentially detect whether the amount of idle resources of processors corresponding to the chip architecture in the multiple computing nodes meets the resource requirements of the target task.
  • the scheduler 011 may directly determine the target computing node based on the resource requirement of the target task. That is, when the scheduler 011 schedules the target task, it only needs to ensure that the idle resources of processors of a certain chip architecture in the heterogeneous cluster can meet the resource requirement of the target task, without ensuring that the sum of idle resources of processors of that chip architecture satisfies the sum of the resource requirements of the multiple parallel tasks.
  • the scheduler 011 can sequentially detect whether the amount of idle resources of processors of each chip architecture in the multiple computing nodes meets the resource requirements of the target task in the order of X86, ARM and GPU.
  • the scheduler 011 may first detect whether the amount of idle resources of the processors of the X86 architecture in the heterogeneous cluster meets the resource requirement of the target task. If the amount of idle resources of the processors of the X86 architecture meets the resource requirement, the scheduler 011 may execute the following step 1043a. If not, the scheduler 011 may continue to detect whether the amount of idle resources of the processors of the ARM architecture in the heterogeneous cluster meets the resource requirement. If the amount of idle resources of the processors of the ARM architecture meets the resource requirement, the scheduler 011 may execute the following step 1043a. If not, the scheduler 011 may continue to detect whether the amount of idle resources of the processors of the GPU architecture in the heterogeneous cluster meets the resource requirement.
  • Step 1043a If it is detected that the amount of idle resources of the processor of the target chip architecture meets the resource requirement, a computing node including the processor of the target chip architecture is determined as the target computing node.
  • proceeding in descending order of priority, if the scheduler 011 detects that the amount of idle resources of the processors of the target chip architecture meets the resource requirement, it can determine a computing node that includes a processor of the target chip architecture as the target computing node. For example, if the scheduler 011 detects that the amount of idle resources of the processors of the X86 architecture in the heterogeneous cluster meets the resource requirement of the target task, a computing node including a processor of the X86 architecture may be determined as the target computing node. The amount of idle resources of the processor of the X86 architecture in the target computing node satisfies the resource requirement.
  • Step 1042b in order of priority from high to low, sequentially detect whether the sum of idle resources of processors corresponding to the chip architecture in the multiple computing nodes meets the sum of resource requirements of the multiple parallel tasks.
  • the scheduler 011 may determine that the multiple parallel tasks need to be executed synchronously. Therefore, when scheduling the target task, the scheduler 011 needs to ensure that the sum of the idle resources of the processors of a certain chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks. That is, the scheduler 011 needs to determine the target computing node for executing the target task from the plurality of computing nodes based on the sum of the resource requirements of the multiple parallel tasks.
  • the scheduler 011 can sequentially detect whether the sum of idle resources of processors of each chip architecture in the multiple computing nodes satisfies the sum of resource requirements of the multiple parallel tasks in the order of X86, ARM and GPU.
  • Step 1043b If it is detected that the sum of the idle resources of the processors of the target chip architecture satisfies the sum of the resource requirements, a computing node including the processors of the target chip architecture is determined as the target computing node.
  • if the scheduler 011 detects that the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster meets the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node that includes a processor of the target chip architecture as the target computing node.
  • for example, if the scheduler 011 detects that the sum of the idle resources of the processors of the ARM architecture in the multiple computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node including processors of the ARM architecture as the target computing node. The amount of idle resources of the ARM processors in the target computing node satisfies the resource requirement of the target task.
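  • a compact sketch of the two selection paths of step 104, under assumed data structures for per-node idle resources (illustrative only; the patent does not prescribe these structures):

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct NodeInfo {
    std::string nodeId;
    // Idle resource amount per chip architecture present on this node.
    std::map<std::string, std::uint64_t> idleByArch;
};

// Walk the supported architectures in descending priority order and pick
// a node, applying the per-mode checks of steps 1042a/1043a and 1042b/1043b.
std::optional<std::string> PickTargetNode(
        const std::vector<std::string>& archsByPriority,
        const std::vector<NodeInfo>& nodes,
        std::uint64_t perTaskDemand,  // resource requirement of the target task
        std::uint64_t totalDemand,    // sum over all parallel tasks (synchronous mode)
        bool synchronousMode) {
    for (const auto& arch : archsByPriority) {
        std::uint64_t clusterIdle = 0;
        for (const auto& n : nodes) {
            auto it = n.idleByArch.find(arch);
            if (it != n.idleByArch.end()) clusterIdle += it->second;
        }
        // Synchronous parallel mode: the cluster-wide idle resources of this
        // architecture must cover the sum of all parallel task requirements.
        if (synchronousMode && clusterIdle < totalDemand) continue;
        // Either mode: the chosen node must cover the target task itself.
        for (const auto& n : nodes) {
            auto it = n.idleByArch.find(arch);
            if (it != n.idleByArch.end() && it->second >= perTaskDemand) {
                return n.nodeId;
            }
        }
    }
    return std::nullopt;  // no supported architecture satisfies the requirement
}
```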
  • if at least two candidate computing nodes satisfy the execution condition of the target task, the scheduler 011 may randomly select one of the at least two candidate computing nodes as the target computing node.
  • the scheduler 011 may select one of the at least two candidate computing nodes as the target computing node based on a preconfigured resource scheduling policy.
  • satisfying the execution condition of the target task means that the computing node includes a processor of the target chip architecture, and the amount of idle resources of the processor meets the resource requirement of the target task.
  • the scheduler 011 may include a task management and scheduler 0111 and a resource management and scheduler 0112 .
  • the task management and scheduler 0111 can send a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task.
  • the resource management and scheduler 0112 can further allocate resources for the target task based on a preconfigured resource scheduling policy, that is, determine a target computing node from a plurality of computing nodes 02 .
  • the resource scheduling strategy may include: heterogeneous awareness, priority preemption, affinity and anti-affinity, bin packing algorithm or accelerator sharing, and the like.
  • the task management and scheduler 0111 can perform task scheduling on the multiple tasks based on a preconfigured task scheduling policy.
  • the task scheduling strategy may include: directed acyclic graph (DAG) scheduling, priority scheduling, and the like.
  • the chip architecture of each computing node in the heterogeneous cluster includes GPU, NPU, and CPU, and the acceleration ratio of the processors of the three chip architectures is 2:2:1.
  • the parallel scheduling mode of the 100 parallel tasks is the ideal parallel mode, and the resources of 10 GPUs, 10 NPUs, and 100 CPUs in the current heterogeneous cluster are idle, among which 50 CPUs of the X86 architecture are idle in computing node A, and 50 CPUs of the ARM architecture are idle in computing node B.
  • the scheduler 011 can schedule 20 parallel tasks to the computing nodes containing the GPUs, 20 parallel tasks to the computing nodes containing the NPUs, 30 parallel tasks to computing node A, and 30 parallel tasks to computing node B for execution.
  • each GPU and each NPU are used to execute 2 parallel tasks
  • each CPU of X86 architecture and each CPU of ARM architecture are used to execute 1 parallel task.
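  • reading this allocation as arithmetic (one interpretation, assuming the 2:2:1 acceleration ratio means one GPU or NPU handles two such tasks in the time one CPU handles one), the 100 tasks are covered as:

$10 \times 2 \;(\text{GPUs}) + 10 \times 2 \;(\text{NPUs}) + 30 \times 1 \;(\text{X86 CPUs}) + 30 \times 1 \;(\text{ARM CPUs}) = 20 + 20 + 30 + 30 = 100$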
  • Step 105 The scheduler sends a scheduling instruction for the target task to the target computing node.
  • the scheduler 011 may send a scheduling instruction for the target task to the target computing node 02 .
  • the scheduling instruction may carry the identifier of the target task.
  • the scheduling instruction is used to instruct the target computing node 02 to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code on the processor of the target chip architecture.
  • the scheduler 011 may respectively send scheduling instructions to the N computing nodes 02 for executing the N parallel tasks.
  • the task management and scheduler 0111 in the scheduler 011 may send the scheduling instruction to the task layer agent 0212 in the computing node 02 .
  • Step 106 The scheduler sends the architecture identifier of the target chip architecture to the target computing node.
  • one or more computing nodes in a heterogeneous cluster may include processors of various chip architectures, for example, may include NPU and CPU of X86 architecture, or may include GPU and CPU of X86 architecture. Therefore, in order to facilitate the target computing node to determine the chip architecture of the processor used for running the target task, the scheduler may also send the architecture identifier of the target chip architecture to the target computing node.
  • the resource management and scheduler 0112 in the scheduler 011 determines the target chip architecture, it can send the architecture identifier of the target chip architecture to the resource layer agent 0211 in the target computing node 02 .
  • the resource layer agent 0211 can then send the architecture identifier of the target chip architecture to the task layer agent 0212.
  • alternatively, the architecture identifier of the target chip architecture may be sent to the task management and scheduler 0111, and the task management and scheduler 0111 can then send the architecture identifier of the target chip architecture to the task layer agent 0212 in the target computing node 02.
  • step 106 can also be performed before step 105 .
  • step 106 may be performed synchronously with step 105, for example, the scheduling instruction sent by the scheduler may carry the architecture identifier of the target chip architecture.
  • Step 107 Based on the scheduling instruction, the target computing node obtains the intermediate representation of the target task, the runtime plug-in and the input data from the file manager of the heterogeneous cluster.
  • the target computing node 02 can obtain the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager 04 based on the identifier of the target task carried in the scheduling instruction.
  • the intermediate representation and the runtime plug-in can be stored by the file manager in the heterogeneous cluster, which can reduce the storage performance requirements on the scheduler. Also, since the scheduler does not need to forward intermediate representations, runtime plug-ins, and input data, any impact on its scheduling performance is avoided.
  • the acceleration library 031 in the host 03 may also directly send at least one of the intermediate representation of the target task, the runtime plug-in and the input data to the scheduler 011.
  • the scheduler 011 can send the above at least one kind of data to the target computing node 02 , that is, the target computing node 02 can receive the above at least one kind of data sent by the scheduler 011 .
  • the heterogeneous cluster may also not include the file manager 04.
  • the acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in and the input data to the scheduler 011.
  • the target computing node 02 can receive the intermediate representation of the target task, the runtime plug-in and the input data sent by the scheduler 011. Since there is no need to additionally set a file manager in the heterogeneous cluster, the structure of the heterogeneous cluster can be simplified, and the deployment cost of the heterogeneous cluster can be reduced.
  • Step 108: Based on the scheduling instruction, the target computing node compiles the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
  • after the target computing node 02 obtains the runtime plug-in, it can run the runtime plug-in.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code for the target chip architecture. That is, the runtime plugin can compile the intermediate representation online.
  • the runtime plug-in supports compiling the intermediate representation into executable code of various chip architectures; for example, the runtime plug-in can compile the intermediate representation into executable code of the NPU, GPU, X86, or ARM architecture.
  • for example, if the target computing node is computing node A and the target chip architecture is the X86 architecture, computing node A can run the runtime plug-in on a processor of the X86 architecture.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code of the X86 architecture.
  • if the target computing node is computing node B and the target chip architecture is the NPU architecture, computing node B can run the runtime plug-in in the NPU.
  • the runtime plug-in may in turn compile the intermediate representation of the target task into executable code of the NPU architecture.
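  • The following Python sketch illustrates the runtime plug-in's role in step 108: compiling the architecture-independent intermediate representation online for whichever architecture the scheduler selected. The per-architecture backends and the interface are hypothetical placeholders, not the embodiments' actual API.

```python
class RuntimePlugin:
    """Illustrative stand-in for the runtime plug-in."""

    def __init__(self, backends):
        # backends: mapping from architecture identifier to a compile function
        self.backends = backends

    def compile(self, ir, target_arch):
        """Compile the intermediate representation online for `target_arch`."""
        if target_arch not in self.backends:
            raise ValueError(f"unsupported architecture: {target_arch}")
        return self.backends[target_arch](ir)

# Hypothetical usage on computing node B (NPU selected by the scheduler):
plugin = RuntimePlugin({"NPU": lambda ir: b"npu code", "X86": lambda ir: b"x86 code"})
executable = plugin.compile(b"ir bytes", "NPU")
```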
  • the task layer agent 0212 in the target computing node 02 can first start a task service instance through the runtime plug-in manager, and the task service instance runs the runtime plug-in. The runtime plug-in in the running state can then compile the intermediate representation into executable code of the target chip architecture.
  • the runtime plug-in in the running state can obtain the intermediate representation of the target task from the file manager 04, and compile the intermediate representation to obtain executable code.
  • the intermediate representation can also be obtained by the task layer agent 0212 from the file manager 04 and sent to the runtime plug-in.
  • Step 109: The target computing node uses the input data as the input of the executable code, runs the executable code in the processor of the target chip architecture, and obtains the running result of the executable code.
  • the input data can be provided to the runtime plug-in.
  • the runtime plug-in can further use the input data as the input of the executable code, run the executable code in the processor of the target chip architecture, and obtain the running result of the executable code.
  • the task layer agent 0212 in the target computing node 02 may provide input data to the runtime plug-in.
  • for example, the input data may be input matrices A and B; after the runtime plug-in runs the map function in the for loop, the obtained running result is the result of the matrix multiplication operation.
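  • As a minimal, purely illustrative sketch of this example, the plain-Python function below stands in for the compiled executable code: it multiplies input matrices A and B using a map function inside a for loop.

```python
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    result = []
    for i in range(rows):                          # the for loop
        result.append(list(map(                    # the map function
            lambda j: sum(a[i][k] * b[k][j] for k in range(inner)),
            range(cols),
        )))
    return result

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```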
  • the runtime plug-in may further cache the executable code of the target task. Therefore, when another target task of the same type is to be executed subsequently, there is no need to compile the intermediate representation of that task online again, thereby avoiding the extra overhead introduced by online compilation.
  • if, after receiving the scheduling instruction for the target task, the target computing node detects that the executable code of the target task has been cached locally, and that the chip architecture of the executable code is the same as the target chip architecture sent by the scheduler, then the target computing node can directly use the runtime plug-in to take the input data of the target task as the input of the executable code and run the executable code in the processor of the target chip architecture.
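  • A sketch of the caching behavior described above is shown below; keying the cache by task type and target architecture is an assumption made for illustration.

```python
class ExecutableCache:
    """Caches compiled executable code so repeated tasks of the same
    type skip online compilation (and its extra overhead)."""

    def __init__(self):
        self._cache = {}

    def get_or_compile(self, task_type, target_arch, ir, plugin):
        key = (task_type, target_arch)
        if key not in self._cache:
            # Cache miss: compile the intermediate representation online.
            self._cache[key] = plugin.compile(ir, target_arch)
        # Cache hit: reuse the previously compiled executable code.
        return self._cache[key]
```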
  • the task service instance started by the task layer agent 0212 also has the function of communicating with the task layer agents 0212 in other computing nodes 02, thereby facilitating the acquisition of necessary data from other computing nodes 02 during task execution.
  • Step 110: The target computing node sends the running result to the scheduler.
  • the target computing node 02 may send the running result to the scheduler 011 .
  • Step 111: The scheduler sends the running result to the host.
  • the scheduler 011 can send the running result to the acceleration library 031 in the host through the gateway 05 so that the acceleration library 031 can further process the running result.
  • if the target task is one of N parallel tasks, the N computing nodes 02 used to execute the N parallel tasks can each send their running results to the scheduler 011.
  • the scheduler 011 can then send the N running results to the acceleration library 031 through the gateway 05 .
  • the acceleration library 031 can further perform reduction processing on the received N running results.
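  • The snippet below sketches that reduction: the acceleration library combines the N partial results returned by the N computing nodes. Summation is only one possible reduction operator, chosen here for illustration.

```python
from functools import reduce

partial_results = [10, 20, 30]  # hypothetical running results from N = 3 nodes
total = reduce(lambda x, y: x + y, partial_results)
print(total)  # 60
```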
  • the management node 01 in the heterogeneous cluster may further include a historical information collection module 012, and the historical information collection module 012 may be used to collect and store scheduling information and execution information of historical tasks.
  • FIG. 10 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of the present application.
  • the host 03 may include a CPU, and the CPU is used for running the acceleration library 031 to compile the source code of the target task to obtain a fat binary file.
  • the fat binary file includes host code and the intermediate representation, where the host code may be CPU host code.
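  • An illustrative sketch of such a fat binary is given below: a single container holding the CPU host code alongside the architecture-independent intermediate representation. The field names are assumptions, not the embodiments' actual file format.

```python
from dataclasses import dataclass

@dataclass
class FatBinary:
    host_code: bytes          # CPU host code, run on the host 03
    intermediate_repr: bytes  # IR, compiled online on the target computing node
```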
  • the host 03 may also run a target-independent device plugin framework and an adaptive cluster plugin.
  • the target agnostic device plug-in framework may be a target agnostic wrapper.
  • the target-independent device plug-in framework is used to interface with the adaptive cluster plug-in, and the adaptive cluster plug-in is used to interact with the distributed middleware.
  • the adaptive cluster plug-in can send data to the scheduler 011 in the heterogeneous cluster by invoking the middleware programming interface, so as to offload the target task to the heterogeneous cluster for execution.
  • the adaptive cluster plug-in may also be referred to as an offload plug-in.
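  • The sketch below illustrates the offload path: the adaptive cluster plug-in packages the task's artifacts and hands them to the distributed middleware for forwarding to the scheduler 011. The `submit` interface is a hypothetical middleware programming interface, not one defined by the embodiments.

```python
class AdaptiveClusterPlugin:
    """Illustrative offload plug-in."""

    def __init__(self, middleware):
        self.middleware = middleware

    def offload(self, task_id, ir, runtime_plugin, input_data):
        # Send the task's artifacts to the heterogeneous cluster for execution.
        return self.middleware.submit(
            task_id=task_id,
            intermediate_repr=ir,
            runtime_plugin=runtime_plugin,
            input_data=input_data,
        )
```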
  • the steps of the task scheduling method provided by the embodiments of the present application may be added or deleted according to circumstances.
  • the above-mentioned step 103 may be performed before the step 102 .
  • the step 102 and the step 103 may also be performed synchronously.
  • if the target computing node only includes processors of one chip architecture, the above step 106 may also be deleted according to the situation.
  • the above steps 1041, 1042b and 1043b may also be deleted according to the situation.
  • multiple parallel tasks (tasks) received by the scheduler may also be referred to as one job (job), and the method provided in this embodiment of the present application can implement not only task scheduling at the single-task level but also task scheduling at the job level.
  • the embodiment of the present application provides a task scheduling method, in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • since the scheduler does not need to determine a computing node whose chip architecture matches the architecture of the executable code marked in the task, the complexity of resource management and scheduling can be effectively reduced, thereby improving the efficiency of task scheduling.
  • since the host does not need to provide executable code for various architectures, the operation and maintenance cost and development cost on the host side can be effectively reduced.
  • Embodiments of the present application further provide a target computing node, which can be applied to the heterogeneous cluster provided by the foregoing embodiments, and can be used to implement the steps performed by the target computing node in the foregoing method embodiments.
  • the heterogeneous cluster includes a scheduler 011 and a plurality of computing nodes 02, the chip architectures of at least two computing nodes 02 among the plurality of computing nodes 02 are different, and the target computing node belongs to the plurality of computing nodes 02.
  • the target computing node may also include:
  • the receiving module 201 is configured to receive the scheduling instruction for the target task sent by the scheduler.
  • For the functional implementation of the receiving module 201, reference may be made to the relevant description of step 105 in the foregoing method embodiments.
  • the obtaining module 202 is configured to obtain the intermediate representation of the target task and the runtime plug-in of the target task.
  • For the functional implementation of the obtaining module 202, reference may be made to the relevant description of step 107 in the foregoing method embodiments.
  • a processing module 203, configured to compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the scheduling instruction, and run the executable code in the processor of the target chip architecture; the target computing node includes a processor of the target chip architecture.
  • For the functional implementation of the processing module 203, reference may be made to the relevant description of step 108 in the foregoing method embodiments.
  • the obtaining module 202 may be configured to: receive the intermediate representation and the runtime plug-in of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster.
  • the receiving module 201 may be further configured to: receive the architecture identifier of the target chip architecture sent by the scheduler.
  • For the functional implementation of the receiving module 201 in this case, reference may be made to the relevant description of step 106 in the foregoing method embodiments.
  • the processing module 203 may compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in based on the architecture identifier of the target chip architecture.
  • the obtaining module 202 can also be used to obtain the input data of the target task.
  • the processing module 203 is configured to use the input data as the input of the executable code, run the executable code in the processor of the target chip architecture, and obtain a running result of the executable code.
  • For the functional implementation of the processing module 203 in this case, reference may also be made to the relevant description of step 109 in the foregoing method embodiments.
  • the target computing node further includes:
  • the sending module 204 is configured to send the running result to the scheduler after the processing module 203 obtains the running result of the executable code.
  • For the functional implementation of the sending module 204, reference may also be made to the relevant description of step 110 in the foregoing method embodiments.
  • the obtaining module 202 may be configured to: receive the input data of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the input data of the target task from the file manager of the heterogeneous cluster.
  • the embodiment of the present application provides a target computing node, and the target computing node can obtain the intermediate representation and the runtime plug-in of a target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • the embodiment of the present application provides a scheduler, and the scheduler can be applied to the heterogeneous cluster provided by the foregoing embodiment, for example, to the management node 01 in the heterogeneous cluster. Moreover, the scheduler may be used to implement the steps performed by the scheduler in the above method embodiments. Referring to FIG. 1, FIG. 2, FIG. 6, and FIG. 9, the heterogeneous cluster further includes multiple computing nodes 02, and at least two computing nodes 02 of the multiple computing nodes 02 have different chip architectures. As shown in FIG. 12, the scheduler may include:
  • the receiving module 301 is configured to receive scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the determining module 302 is configured to determine, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, where the idle resources of the processors of the target chip architecture in the target computing node meet the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures.
  • For the functional implementation of the determining module 302, reference may be made to the relevant description of step 104 in the foregoing method embodiments.
  • the sending module 303 is configured to send a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code in the processor of the target chip architecture.
  • the scheduling requirement information may further include: priorities of the at least two chip architectures. In this case, the determining module 302 may be used for: checking, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of the processors of the corresponding chip architecture in the plurality of computing nodes meets the resource requirements;
  • if it is detected that the amount of idle resources of the processors of the target chip architecture meets the resource requirements, a computing node including a processor of the target chip architecture is determined as the target computing node.
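  • A minimal sketch of this priority-ordered selection is shown below; the node and resource records are hypothetical, and real implementations would also account for scoring and tie-breaking.

```python
def pick_target(nodes, archs_by_priority, demand):
    """Walk the supported architectures from highest to lowest priority and
    return the first (node, architecture) pair with enough idle resources."""
    for arch in archs_by_priority:
        for node in nodes:
            if node["idle"].get(arch, 0) >= demand:
                return node["name"], arch
    return None  # no computing node can host the task

nodes = [
    {"name": "A", "idle": {"X86": 8}},
    {"name": "B", "idle": {"NPU": 4}},
]
print(pick_target(nodes, ["NPU", "X86"], demand=4))  # ('B', 'NPU')
```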
  • the sending module 303 may be further configured to send the architecture identifier of the target chip architecture to the target computing node.
  • For the functional implementation of the sending module 303 in this case, reference may also be made to the relevant description of step 106 in the foregoing method embodiments.
  • the receiving module 301 may also be configured to receive the intermediate representation of the target task and the runtime plug-in of the target task.
  • the sending module 303 may also be configured to send the intermediate representation of the target task and the runtime plug-in to the target computing node.
  • optionally, the target task is a parallel task among multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks. In this case, the determining module 302 may be used for:
  • if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks;
  • if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirements of the target task.
  • the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, and the ideal parallel mode means that the multiple parallel tasks do not need to be executed synchronously.
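  • The following sketch contrasts the two modes' admission checks; it assumes, for illustration only, that each parallel task has the same resource demand.

```python
def can_schedule(mode, nodes, arch, per_task_demand, n_tasks):
    idle_total = sum(node["idle"].get(arch, 0) for node in nodes)
    if mode == "synchronous":
        # All N tasks must be placeable at once.
        return idle_total >= per_task_demand * n_tasks
    if mode == "ideal":
        # Tasks need not run simultaneously; room for one task suffices.
        return idle_total >= per_task_demand
    raise ValueError(f"unknown parallel scheduling mode: {mode}")

nodes = [{"idle": {"NPU": 4}}, {"idle": {"NPU": 2}}]
print(can_schedule("synchronous", nodes, "NPU", per_task_demand=2, n_tasks=3))  # True
```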
  • For the functional implementation of the determining module 302 in this case, reference may also be made to the relevant descriptions of step 1041, step 1042b, and step 1043b in the above method embodiments.
  • the embodiment of the present application provides a scheduler. Since the target computing node can obtain the intermediate representation and the runtime plug-in of the target task, and the intermediate representation is code independent of the chip architecture of the processor, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • An embodiment of the present application further provides a host, which can be applied to the task scheduling system provided in the foregoing embodiment, and can be used to implement the steps performed by the host in the foregoing method embodiment.
  • the host may include:
  • the compiling module 401 is used for compiling the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task.
  • For the functional implementation of the compiling module 401, reference may be made to the relevant description of step 101 in the foregoing method embodiments.
  • the first sending module 402 is configured to send the intermediate representation and the runtime plug-in.
  • For the functional implementation of the first sending module 402, reference may be made to the relevant description of step 102 in the foregoing method embodiments.
  • the second sending module 403 is configured to send scheduling requirement information of the target task to the scheduler in the heterogeneous cluster, where the scheduling requirement information includes resource requirements of the target task and at least two chip architectures supported by the target task.
  • the heterogeneous cluster further includes multiple computing nodes, at least two computing nodes in the multiple computing nodes have different chip architectures, and the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes.
  • For the functional implementation of the second sending module 403, reference may be made to the relevant description of step 103 in the foregoing method embodiments.
  • the first sending module 402 can be used to:
  • the intermediate representation and the runtime plugin are sent to the scheduler; alternatively, the intermediate representation and the runtime plugin are sent to a file manager in the heterogeneous cluster.
  • the embodiments of the present application provide a host, which can provide the intermediate representation and the runtime plug-in of a target task to a target computing node. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture.
  • when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
  • the above-mentioned target computing nodes, schedulers, and hosts provided in the embodiments of the present application may all be implemented by application-specific integrated circuits (ASICs) or programmable logic devices (PLDs).
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the task scheduling method provided by the above method embodiments may also be implemented by software.
  • the target computing node, the scheduler, and the host may each include software modules for implementing the above method.
  • An embodiment of the present application further provides a computer device, and the computer device can be applied to the task scheduling system provided by the above embodiment.
  • the computer device may be the target computing node, scheduler or host provided in the above embodiment.
  • the computer device may include: a processor 501 , a memory 502 , a network interface 503 and a bus 504 .
  • the bus 504 is used for connecting the processor 501 , the memory 502 and the network interface 503 .
  • the communication connection with other devices can be realized through the network interface 503 (which may be wired or wireless).
  • the memory 502 stores a computer program 5021 for realizing various application functions.
  • the processor 501 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a GPU or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • Memory 502 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • bus 504 may also include a power bus, a control bus, a status signal bus, and the like. However, for the sake of clarity, the various buses are labeled as bus 504 in the figure.
  • the processor 501 is configured to execute the computer program stored in the memory 502, and the processor 501 implements the task scheduling method shown in the above method embodiments by executing the computer program 5021.
  • If the computer device is a target computing node, the processor 501 may implement the steps performed by the target computing node in the above method embodiments by executing the computer program 5021. If the computer device is a scheduler, the processor 501 may implement the steps performed by the scheduler in the above method embodiments by executing the computer program 5021. If the computer device is a host, the processor 501 may implement the steps performed by the host in the above method embodiments by executing the computer program 5021.
  • Embodiments of the present application further provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to a target computing node in the foregoing method embodiments, or The task scheduling method applied to the scheduler in the above method embodiment is implemented, or the task scheduling method applied to the host in the above method embodiment is implemented.
  • the embodiments of the present application also provide a computer program product containing instructions; when the computer program product runs on a computer, it enables the computer to implement the task scheduling method applied to the target computing node in the above method embodiments, or the task scheduling method applied to the scheduler in the above method embodiments, or the task scheduling method applied to the host in the above method embodiments.
  • the embodiment of the present application also provides a task scheduling system; as shown in FIG. 1, FIG. 2, and FIG. 10, the system may include: a host 03, a scheduler 011, and multiple computing nodes 02, where the chip architectures of at least two computing nodes 02 among the multiple computing nodes 02 are different.
  • At least one computing node among the plurality of computing nodes 02 is the target computing node provided in the foregoing embodiment, for example, may be the target computing node shown in FIG. 11 or FIG. 14 .
  • the scheduler 011 is the scheduler provided in the foregoing embodiment, and may be, for example, the scheduler shown in FIG. 12 or FIG. 14 .
  • the host 03 is the host provided in the above embodiment, for example, the host shown in FIG. 13 or FIG. 14 .
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) manner.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present application provides a task scheduling method, apparatus and system, relating to the technical field of computers. In the solution provided by the present application, a target computing node can obtain the intermediate representation and runtime plugin of a target task. Because the intermediate representation is a code independent of the chip architecture of a processor, the target computing node can compile the intermediate representation into an executable code of a target chip architecture by means of the runtime plugin, and run the executable code in the processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in a heterogeneous cluster will not be limited by the architecture of the compiled executable code in the target task, but can flexibly determine, on the basis of the resource usage of each computing node in the heterogeneous cluster, the computing node for executing the target task. Thus, it can be ensured that the loads of all computing nodes are relatively balanced, thereby effectively increasing the resource utilization rate of the heterogeneous cluster.

Description

Task scheduling method, device and system

Technical Field

The present application relates to the field of computer technology, and in particular, to a task scheduling method, device and system.

Background

With the rapid development of chip technology, the types of chip architectures (also called processor architectures) are becoming more and more abundant. For example, common processors of different chip architectures include: the central processing unit (CPU) that supports general-purpose computing, the graphics processing unit (GPU) that supports image rendering and high-performance computing, and the neural-network processing unit (NPU) that supports neural network computing. Among them, the chip architecture of the CPU can be further divided into the X86 architecture and the advanced RISC machine (ARM) architecture.
A heterogeneous cluster refers to a cluster composed of computing nodes with different chip architectures; for example, the processors of some computing nodes in a heterogeneous cluster are CPUs, while the processors of other computing nodes are GPUs or NPUs. Since a processor in a computing node can only run executable code of the same type as its chip architecture, when scheduling a task, the scheduler in the heterogeneous cluster needs to schedule the task, based on the architecture of the task's executable code, to a computing node whose processor chip architecture matches the architecture of that executable code.

However, since the architectures of the executable code used by the large number of tasks received by the heterogeneous cluster may be unbalanced, the above task scheduling method may lead to an unbalanced load across the computing nodes in the heterogeneous cluster, resulting in low resource utilization of the heterogeneous cluster.
Summary of the Invention

The present application provides a resource scheduling method, device and system, which can solve the technical problem of low resource utilization of heterogeneous clusters. The technical solutions are as follows:
In one aspect, a task scheduling method is provided, applied to a target computing node in a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The method includes: receiving a scheduling instruction for a target task sent by the scheduler, obtaining an intermediate representation of the target task and a runtime plug-in of the target task, compiling, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, and running the executable code in a processor of the target chip architecture through the runtime plug-in; wherein the intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task, and the target computing node includes a processor of the target chip architecture.

Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture. Correspondingly, when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, it can be ensured that the load of each computing node is relatively balanced, effectively improving the resource utilization of the heterogeneous cluster.
Optionally, the process in which the target computing node obtains the intermediate representation and the runtime plug-in of the target task may include: obtaining, based on the scheduling instruction, the intermediate representation and the runtime plug-in of the target task from a file manager of the heterogeneous cluster; or, receiving the intermediate representation and the runtime plug-in of the target task sent by the scheduler.

Since the data volume of the intermediate representation and the runtime plug-in of the target task is relatively large, the intermediate representation and the runtime plug-in can be stored by the file manager in the heterogeneous cluster, thereby reducing the storage performance requirements on the scheduler. Moreover, since the scheduler does not need to forward the intermediate representation and the runtime plug-in, any impact on its scheduling performance is avoided.

Alternatively, the intermediate representation and the runtime plug-in can be forwarded directly by the scheduler, so that no additional file manager needs to be deployed in the heterogeneous cluster, which simplifies the structure of the heterogeneous cluster and reduces its deployment cost.
Optionally, the method may further include: receiving the architecture identifier of the target chip architecture sent by the scheduler. Correspondingly, the process of compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in may include: compiling, based on the architecture identifier of the target chip architecture sent by the scheduler, the intermediate representation into executable code of the target chip architecture through the runtime plug-in.

Since the target computing node may include processors of multiple different chip architectures, the scheduler may also send the architecture identifier of the target chip architecture to the target computing node, so that the target computing node can determine the architecture of the executable code into which the intermediate representation needs to be compiled.
Optionally, the method may further include: obtaining input data of the target task. The process of running the executable code in the processor of the target chip architecture through the runtime plug-in may include: using, through the runtime plug-in, the input data as the input of the executable code, running the executable code in the processor of the target chip architecture, and obtaining the running result of the executable code. The method may further include: sending the running result to the scheduler.

The scheduler can then send the running result to the host that provided the target task, so that the host can perform subsequent processing on the running result. For example, the host can perform reduction processing on the running results provided by multiple computing nodes.
Optionally, the process in which the target computing node obtains the input data of the target task may include: obtaining, based on the scheduling instruction, the input data of the target task from the file manager of the heterogeneous cluster; or, receiving the input data of the target task sent by the scheduler.

Since the data volume of the input data is relatively large, the input data can be stored by the file manager, thereby reducing the storage performance requirements on the scheduler. Alternatively, the input data can be forwarded directly by the scheduler, so that no additional file manager needs to be deployed in the heterogeneous cluster, which simplifies the structure of the heterogeneous cluster and reduces its deployment cost.
In another aspect, a task scheduling method is provided, applied to a scheduler in a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes, and at least two of the multiple computing nodes have different chip architectures. The method includes: receiving scheduling requirement information of a target task to be scheduled, where the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task; determining, based on the scheduling requirement information, a target computing node from the multiple computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures; and sending a scheduling instruction for the target task to the target computing node, where the scheduling instruction is used to instruct the target computing node to compile the intermediate representation of the target task into executable code of the target chip architecture through the runtime plug-in of the target task, and to run the executable code in the processor of the target chip architecture, where the intermediate representation is chip-architecture-independent code obtained by compiling the source code of the target task.
Optionally, the scheduling requirement information may further include: priorities of the at least two chip architectures. Correspondingly, the process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: checking, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of the processors of the corresponding chip architecture in the multiple computing nodes meets the resource requirements; and if it is detected that the amount of idle resources of the processors of the target chip architecture meets the resource requirements, determining a computing node including a processor of the target chip architecture as the target computing node.

Since processors of different chip architectures are good at processing different types of tasks, the priorities of the at least two chip architectures can be defined in the scheduling requirement information, where a chip architecture with a higher priority is more suitable for processing the target task. Thus, the scheduler determines the target chip architecture in descending order of priority, which can effectively ensure the execution efficiency of the target task.
Optionally, the method may further include: sending the architecture identifier of the target chip architecture to the target computing node.

Optionally, the method may further include: receiving the intermediate representation of the target task and the runtime plug-in of the target task, and sending the intermediate representation and the runtime plug-in to the target computing node.
Optionally, the target task is one of multiple parallel tasks, and the scheduling requirement information further includes: the parallel scheduling mode of the multiple parallel tasks. The process in which the scheduler determines the target computing node from the multiple computing nodes based on the scheduling requirement information may include: if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determining the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the idle resources of the processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determining the target computing node from the multiple computing nodes based on the resource requirements of the target task. The synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, and the ideal parallel mode means that the multiple parallel tasks do not need to be executed synchronously.

In the solution provided by the present application, the scheduler can determine the target computing node in different ways based on the parallel scheduling mode of the multiple parallel tasks, so as to ensure that the multiple parallel tasks can be reliably executed according to the required scheduling mode.
In yet another aspect, a task scheduling method is provided, which can be applied to a host. The method includes: compiling the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, where the intermediate representation is chip-architecture-independent code; sending the intermediate representation and the runtime plug-in; and sending scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, where the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task. The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures; the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures; and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.

Optionally, the process of sending the intermediate representation and the runtime plug-in may include: sending the intermediate representation and the runtime plug-in to the scheduler; or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
In still another aspect, a target computing node is provided, applied to a heterogeneous cluster, where the heterogeneous cluster includes a scheduler and multiple computing nodes, at least two of the multiple computing nodes have different chip architectures, and the target computing node belongs to the multiple computing nodes. The target computing node includes a processor of the target chip architecture, and further includes at least one module, where the at least one module is used to implement the task scheduling method applied to the target computing node provided in the above aspect.

In still another aspect, a scheduler is provided, applied to a heterogeneous cluster, where the heterogeneous cluster further includes multiple computing nodes, and at least two of the multiple computing nodes have different chip architectures. The scheduler includes at least one module, where the at least one module is used to implement the task scheduling method applied to the scheduler provided in the above aspect.

In still another aspect, a host is provided, where the host includes at least one module, and the at least one module is used to implement the task scheduling method applied to the host provided in the above aspect.
In still another aspect, a computer device is provided, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where, when executing the computer program, the processor implements the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.

In still another aspect, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and the instructions are executed by a processor to implement the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.

In still another aspect, a computer program product is provided, which, when run on a computer, causes the computer to execute the task scheduling method applied to the target computing node provided in the above aspect, or the task scheduling method applied to the scheduler provided in the above aspect, or the task scheduling method applied to the host provided in the above aspect.
In still another aspect, a task scheduling system is provided, including: the host provided in the above aspect, the scheduler provided in the above aspect, and multiple computing nodes, where at least one of the multiple computing nodes is the target computing node provided in the above aspect.

In still another aspect, a task scheduling system is provided, including: a host, a scheduler, and multiple computing nodes, where at least two of the multiple computing nodes have different chip architectures.

The host is configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, to send the intermediate representation and the runtime plug-in, and to send scheduling requirement information of the target task to the scheduler, where the intermediate representation is chip-architecture-independent code, and the scheduling requirement information includes the resource requirements of the target task and at least two chip architectures supported by the target task.

The scheduler is configured to determine, based on the scheduling requirement information, a target computing node from the multiple computing nodes, and to send a scheduling instruction for the target task to the target computing node, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirements of the target task, and the target chip architecture belongs to the at least two chip architectures.

The target computing node is configured to compile, based on the scheduling instruction, the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and to run the executable code in the processor of the target chip architecture through the runtime plug-in.
The solution provided by the present application has at least the following beneficial effects:

The present application provides a task scheduling method, device and system, in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code in the processor of the target chip architecture. Correspondingly, when the scheduler in the heterogeneous cluster schedules the target task, it is not limited by the architecture of any compiled executable code in the target task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. In this way, the load of each computing node can be kept relatively balanced, and the resource utilization rate of the heterogeneous cluster can be effectively improved.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of this application;
FIG. 3 is a flowchart of a task scheduling method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of this application;
FIG. 8 is a schematic diagram of a task scheduling process provided by an embodiment of this application;
FIG. 9 is a flowchart of a method for determining a target computing node provided by an embodiment of this application;
FIG. 10 is a schematic diagram of an application scenario of still another task scheduling method provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a target computing node provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a scheduler provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of a host provided by an embodiment of this application;
FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed Description of Embodiments
The task scheduling method, apparatus, and system provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario of a task scheduling method provided by an embodiment of this application. As shown in FIG. 1, the application scenario includes a heterogeneous cluster, and the heterogeneous cluster includes a management node 01 and a plurality of computing nodes 02 connected to the management node 01. At least two of the computing nodes 02 use processors with different chip architectures. For example, among the plurality of computing nodes 02, some computing nodes 02 use CPUs, some use GPUs, and the remaining computing nodes 02 use NPUs.
Referring to FIG. 1, a scheduler 011 is deployed in the management node 01, and the application scenario may further include a host 03. The host 03 can send a target task to be scheduled to the scheduler 011, and the scheduler 011 can then schedule the target task to at least one computing node 02 for execution.
FIG. 2 is a schematic diagram of an application scenario of another task scheduling method provided by an embodiment of this application. As shown in FIG. 2, an acceleration library 031 is deployed in the host 03; the acceleration library 031 is a collection of software used to optimize processor performance. The acceleration library 031 can be used to send the target task to be scheduled to the scheduler 011.
It can be understood that any computing node 02 in the heterogeneous cluster can also act as a host and send tasks to be scheduled to the scheduler 011. Accordingly, the application scenario may also omit the host 03 that is independent of the heterogeneous cluster.
It can also be understood that the management node 01 may additionally have the function of a computing node; that is, the management node 01 can not only schedule tasks but also execute tasks.
In the related art, the scheduler in a heterogeneous cluster needs to record the chip architecture type of each computing node in advance. When submitting a task to the scheduler, the host also needs to mark, in the submitted task, the architecture of the executable code used by the task. After receiving the task, the scheduler schedules the task, according to the architecture marked in it, to a computing node with a matching chip architecture for execution. However, if the architectures of the executable code used by the many tasks submitted by the host are unevenly distributed, this task scheduling approach leads to an unbalanced load across the computing nodes of the heterogeneous cluster.
For example, if the executable code in a task submitted by the host uses the X86 architecture, the task can only be scheduled to run on a computing node whose processor architecture is X86. Suppose the computing nodes with X86 processors in the heterogeneous cluster have no idle resources available, while the computing nodes with ARM processors do. In this scenario, the idle resources of the heterogeneous cluster cannot be used to process the task, resulting in low resource utilization.
Alternatively, a task submitted by the host may include executable code for multiple different architectures. After receiving the task, the scheduler can determine, according to the load of each computing node 02, a target computing node for executing the task, and schedule the task to that node. Because the task includes executable code for multiple architectures, the target computing node can execute the executable code whose architecture matches the chip architecture of its processor. However, this scheduling approach requires the host to implement executable code for multiple different architectures, resulting in high cost.
An embodiment of this application provides distributed middleware for implementing adaptive task scheduling in a heterogeneous cluster, where adaptive task scheduling refers to scheduling tasks adaptively based on the resource usage of the heterogeneous cluster. Accordingly, the heterogeneous cluster may also be called an adaptive cluster, and the distributed middleware may also be called adaptive middleware. With reference to FIG. 1 and FIG. 2, the distributed middleware may include: a middleware programming interface 032 deployed in the host 03, the scheduler 011 deployed in the management node 01, and a cluster agent 021 deployed in each computing node 02.
The middleware programming interface 032 provides the acceleration library 031 with the ability to access the heterogeneous cluster; that is, the acceleration library 031 can exchange data with components in the heterogeneous cluster by calling the middleware programming interface 032. For example, the acceleration library 031 can send the scheduling requirement information of the target task, the intermediate representation of the target task, and the runtime plug-in of the target task to the scheduler 011 through the middleware programming interface 032.
The intermediate representation, which may also be called intermediate language or intermediate code, is an equivalent internal representation of source code. Moreover, the intermediate representation is independent of the processor's chip architecture; that is, it can be compiled into executable code (also called object code) for different architectures.
A runtime is the execution environment of a programming language, a virtual environment that provides software services to running programs. The runtime plug-in is a component that provides the execution environment of the intermediate representation. Because the runtime plug-in provided by this embodiment supports running applications in a heterogeneous device environment, it may also be called a heterogeneous runtime plug-in. The runtime plug-in provides a plug-in interface to be called by the cluster agent 021 of the distributed middleware, so that the cluster agent 021 can initialize the runtime plug-in, deinitialize it, run it, and clean it up on exit.
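For illustration only, a minimal sketch of such a plug-in interface is given below in C++. The class and method names mirror the lifecycle just described (initialize, run, deinitialize/exit cleanup) plus the online compilation of the intermediate representation; they are assumptions introduced here and are not prescribed by this embodiment.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical interface of a heterogeneous runtime plug-in.
    class RuntimePlugin {
    public:
        virtual ~RuntimePlugin() = default;
        // Called by the cluster agent when the task service instance starts.
        virtual void Initialize() = 0;
        // Compile the architecture-independent intermediate representation
        // into executable code for the given chip architecture.
        virtual std::vector<std::uint8_t> Compile(
            const std::vector<std::uint8_t>& ir,
            const std::string& targetArch) = 0;
        // Run compiled code on a processor of the target architecture,
        // feeding it the task's input data and returning the running result.
        virtual std::vector<std::uint8_t> Run(
            const std::vector<std::uint8_t>& executable,
            const std::vector<std::uint8_t>& input) = 0;
        // Called for exit cleanup before the task service instance terminates.
        virtual void Deinitialize() = 0;
    };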
The scheduler 011 schedules tasks based on the usage of the heterogeneous resources in the heterogeneous cluster. As shown in FIG. 2, the scheduler 011 mainly includes a task management and scheduler 0111 and a resource management and scheduler 0112. The resource management and scheduler 0112 manages and schedules the resources of each computing node 02 in the heterogeneous cluster; these resources include at least processor resources and may also include memory resources and the like. The task management and scheduler 0111 sends a resource scheduling request to the resource management and scheduler 0112 based on the resource requirement of the target task sent by the acceleration library 031, and the resource management and scheduler 0112 can then allocate resources to the target task based on that request. Assuming that, based on the resource usage of each computing node 02, the resources allocated to the target task by the resource management and scheduler 0112 include processor resources of a target chip architecture in a target computing node 02, the task management and scheduler 0111 can dispatch the target task to that target computing node 02 based on the allocated resources.
The cluster agent 021 is mainly used to start task service instances and to manage runtime plug-ins. As shown in FIG. 2, the cluster agent 021 includes a resource layer agent 0211 and a task layer agent 0212. The resource layer agent 0211 collects the resource information of the computing node 02 and reports it to the resource management and scheduler 0112, so that the computing node 02 joins the heterogeneous cluster. The task layer agent 0212 starts a task service instance based on the resources provided by the resource layer agent 0211; the runtime plug-in of the target task runs in this task service instance, or, put differently, the task service instance includes a runtime plug-in instance. The task layer agent 0212 also sends the intermediate representation of the target task, together with the target chip architecture determined by the resource management and scheduler 0112, to the runtime plug-in. The runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture and run that executable code on a processor of the target chip architecture, thereby running the target task.
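For illustration, the interaction between the task layer agent and the runtime plug-in might follow the flow sketched below, reusing the hypothetical RuntimePlugin interface above; the helper name and parameters are likewise assumptions.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical agent-side handling of a dispatched task: the task service
    // instance initializes the plug-in, has it compile the intermediate
    // representation for the architecture chosen by the scheduler, runs the
    // result on the input data, and cleans up on exit.
    std::vector<std::uint8_t> HandleTask(RuntimePlugin& plugin,
                                         const std::string& targetArch,
                                         const std::vector<std::uint8_t>& ir,
                                         const std::vector<std::uint8_t>& input) {
        plugin.Initialize();
        const std::vector<std::uint8_t> exec = plugin.Compile(ir, targetArch);
        std::vector<std::uint8_t> result = plugin.Run(exec, input);
        plugin.Deinitialize();
        return result;
    }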
Because the intermediate representation of the target task provided by the acceleration library 031 is code independent of any chip architecture, the scheduler 011 in the management node 01 can schedule the target task based on the resource usage of each computing node 02 without considering the architecture of the target task's executable code. As a result, the resource utilization of the heterogeneous cluster can be effectively improved without requiring the host 03 to provide executable code for multiple different architectures. Furthermore, because the scheduler 011 does not need to find a computing node whose chip architecture matches an architecture marked in a task, the complexity of resource management and scheduling is effectively reduced, which in turn improves the efficiency of task scheduling.
It can be understood that, in the embodiments of this application, acceleration libraries in different fields can all be combined with the above distributed middleware to achieve adaptive scheduling of target tasks in heterogeneous clusters.
An embodiment of this application provides a task scheduling method, which can be applied to the application scenarios provided by the foregoing embodiments. Referring to FIG. 3, the task scheduling method provided by this embodiment includes the following steps.
Step 101: The host compiles the source code of the target task to obtain the intermediate representation and the runtime plug-in of the target task.
In this embodiment of the application, as shown in FIG. 1 and FIG. 2, the acceleration library 031 in the host 03 can compile the source code of the target task to obtain the intermediate representation and the runtime plug-in of the target task.
FIG. 4 is a schematic diagram of a task scheduling framework provided by an embodiment of this application. As shown in FIG. 4, the acceleration library 031 may be an acceleration library in any of various fields. For example, it may be a parallel programming acceleration library, such as an open multi-processing (OpenMP) acceleration library. Alternatively, it may be another type of acceleration library, such as a numerical computation acceleration library, a graph computation acceleration library, a data frame acceleration library, or a machine learning acceleration library. Optionally, the numerical computation acceleration library may include a numerical Python (NumPy) acceleration library, the data frame acceleration library may be a pandas acceleration library, and the machine learning acceleration library may include a Scikit-learn acceleration library, where pandas is a data analysis package for Python.
Referring to FIG. 4, the target task may be a task of an application in any of various fields, such as a computer vision (CV) application, a natural language processing (NLP) application, or a machine learning prediction application. The acceleration library 031 can compile the source code of the target task into an intermediate representation independent of any chip architecture through a compiler, and obtain a runtime plug-in associated with that compiler. The intermediate representation may be a standard portable intermediate representation (SPIR-V), a WebAssembly (WASM) intermediate representation, or the like. The runtime plug-in may be a tensor virtual machine (TVM) runtime plug-in, a SPIR-V runtime plug-in, a WASM runtime plug-in, or the like.
Continuing to refer to FIG. 4, the programming framework of the compiler used by the acceleration library 031 may be any of the following: Python, Java, Go, a numerical domain-specific language (DSL), a table-structure DSL, a distributed-parallel DSL, a C++ heterogeneous programming framework, and so on, where Python, Java, and Go are names of computer programming languages.
Optionally, as shown in FIG. 5, the source code of the target task may be a code fragment in a program; this fragment may also be called a code fragment to be accelerated, or an accelerated kernel code fragment. A developer can annotate the code fragment in advance with a device directive, and the acceleration library 031 can compile the annotated code fragment to obtain the intermediate representation and the runtime plug-in of the target task.
For example, assuming that the acceleration library is an OpenMP acceleration library, the program run in the OpenMP acceleration library to implement a matrix multiplication (matmul) operation is as follows:
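In the original publication this listing appears only as an image. Based on the description that follows (float matrices A and B multiplied into an output matrix C, with the for loop parallelized by the "pragma omp parallel for" directive), a plausible reconstruction is sketched here; the function signature and the matrix dimensions M, N, and K are assumptions introduced for illustration.

    // Reconstructed sketch of the matmul program; A is an M x K matrix,
    // B is a K x N matrix, and C = A * B is an M x N matrix (the dimensions
    // and memory layout are assumptions, not taken from the original listing).
    void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
        // The following for loop is executed by multiple threads.
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                C[i * N + j] = sum;
            }
        }
    }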
In the above program, float denotes a floating-point data type, int denotes an integer type, A and B denote the two input matrices, and C denotes the output matrix; that is, matrix C equals the product of matrix A and matrix B. "pragma omp parallel for" is an OpenMP directive indicating that the for loop that follows will be executed by multiple threads.
In this embodiment of the application, if the two input matrices A and B carry a large amount of data, then, to improve the computational efficiency of the matrix multiplication, the for loop in the above program can be executed in parallel by the computing nodes of the heterogeneous cluster. That is, the code fragment of the for loop can be offloaded to the heterogeneous cluster for execution; accordingly, the source code of the target task is the for loop in the program.
For example, a developer can add a device directive to the above matrix multiplication program to mark the for loop. The program with the device directive added is:
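As with the previous listing, the annotated program appears only as an image in the original publication; a plausible reconstruction, differing from the sketch above only in the added device directive, is given here. ADAPTIVE_CLUSTER is assumed to be a device identifier made available to the program by the distributed middleware, and data-mapping clauses are omitted for brevity.

    // Reconstructed sketch of the annotated matmul program.
    void matmul(const float *A, const float *B, float *C, int M, int N, int K) {
        // Offload the following code fragment from the host to the target
        // device, here the heterogeneous (adaptive) cluster.
        #pragma omp target device(ADAPTIVE_CLUSTER)
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                C[i * N + j] = sum;
            }
        }
    }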
In the above program, "pragma omp target device(ADAPTIVE_CLUSTER)" is the added device directive, which indicates that the subsequent code fragment is to be offloaded from the host to the target device for execution. In this embodiment of the application, the target device is the heterogeneous cluster, and "ADAPTIVE_CLUSTER" is the name of the target device defined in this embodiment. While compiling the above program, the OpenMP acceleration library can compile the code fragment annotated by the device directive (the for loop) into an intermediate representation. Later, while running the executable code of the program, when the OpenMP acceleration library detects that the intermediate representation of the annotated code fragment is about to be executed, it can offload that intermediate representation to the heterogeneous cluster for execution.
Optionally, as shown in FIG. 6, the acceleration library 031 can compile the source code of the target task through a low level virtual machine (LLVM) to obtain a fat binary file. The fat binary file contains host code (for example, the main function) and the intermediate representation that is independent of any chip architecture. The file format of the fat binary file may be the executable and linkable format (ELF).
FIG. 7 is a schematic diagram of a compilation process provided by an embodiment of this application. As shown in FIG. 7, the acceleration library 031 can use a DSL compiler to compile the source code of the target task. The compilation process may include steps such as algorithm abstraction, computation graph optimization, data graph optimization, communication graph optimization, and abstract syntax tree generation. Each of these steps can be scheduled automatically by the acceleration library 031 or scheduled in a user-defined manner.
Step 102: The host sends the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to a file manager in the heterogeneous cluster.
As shown in FIG. 5 and FIG. 8, the heterogeneous cluster may further include a file manager 04. The acceleration library 031 in the host 03 can send the intermediate representation of the target task, the runtime plug-in, and the input data of the target task to the file manager 04 by calling the middleware programming interface 032. For example, if the source code of the target task is the for loop of the matrix multiplication, the input data may include the input matrices A and B.
Optionally, referring to FIG. 5, the heterogeneous cluster may further include a gateway 05, which is connected to the scheduler 011 and to the file manager 04. As shown in step S1 in FIG. 5, the acceleration library 031 can send the intermediate representation, the runtime plug-in, and the input data of the target task to the gateway 05 by calling a software development kit (SDK) interface provided by the gateway 05, and the gateway 05 can then forward the received data to the file manager 04. The main component of the SDK interface is the middleware programming interface 032.
It can be understood that the file manager 04 may include one or more storage devices with a file storage function. Each computing node 02 in the heterogeneous cluster establishes a communication connection with the file manager 04 and can obtain data from it.
Step 103: The host sends the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster.
The acceleration library 031 in the host 03 can send the scheduling requirement information of the target task to the scheduler 011 in the management node 01 by calling the middleware programming interface 032. For example, referring to FIG. 8, the acceleration library 031 can send the scheduling requirement information to the task management and scheduler 0111 in the scheduler 011. The scheduling requirement information includes the resource requirement of the target task and at least two chip architectures supported by the target task, and it can be configured by the acceleration library 031.
For example, suppose the amount of processor resources required to execute the target task is x, and the chip architectures supported by the target task are X86, ARM, and GPU. Then, as shown in step S2 in FIG. 5, the acceleration library 031 can, by calling the middleware programming interface 032, send to the gateway 05 the resource requirement of the target task (the processor resource amount x) and the three chip architectures (X86, ARM, and GPU). The gateway 05 can then forward the received scheduling requirement information to the scheduler 011.
Step 104: The scheduler determines a target computing node from the plurality of computing nodes based on the scheduling requirement information.
In this embodiment of the application, after receiving the scheduling requirement information of the target task from the acceleration library 031, the scheduler 011 can determine, from the plurality of computing nodes 02 and based on the resource usage of each computing node 02 in the heterogeneous cluster, a target computing node that satisfies the execution conditions of the target task. In the target computing node, the amount of idle resources of the processor of the target chip architecture satisfies the resource requirement of the target task, and the target chip architecture is one of the at least two supported chip architectures.
Optionally, the scheduling requirement information sent by the acceleration library 031 may further include the priorities of the at least two chip architectures. Accordingly, the scheduler 011 can check, in descending order of priority, the amount of idle resources of the processors of each chip architecture in the heterogeneous cluster, and determine the target computing node from the plurality of computing nodes 02.
Processors of different chip architectures excel at different types of tasks: for example, CPUs excel at scalar operations, GPUs at vector operations, and NPUs at matrix operations. Therefore, in the solution provided by this application, the priorities of the at least two chip architectures can be specified in the scheduling requirement information, where a chip architecture with a higher priority is better suited to processing the target task. By determining the target chip architecture in descending order of priority, the scheduler can improve the resource utilization of the heterogeneous cluster while preserving the execution efficiency of the target task as far as possible.
Optionally, to improve task execution efficiency, the acceleration library 031 in the host 03 can split a task to be executed into multiple parallel tasks, so that the computing nodes 02 of the heterogeneous cluster can execute them in parallel. In that case, the target task is one of the multiple parallel tasks, and the scheduling requirement information may further include the parallel scheduling mode of the multiple parallel tasks. The parallel scheduling mode may be a synchronous parallel mode or an ideal parallel mode.
The synchronous parallel mode means that the multiple parallel tasks must be executed synchronously; when scheduling them, it must therefore be ensured that they are all scheduled onto processors of the same chip architecture. The ideal parallel mode means that the multiple parallel tasks do not require synchronous execution: they may run at the same time, or some of them may be executed first and the remaining ones afterwards. When scheduling them, the multiple parallel tasks can therefore be dispatched to processors of different chip architectures. The ideal parallel mode may also be called embarrassingly parallel.
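For illustration only, the scheduling requirement information described above might be carried in a structure along the following lines; all type and field names are assumptions introduced here and are not prescribed by this embodiment.

    #include <string>
    #include <vector>

    // Hypothetical layout of the scheduling requirement information.
    enum class ParallelMode {
        kSynchronous,      // all parallel tasks must run on the same chip architecture
        kEmbarrassingly    // parallel tasks may be spread across architectures
    };

    struct SchedulingRequirement {
        double resourceDemand;                    // processor resources needed per task
        std::vector<std::string> supportedArchs;  // e.g. {"X86", "ARM", "GPU"}, ordered
                                                  // from highest to lowest priority
        ParallelMode mode;                        // parallel scheduling mode of the group
        int numParallelTasks;                     // number of tasks in the parallel group
    };

Keeping the supported architectures in descending priority order matches the way the scheduler walks through them in the steps below.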
The implementation of step 104 is described below, taking as an example scheduling requirement information that includes the priorities of the at least two chip architectures and the parallel scheduling mode of the multiple parallel tasks. As shown in FIG. 9, step 104 may include the following steps.
Step 1041: Determine the parallel scheduling mode of the multiple parallel tasks.
The scheduler 011 can determine the parallel scheduling mode of the multiple parallel tasks based on the received scheduling requirement information. If the parallel scheduling mode is the ideal parallel mode, the scheduler 011 performs steps 1042a and 1043a below; if it is the synchronous parallel mode, the scheduler 011 performs steps 1042b and 1043b below.
Step 1042a: Check, in descending order of priority, whether the amount of idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the resource requirement of the target task.
If the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, the scheduler 011 can determine the target computing node directly based on the resource requirement of the target task. That is, when scheduling the target task, the scheduler 011 only needs to ensure that the amount of idle resources of the processors of some chip architecture in the heterogeneous cluster satisfies the resource requirement of that single task; it does not need to ensure that the total idle resources of that architecture satisfy the sum of the resource requirements of all the parallel tasks.
For example, suppose the priorities of the three chip architectures supported by the target task satisfy X86 > ARM > GPU. The scheduler 011 can then check, in the order X86, ARM, GPU, whether the amount of idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the resource requirement of the target task.
For example, the scheduler 011 can first check whether the amount of idle resources of the X86 processors in the heterogeneous cluster satisfies the resource requirement of the target task. If it does, the scheduler 011 performs step 1043a below. If it does not, the scheduler 011 continues by checking whether the amount of idle resources of the ARM processors satisfies the resource requirement. If it does, the scheduler 011 performs step 1043a below; otherwise, the scheduler 011 continues by checking whether the amount of idle resources of the GPU processors in the heterogeneous cluster satisfies the resource requirement.
Step 1043a: If it is detected that the amount of idle resources of the processors of a target chip architecture satisfies the resource requirement, determine a computing node containing a processor of that target chip architecture as the target computing node.
Proceeding in descending order of priority, once the scheduler 011 detects that the amount of idle resources of the processors of a target chip architecture satisfies the resource requirement, it can determine a computing node containing a processor of that architecture as the target computing node. For example, if the scheduler 011 detects that the amount of idle resources of the X86 processors in the heterogeneous cluster satisfies the resource requirement of the target task, it can determine a computing node containing an X86 processor as the target computing node, where the amount of idle resources of the X86 processor in that node satisfies the resource requirement.
Step 1042b: Check, in descending order of priority, whether the sum of the idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks.
If the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, the scheduler 011 can determine that the tasks must be executed synchronously. When scheduling the target task, the scheduler 011 must therefore ensure that the sum of the idle resources of the processors of some chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks. That is, the scheduler 011 determines the target computing node for executing the target task from the plurality of computing nodes based on the sum of the resource requirements of the multiple parallel tasks.
For example, suppose the priorities of the three chip architectures supported by the target task satisfy X86 > ARM > GPU. The scheduler 011 can then check, in the order X86, ARM, GPU, whether the sum of the idle resources of the processors of each chip architecture in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks.
Step 1043b: If it is detected that the sum of the idle resources of the processors of a target chip architecture satisfies the sum of the resource requirements, determine a computing node containing a processor of that target chip architecture as the target computing node.
Proceeding in descending order of priority, once the scheduler 011 detects that the sum of the idle resources of the processors of a target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node containing a processor of that architecture as the target computing node.
For example, if the scheduler 011 detects that the sum of the idle resources of the ARM processors in the plurality of computing nodes satisfies the sum of the resource requirements of the multiple parallel tasks, it can determine a computing node containing an ARM processor as the target computing node, where the amount of idle resources of the ARM processor in that node satisfies the resource requirement of the target task.
It can be understood that, in steps 1043a and 1043b above, if the scheduler 011 detects that the plurality of computing nodes contains at least two candidate computing nodes that satisfy the execution conditions of the target task, the scheduler 011 can randomly select one of them as the target computing node, or select one of them based on a preconfigured resource scheduling policy. Here, satisfying the execution conditions of the target task means that the computing node contains a processor of the target chip architecture and the amount of idle resources of that processor satisfies the resource requirement of the target task.
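Putting steps 1041 through 1043b together, a minimal sketch of the selection logic might look as follows, reusing the hypothetical SchedulingRequirement and ParallelMode types sketched earlier. The Node abstraction and the tie-breaking rule (first qualifying node) are likewise assumptions; a real scheduler would apply the preconfigured resource scheduling policy instead.

    #include <optional>
    #include <string>
    #include <vector>

    struct Node {
        std::string arch;      // chip architecture of this node's processors
        double freeResources;  // current amount of idle processor resources
    };

    // Walk the supported architectures from highest to lowest priority and
    // return a node of the first architecture with sufficient idle resources.
    std::optional<Node> SelectTargetNode(const std::vector<Node>& nodes,
                                         const SchedulingRequirement& req) {
        // In synchronous parallel mode the whole task group must fit on one
        // architecture, so the check is against the sum of the group's demands.
        const double demand = (req.mode == ParallelMode::kSynchronous)
                                  ? req.resourceDemand * req.numParallelTasks
                                  : req.resourceDemand;
        for (const std::string& arch : req.supportedArchs) {
            double clusterFree = 0.0;
            for (const Node& n : nodes) {
                if (n.arch == arch) clusterFree += n.freeResources;
            }
            if (clusterFree < demand) continue;  // try the next-priority architecture
            for (const Node& n : nodes) {
                // One task is placed per node; pick the first node that can
                // hold it (a real scheduler would apply its scheduling policy).
                if (n.arch == arch && n.freeResources >= req.resourceDemand) {
                    return n;
                }
            }
        }
        return std::nullopt;  // no supported architecture can satisfy the demand
    }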
For example, referring to FIG. 2, the scheduler 011 may include the task management and scheduler 0111 and the resource management and scheduler 0112. Referring to FIG. 8, after receiving the scheduling requirement information of the target task, the task management and scheduler 0111 can send a resource invocation request to the resource management and scheduler 0112 based on the resource requirement of the target task. The resource management and scheduler 0112 can then allocate resources to the target task based on a preconfigured resource scheduling policy, that is, determine the target computing node from the plurality of computing nodes 02. As shown in FIG. 2 and FIG. 4, the resource scheduling policy may include heterogeneity awareness, priority preemption, affinity and anti-affinity, a bin packing algorithm, accelerator sharing, and the like.
It can also be understood that, if the task management and scheduler 0111 receives multiple tasks including the target task, then after the resource management and scheduler 0112 completes resource scheduling for these tasks, the task management and scheduler 0111 can schedule them based on a preconfigured task scheduling policy. For example, referring to FIG. 2 and FIG. 4, the task scheduling policy may include directed acyclic graph (DAG) scheduling, priority scheduling, and the like.
For example, suppose the chip architectures of the computing nodes in the heterogeneous cluster include GPU, NPU, and CPU, and the speedup ratio of processors of the three architectures is 2:2:1. Suppose the task management and scheduler 0111 receives 100 parallel tasks whose parallel scheduling mode is the ideal parallel mode, and the currently idle resources of the heterogeneous cluster comprise 10 GPUs, 10 NPUs, and 100 CPUs, where 50 X86 CPUs are idle in computing node A and 50 ARM CPUs are idle in computing node B. The scheduler 011 can then schedule 20 parallel tasks to the computing nodes containing GPUs, 20 parallel tasks to the computing nodes containing NPUs, 30 parallel tasks to computing node A, and 30 parallel tasks to computing node B, for a total of 20 + 20 + 30 + 30 = 100 tasks. Each GPU and each NPU executes 2 parallel tasks, and each X86 CPU and each ARM CPU executes 1 parallel task.
Step 105: The scheduler sends a scheduling instruction for the target task to the target computing node.
After determining the target computing node 02 for executing the target task, the scheduler 011 can send a scheduling instruction for the target task to the target computing node 02. The scheduling instruction may carry the identifier of the target task, and it instructs the target computing node 02 to compile, through the runtime plug-in of the target task, the intermediate representation of the target task into executable code of the target chip architecture, and to run that executable code on a processor of the target chip architecture.
For example, suppose the scheduler 011 receives N parallel tasks (N being an integer greater than 1). After determining the computing node 02 for executing each parallel task, as shown in step S3 in FIG. 5, the scheduler 011 can send a scheduling instruction to each of the N computing nodes 02 used to execute the N parallel tasks. For example, referring to FIG. 8, the task management and scheduler 0111 in the scheduler 011 can send the scheduling instruction to the task layer agent 0212 in the computing node 02.
Step 106: The scheduler sends the architecture identifier of the target chip architecture to the target computing node.
In this embodiment of the application, one or more computing nodes in the heterogeneous cluster may include processors of several chip architectures, for example an NPU together with an X86 CPU, or a GPU together with an X86 CPU. Therefore, to help the target computing node determine the chip architecture of the processor on which the target task is to run, the scheduler can also send the architecture identifier of the target chip architecture to the target computing node.
Optionally, as shown in FIG. 8, after determining the target chip architecture, the resource management and scheduler 0112 in the scheduler 011 can send the architecture identifier of the target chip architecture to the resource layer agent 0211 in the target computing node 02, which can then pass it on to the task layer agent 0212.
Alternatively, after determining the target chip architecture, the resource management and scheduler 0112 in the scheduler 011 can send the architecture identifier to the task management and scheduler 0111, which can then send it to the task layer agent 0212 in the target computing node 02.
It can be understood that step 106 may also be performed before step 105, or synchronously with step 105; for example, the scheduling instruction sent by the scheduler may itself carry the architecture identifier of the target chip architecture.
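For illustration, when the architecture identifier travels in the same message as the scheduling instruction (the last variant above), the payload delivered to the task layer agent might look like the following; the type and field names are assumptions.

    #include <string>

    // Hypothetical payload of a scheduling instruction that also carries
    // the architecture identifier of the target chip architecture.
    struct SchedulingInstruction {
        std::string taskId;      // identifier of the target task
        std::string targetArch;  // architecture identifier, e.g. "X86" or "NPU"
    };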
Step 107: Based on the scheduling instruction, the target computing node obtains the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager of the heterogeneous cluster.
Referring to step S4 in FIG. 5, after receiving the scheduling instruction for the target task from the scheduler 011, the target computing node 02 can obtain the intermediate representation, the runtime plug-in, and the input data of the target task from the file manager 04 based on the identifier of the target task carried in the scheduling instruction.
Because the intermediate representation, the runtime plug-in, and the input data of the target task are relatively large, they can be stored by the file manager in the heterogeneous cluster, which lowers the storage performance requirements placed on the scheduler. Moreover, because the scheduler does not need to forward the intermediate representation, the runtime plug-in, or the input data, any impact on its scheduling performance is avoided.
Optionally, in step 102 above, the acceleration library 031 in the host 03 may instead send at least one of the intermediate representation, the runtime plug-in, and the input data of the target task directly to the scheduler 011. Accordingly, in step 107, the scheduler 011 sends that data to the target computing node 02; that is, the target computing node 02 receives it from the scheduler 011.
For example, referring to FIG. 6, the heterogeneous cluster may also omit the file manager 04. In that case, in step 102 the acceleration library 031 in the host 03 sends the intermediate representation, the runtime plug-in, and the input data of the target task to the scheduler 011, and in step 107 the target computing node 02 receives them from the scheduler 011. Because no additional file manager needs to be deployed in the heterogeneous cluster, the structure of the cluster is simplified and its deployment cost is reduced.
Step 108: Based on the scheduling instruction, the target computing node compiles the intermediate representation into executable code of the target chip architecture through the runtime plug-in.
In this embodiment of the application, after obtaining the runtime plug-in, the target computing node 02 can run it, and the runtime plug-in can then compile the intermediate representation of the target task into executable code of the target chip architecture. That is, the runtime plug-in compiles the intermediate representation online. As can be seen from FIG. 7, the runtime plug-in supports compiling the intermediate representation into executable code for a variety of chip architectures, for example NPU, GPU, X86, or ARM.
For example, referring to FIG. 6, suppose the target computing node is computing node A and the target chip architecture is X86; computing node A can then run the runtime plug-in on an X86 processor, and the runtime plug-in can compile the intermediate representation of the target task into X86 executable code. Alternatively, if the target computing node is computing node B and the target chip architecture is the NPU architecture, computing node B can run the runtime plug-in on the NPU, and the runtime plug-in can compile the intermediate representation into executable code for the NPU architecture.
Optionally, as shown in FIG. 2 and FIG. 8, after receiving the scheduling instruction, the task layer agent 0212 in the target computing node 02 can first start a task service instance through the runtime plug-in manager; the runtime plug-in runs in this task service instance. The running runtime plug-in can then compile the intermediate representation into executable code of the target chip architecture. For example, referring to FIG. 8, the running runtime plug-in can obtain the intermediate representation of the target task from the file manager 04 and compile it to obtain the executable code; alternatively, the intermediate representation can be obtained from the file manager 04 by the task layer agent 0212 and sent to the runtime plug-in.
Step 109: The target computing node takes the input data as the input of the executable code and runs the executable code on a processor of the target chip architecture to obtain the running result of the executable code.
After compiling the intermediate representation into executable code of the target chip architecture through the runtime plug-in, the target computing node can provide the input data to the runtime plug-in. The runtime plug-in can then take the input data as the input of the executable code and run the executable code on a processor of the target chip architecture to obtain the running result.
For example, as shown in FIG. 8, after starting the task service instance and running the runtime plug-in, the task layer agent 0212 in the target computing node 02 can provide the input data to the runtime plug-in. For example, the input data may be the input matrices A and B; after the runtime plug-in runs the code of the for loop, the obtained running result is the result of the matrix multiplication operation.
Optionally, in this embodiment of the application, after compiling the intermediate representation of the target task into executable code, the runtime plug-in can also cache the executable code of the target task. Thus, when another target task of the same type is to be executed later, its intermediate representation does not need to be compiled online again, avoiding the extra overhead introduced by online compilation.
For example, after receiving the scheduling instruction for the target task, if the target computing node detects that the executable code of the target task is already cached locally and that the chip architecture of the cached executable code matches the target chip architecture sent by the scheduler, the target computing node can directly, through the runtime plug-in, take the input data of the target task as the input of the cached executable code and run it on a processor of the target chip architecture.
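A minimal sketch of such a cache lookup is given below, reusing the hypothetical RuntimePlugin interface sketched earlier; the map-based cache and its (task identifier, architecture) key are assumptions.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Cache of compiled executables, keyed by (task identifier, architecture).
    using ExecCache = std::map<std::pair<std::string, std::string>,
                               std::vector<std::uint8_t>>;

    // Return the cached executable if this task was already compiled for the
    // target architecture; otherwise compile the IR once and cache the result,
    // avoiding the overhead of repeated online compilation.
    const std::vector<std::uint8_t>& GetExecutable(ExecCache& cache,
                                                   RuntimePlugin& plugin,
                                                   const std::string& taskId,
                                                   const std::string& arch,
                                                   const std::vector<std::uint8_t>& ir) {
        const auto key = std::make_pair(taskId, arch);
        auto it = cache.find(key);
        if (it == cache.end()) {
            it = cache.emplace(key, plugin.Compile(ir, arch)).first;
        }
        return it->second;
    }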
It can be understood that tasks executed on different computing nodes 02 may have dependencies on one another, that is, tasks assigned to different computing nodes 02 may need to exchange data during execution. Therefore, the task service instance started by the task layer agent 0212 also has the function of communicating with the task layer agents 0212 in other computing nodes 02, which makes it easy to obtain the necessary data from other computing nodes 02 during task execution.
Step 110: the target computing node sends the running result to the scheduler.
To make it easy for the host to further process the running result, referring to step S5 in FIG. 5, the target computing node 02 may send the running result to the scheduler 011.
Step 111: the scheduler sends the running result to the host.
Continuing to refer to step S5 in FIG. 5, after receiving the running result, the scheduler 011 can send it through the gateway 05 to the acceleration library 031 in the host, so that the acceleration library 031 can process it further.
For example, referring to FIG. 5, assuming that the target task is one of N parallel tasks, after the N computing nodes 02 executing the N parallel tasks have computed their running results, they may each send their result to the scheduler 011. The scheduler 011 can then send the N running results to the acceleration library 031 through the gateway 05, and the acceleration library 031 can perform reduction processing on the N received results.
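As a toy illustration of this reduction step (the actual reduction applied by the acceleration library is not specified here, and the values are invented), combining N partial results might look like:

```python
from functools import reduce
import operator

partial_sums = [10, 20, 30]  # hypothetical results from N = 3 computing nodes
print(reduce(operator.add, partial_sums))  # 60
```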
Optionally, as shown in FIG. 6, the management node 01 in the heterogeneous cluster may further include a historical information collection module 012, which can be used to collect and store the scheduling information and execution information of historical tasks.
FIG. 10 is a schematic diagram of an application scenario of yet another task scheduling method provided by an embodiment of the present application. As shown in FIG. 10, the host 03 may include a CPU, which is used to run the acceleration library 031 to compile the source code of the target task into a fat binary file. Referring to FIG. 10, the fat binary file includes host code and an intermediate representation, where the host code may be CPU host code.
Continuing to refer to FIG. 10, the host 03 may also run a target-agnostic device plug-in framework and an adaptive cluster plugin. For example, if the acceleration library 031 is an OpenMP acceleration library, the target-agnostic device plug-in framework may be a target agnostic wrapper. The target-agnostic device plug-in framework is used to interface with the adaptive cluster plugin, and the adaptive cluster plugin is used to interact with the distributed middleware. For example, the adaptive cluster plugin can send data to the scheduler 011 in the heterogeneous cluster by calling the middleware programming interface, so as to offload the target task to the heterogeneous cluster for execution. Accordingly, the adaptive cluster plugin may also be called an offload plugin.
It can be understood that steps of the task scheduling method provided by the embodiments of the present application may be added or removed as appropriate. For example, step 103 may be performed before step 102. Alternatively, if the recipient of the data in step 102 is the scheduler, step 102 and step 103 may be performed simultaneously. Alternatively, if the target computing node includes processors of only one chip architecture, step 106 may be omitted as appropriate. Or, if the target task is not a parallel task, steps 1041, 1042b and 1043b may also be omitted as appropriate.
It can also be understood that the multiple parallel tasks received by the scheduler may also be referred to as one job; the method provided by the embodiments of the present application can therefore implement task scheduling not only at the single-task level but also at the job level.
In summary, the embodiments of the present application provide a task scheduling method in which the target computing node can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
Moreover, since the scheduler does not need to find a computing node whose chip architecture matches the architecture of executable code marked in the task, the complexity of resource management and scheduling can be effectively reduced, thereby improving the efficiency of task scheduling. In addition, since the host does not need to provide executable code for multiple different architectures, the operation, maintenance and development costs on the host side can be effectively reduced.
An embodiment of the present application further provides a target computing node, which can be applied to the heterogeneous cluster provided by the foregoing embodiments and can be used to implement the steps performed by the target computing node in the foregoing method embodiments. As shown in FIG. 1, FIG. 2, FIG. 6 and FIG. 9, the heterogeneous cluster includes a scheduler 011 and multiple computing nodes 02, at least two of which have different chip architectures; the target computing node belongs to the multiple computing nodes 02. Referring to FIG. 11, the target computing node may further include:
A receiving module 201, configured to receive the scheduling instruction for the target task sent by the scheduler. For the functional implementation of the receiving module 201, reference may be made to the description of step 105 in the foregoing method embodiments.
An obtaining module 202, configured to obtain the intermediate representation of the target task and the runtime plug-in of the target task. For the functional implementation of the obtaining module 202, reference may be made to the description of step 107 in the foregoing method embodiments.
A processing module 203, configured to compile, based on the scheduling instruction, the intermediate representation into executable code for the target chip architecture through the runtime plug-in, and to run the executable code on a processor of the target chip architecture, where the target computing node includes a processor of the target chip architecture. For the functional implementation of the processing module 203, reference may be made to the description of step 108 in the foregoing method embodiments.
Optionally, the obtaining module 202 may be configured to: receive the intermediate representation and the runtime plug-in of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the intermediate representation and the runtime plug-in of the target task from the file manager of the heterogeneous cluster.
Optionally, the receiving module 201 may further be configured to receive the architecture identifier of the target chip architecture sent by the scheduler. For the functional implementation of the receiving module 201, reference may be made to the description of step 106 in the foregoing method embodiments.
Accordingly, the processing module 203 may compile, based on the architecture identifier of the target chip architecture, the intermediate representation into executable code for the target chip architecture through the runtime plug-in.
Optionally, the obtaining module 202 may further be configured to obtain the input data of the target task.
The processing module 203 is configured to take the input data as the input of the executable code, run the executable code on a processor of the target chip architecture, and obtain the running result of the executable code. For the functional implementation of the processing module 203, reference may also be made to the description of step 109 in the foregoing method embodiments.
Optionally, as shown in FIG. 11, the target computing node further includes:
A sending module 204, configured to send the running result to the scheduler after the processing module 203 obtains the running result of the executable code. For the functional implementation of the sending module 204, reference may also be made to the description of step 110 in the foregoing method embodiments.
Optionally, the obtaining module 202 may be configured to: receive the input data of the target task sent by the scheduler; or, based on the scheduling instruction, obtain the input data of the target task from the file manager of the heterogeneous cluster.
In summary, the embodiments of the present application provide a target computing node that can obtain the intermediate representation and the runtime plug-in of the target task. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
An embodiment of the present application provides a scheduler that can be applied to the heterogeneous cluster provided by the foregoing embodiments, for example, to the management node 01 in the heterogeneous cluster. The scheduler can be used to implement the steps performed by the scheduler in the foregoing method embodiments. Referring to FIG. 1, FIG. 2, FIG. 6 and FIG. 9, the heterogeneous cluster further includes multiple computing nodes 02, at least two of which have different chip architectures. As shown in FIG. 12, the scheduler may include:
A receiving module 301, configured to receive the scheduling requirement information of the target task to be scheduled, where the scheduling requirement information includes the resource requirement of the target task and the at least two chip architectures supported by the target task. For the functional implementation of the receiving module 301, reference may be made to the description of step 103 in the foregoing method embodiments.
A determining module 302, configured to determine, based on the scheduling requirement information, a target computing node from the multiple computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures. For the functional implementation of the determining module 302, reference may be made to the description of step 104 in the foregoing method embodiments.
A sending module 303, configured to send a scheduling instruction for the target task to the target computing node, where the scheduling instruction instructs the target computing node to compile, through the runtime plug-in of the target task, the intermediate representation of the target task into executable code for the target chip architecture, and to run the executable code on a processor of the target chip architecture. For the functional implementation of the sending module 303, reference may be made to the description of step 105 in the foregoing method embodiments.
Optionally, the scheduling requirement information may further include the priorities of the at least two chip architectures, and the determining module 302 may be configured to:
check, in descending order of the priorities of the at least two chip architectures, whether the amount of idle resources of processors of the corresponding chip architecture among the multiple computing nodes satisfies the resource requirement; and
if it is detected that the amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determine a computing node that includes a processor of the target chip architecture as the target computing node.
For the functional implementation of the determining module 302, reference may be made to the descriptions of steps 1042a and 1043a in the foregoing method embodiments.
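A minimal sketch of this priority-ordered selection, assuming each computing node reports its idle resource amount per chip architecture (the data shapes and names are invented for illustration):

```python
def pick_target_node(nodes, arch_priority, resource_demand):
    # Walk the task's supported architectures from highest to lowest priority;
    # the first node whose idle amount covers the demand becomes the target.
    for arch in arch_priority:
        for node in nodes:
            if node["idle"].get(arch, 0) >= resource_demand:
                return node["name"], arch   # target node and target architecture
    return None                             # no node currently satisfies the demand

nodes = [
    {"name": "node-a", "idle": {"gpu": 2, "cpu": 8}},
    {"name": "node-b", "idle": {"npu": 4}},
]
print(pick_target_node(nodes, ["npu", "gpu", "cpu"], 3))  # ('node-b', 'npu')
```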
Optionally, the sending module 303 may further be configured to send the architecture identifier of the target chip architecture to the target computing node. For the functional implementation of the sending module 303, reference may also be made to the description of step 106 in the foregoing method embodiments.
Optionally, the receiving module 301 may further be configured to receive the intermediate representation and the runtime plug-in of the target task. Accordingly, the sending module 303 may further be configured to send the intermediate representation and the runtime plug-in to the target computing node.
Optionally, the target task is one of multiple parallel tasks, and the scheduling requirement information further includes the parallel scheduling mode of the multiple parallel tasks. The determining module 302 may be configured to:
if the parallel scheduling mode of the multiple parallel tasks is the synchronous parallel mode, determine the target computing node from the multiple computing nodes based on the sum of the resource requirements of the multiple parallel tasks, where the sum of the amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the multiple parallel tasks; and
if the parallel scheduling mode of the multiple parallel tasks is the ideal parallel mode, determine the target computing node from the multiple computing nodes based on the resource requirement of the target task alone.
Here, the synchronous parallel mode means that the multiple parallel tasks need to be executed synchronously, while the ideal parallel mode means that they do not. For the functional implementation of the determining module 302, reference may also be made to the descriptions of steps 1041, 1042b and 1043b in the foregoing method embodiments.
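Under the same assumed data shapes, the two branches might be sketched as follows (reusing pick_target_node from the earlier sketch; the greedy placement is an illustrative simplification, not the patent's algorithm):

```python
def schedule_synchronous(nodes, arch, demands):
    # Synchronous parallel mode: the N tasks must run together, so the
    # cluster-wide sum of idle resources of the target architecture must
    # cover the sum of the N tasks' resource requirements.
    idle = {n["name"]: n["idle"].get(arch, 0) for n in nodes}
    if sum(idle.values()) < sum(demands):
        return None                               # cannot co-schedule right now
    placements = []
    for demand in sorted(demands, reverse=True):  # simple greedy fit
        name = max(idle, key=idle.get)
        if idle[name] < demand:
            return None                           # no single node fits this task
        idle[name] -= demand
        placements.append((name, demand))
    return placements

def schedule_ideal(nodes, arch, demand):
    # Ideal parallel mode: tasks need not run simultaneously, so each
    # target task is placed on its own, against only its own requirement.
    return pick_target_node(nodes, [arch], demand)
```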
In summary, the embodiments of the present application provide a scheduler. Since the target computing node can obtain the intermediate representation and the runtime plug-in of the target task, and the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
An embodiment of the present application further provides a host, which can be applied to the task scheduling system provided by the foregoing embodiments and can be used to implement the steps performed by the host in the foregoing method embodiments. Referring to FIG. 13, the host may include:
A compiling module 401, configured to compile the source code of the target task to obtain the intermediate representation of the target task and the runtime plug-in of the target task. For the functional implementation of the compiling module 401, reference may be made to the description of step 101 in the foregoing method embodiments.
A first sending module 402, configured to send the intermediate representation and the runtime plug-in. For the functional implementation of the first sending module 402, reference may be made to the description of step 102 in the foregoing method embodiments.
A second sending module 403, configured to send the scheduling requirement information of the target task to the scheduler in the heterogeneous cluster, where the scheduling requirement information includes the resource requirement of the target task and the at least two chip architectures supported by the target task.
The heterogeneous cluster further includes multiple computing nodes, at least two of which have different chip architectures. The scheduling requirement information instructs the scheduler to schedule the target task to a target computing node among the at least two computing nodes, where the amount of idle resources of the processor of the target chip architecture in the target computing node satisfies the resource requirement of the target task and the target chip architecture belongs to the at least two chip architectures. The runtime plug-in is used by the target computing node to compile the intermediate representation into executable code for the target chip architecture.
For the functional implementation of the second sending module 403, reference may be made to the description of step 103 in the foregoing method embodiments.
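As an illustration of what the scheduling requirement information might carry, a sketch is shown below; the field names and values are invented, and the patent does not prescribe any particular wire format:

```python
import json

scheduling_requirement = {
    "task_id": "matmul-0001",                          # hypothetical identifier
    "resource_demand": {"cores": 4, "memory_gb": 8},   # resource requirement
    "supported_architectures": ["npu", "gpu", "cpu"],  # at least two, by priority
    "parallel_mode": "synchronous",                    # optional, cf. claim 10
}
payload = json.dumps(scheduling_requirement)
# The adaptive cluster plugin would hand such a payload to the middleware
# programming interface, which forwards it to the scheduler 011.
```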
Optionally, the first sending module 402 may be configured to: send the intermediate representation and the runtime plug-in to the scheduler; or send the intermediate representation and the runtime plug-in to the file manager in the heterogeneous cluster.
In summary, the embodiments of the present application provide a host that can provide the intermediate representation and the runtime plug-in of the target task to the target computing node. Since the intermediate representation is code independent of the processor's chip architecture, the target computing node can compile it into executable code for the target chip architecture through the runtime plug-in and run that code on a processor of the target chip architecture. Accordingly, when scheduling the target task, the scheduler in the heterogeneous cluster is not constrained by the architecture of any pre-compiled executable code in the task, but can flexibly determine the computing node for executing the target task based on the resource usage of each computing node in the heterogeneous cluster. This ensures that the load of the computing nodes is relatively balanced and effectively improves the resource utilization of the heterogeneous cluster.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the modules in the target computing node, the scheduler and the host described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
It should be understood that the target computing node, the scheduler and the host provided by the embodiments of the present application may each be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Of course, the task scheduling method provided by the foregoing method embodiments may also be implemented by software; in that case, the target computing node, the scheduler and the host may each include software modules for implementing the method.
An embodiment of the present application further provides a computer device that can be applied to the task scheduling system provided by the foregoing embodiments. The computer device may be the target computing node, the scheduler or the host provided by the foregoing embodiments. Referring to FIG. 14, the computer device may include a processor 501, a memory 502, a network interface 503 and a bus 504, where the bus 504 connects the processor 501, the memory 502 and the network interface 503. Communication connections with other devices can be established through the network interface 503 (which may be wired or wireless). The memory 502 stores a computer program 5021 used to implement various application functions.
It should be understood that, in the embodiments of the present application, the processor 501 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a GPU or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 502 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
In addition to a data bus, the bus 504 may also include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus 504 in the figure.
The processor 501 is configured to execute the computer program 5021 stored in the memory 502, and implements the task scheduling method shown in the foregoing method embodiments by executing it.
For example, if the computer device is the target computing node, the processor 501 can implement the steps performed by the target computing node in the foregoing method embodiments by executing the computer program 5021. If the computer device is the scheduler, the processor 501 can implement the steps performed by the scheduler by executing the computer program 5021. If the computer device is the host, the processor 501 can implement the steps performed by the host by executing the computer program 5021.
An embodiment of the present application further provides a computer-readable storage medium storing instructions that are executed by a processor to implement the task scheduling method applied to the target computing node, the task scheduling method applied to the scheduler, or the task scheduling method applied to the host in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is caused to implement the task scheduling method applied to the target computing node, the task scheduling method applied to the scheduler, or the task scheduling method applied to the host in the foregoing method embodiments.
An embodiment of the present application further provides a task scheduling system. As shown in FIG. 1, FIG. 2 and FIG. 10, the system may include a host 03, a scheduler 011 and multiple computing nodes 02, at least two of which have different chip architectures.
At least one of the multiple computing nodes 02 is the target computing node provided by the foregoing embodiments, for example, the target computing node shown in FIG. 11 or FIG. 14.
The scheduler 011 is the scheduler provided by the foregoing embodiments, for example, the scheduler shown in FIG. 12 or FIG. 14.
The host 03 is the host provided by the foregoing embodiments, for example, the host shown in FIG. 13 or FIG. 14.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, by coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium; the semiconductor medium may be a solid state drive (SSD).
In this application, the terms "first", "second" and the like are used to distinguish identical or similar items having substantially the same effects and functions. It should be understood that there is no logical or temporal dependency among "first", "second" and "nth", and that neither the quantity nor the execution order is limited.
In this application, the term "at least one" means one or more, and the term "multiple" means two or more. The terms "system" and "network" are often used interchangeably herein.
The foregoing descriptions are merely optional implementations of this application, but the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (28)

  1. A task scheduling method, applied to a target computing node in a heterogeneous cluster, wherein the heterogeneous cluster comprises a scheduler and a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, and the target computing node belongs to the plurality of computing nodes; the method comprises:
    receiving a scheduling instruction for a target task sent by the scheduler;
    obtaining an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task;
    compiling, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, wherein the target computing node comprises a processor of the target chip architecture; and
    running the executable code on the processor of the target chip architecture through the runtime plug-in.
  2. The method according to claim 1, wherein the obtaining an intermediate representation of the target task and a runtime plug-in of the target task comprises:
    obtaining, based on the scheduling instruction, the intermediate representation of the target task and the runtime plug-in of the target task from a file manager of the heterogeneous cluster;
    or, receiving the intermediate representation of the target task and the runtime plug-in of the target task sent by the scheduler.
  3. The method according to claim 1 or 2, wherein the method further comprises: receiving an architecture identifier of the target chip architecture sent by the scheduler; and
    the compiling the intermediate representation into executable code of a target chip architecture through the runtime plug-in comprises:
    compiling, based on the architecture identifier of the target chip architecture, the intermediate representation into the executable code of the target chip architecture through the runtime plug-in.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises: obtaining input data of the target task;
    the running the executable code on the processor of the target chip architecture through the runtime plug-in comprises: taking, through the runtime plug-in, the input data as input of the executable code, and running the executable code on the processor of the target chip architecture to obtain a running result of the executable code; and
    the method further comprises: sending the running result to the scheduler.
  5. The method according to claim 4, wherein the obtaining input data of the target task comprises:
    obtaining, based on the scheduling instruction, the input data of the target task from a file manager of the heterogeneous cluster;
    or, receiving the input data of the target task sent by the scheduler.
  6. A task scheduling method, applied to a scheduler in a heterogeneous cluster, wherein the heterogeneous cluster further comprises a plurality of computing nodes, and at least two of the plurality of computing nodes have different chip architectures; the method comprises:
    receiving scheduling requirement information of a target task to be scheduled, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, wherein an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures; and
    sending a scheduling instruction for the target task to the target computing node, wherein the scheduling instruction instructs the target computing node to compile, through a runtime plug-in of the target task, an intermediate representation of the target task into executable code of the target chip architecture, and to run the executable code on the processor of the target chip architecture, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task.
  7. The method according to claim 6, wherein the scheduling requirement information further comprises priorities of the at least two chip architectures, and the determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes comprises:
    checking, in descending order of the priorities of the at least two chip architectures, whether an amount of idle resources of processors of the corresponding chip architecture among the plurality of computing nodes satisfies the resource requirement; and
    if it is detected that an amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determining a computing node comprising a processor of the target chip architecture as the target computing node.
  8. The method according to claim 6 or 7, wherein the method further comprises:
    receiving the intermediate representation of the target task and the runtime plug-in of the target task; and
    sending the intermediate representation and the runtime plug-in to the target computing node.
  9. The method according to any one of claims 6 to 8, wherein the method further comprises:
    sending an architecture identifier of the target chip architecture to the target computing node.
  10. The method according to any one of claims 6 to 9, wherein the target task is one parallel task among a plurality of parallel tasks, and the scheduling requirement information further comprises a parallel scheduling mode of the plurality of parallel tasks;
    the determining, based on the scheduling requirement information, a target computing node from the plurality of computing nodes comprises:
    if the parallel scheduling mode of the plurality of parallel tasks is a synchronous parallel mode, determining the target computing node from the plurality of computing nodes based on a sum of resource requirements of the plurality of parallel tasks, wherein a sum of amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster satisfies the sum of the resource requirements of the plurality of parallel tasks; and
    if the parallel scheduling mode of the plurality of parallel tasks is an ideal parallel mode, determining the target computing node from the plurality of computing nodes based on the resource requirement of the target task;
    wherein the synchronous parallel mode means that the plurality of parallel tasks need to be executed synchronously, and the ideal parallel mode means that the plurality of parallel tasks do not need to be executed synchronously.
  11. A task scheduling method, wherein the method comprises:
    compiling source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code;
    sending the intermediate representation and the runtime plug-in; and
    sending scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    wherein the heterogeneous cluster further comprises a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, the scheduling requirement information instructs the scheduler to schedule the target task to a target computing node among the at least two computing nodes, an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, the target chip architecture belongs to the at least two chip architectures, and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.
  12. The method according to claim 11, wherein the sending the intermediate representation and the runtime plug-in comprises:
    sending the intermediate representation and the runtime plug-in to the scheduler;
    or, sending the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
  13. A target computing node, applied to a heterogeneous cluster, wherein the heterogeneous cluster comprises a scheduler and a plurality of computing nodes, at least two of the plurality of computing nodes have different chip architectures, and the target computing node belongs to the plurality of computing nodes; the target computing node further comprises:
    a receiving module, configured to receive a scheduling instruction for a target task sent by the scheduler;
    an obtaining module, configured to obtain an intermediate representation of the target task and a runtime plug-in of the target task, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task; and
    a processing module, configured to compile, based on the scheduling instruction, the intermediate representation into executable code of a target chip architecture through the runtime plug-in, and to run the executable code on a processor of the target chip architecture, wherein the target computing node comprises a processor of the target chip architecture.
  14. The target computing node according to claim 13, wherein the obtaining module is configured to:
    obtain, based on the scheduling instruction, the intermediate representation of the target task and the runtime plug-in of the target task from a file manager of the heterogeneous cluster;
    or, receive the intermediate representation of the target task and the runtime plug-in of the target task sent by the scheduler.
  15. The target computing node according to claim 13 or 14, wherein the receiving module is further configured to receive an architecture identifier of the target chip architecture sent by the scheduler; and
    the processing module is configured to compile, based on the architecture identifier of the target chip architecture, the intermediate representation into the executable code of the target chip architecture through the runtime plug-in.
  16. The target computing node according to any one of claims 13 to 15, wherein the obtaining module is further configured to obtain input data of the target task;
    the processing module is configured to take, through the runtime plug-in, the input data as input of the executable code, and run the executable code on the processor of the target chip architecture to obtain a running result of the executable code; and
    the target computing node further comprises:
    a sending module, configured to send the running result to the scheduler after the processing module obtains the running result of the executable code.
  17. The target computing node according to claim 16, wherein the obtaining module is configured to:
    obtain, based on the scheduling instruction, the input data of the target task from a file manager of the heterogeneous cluster;
    or, receive the input data of the target task sent by the scheduler.
  18. A scheduler, applied to a heterogeneous cluster, wherein the heterogeneous cluster further comprises a plurality of computing nodes, and at least two of the plurality of computing nodes have different chip architectures; the scheduler comprises:
    a receiving module, configured to receive scheduling requirement information of a target task to be scheduled, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    a determining module, configured to determine, based on the scheduling requirement information, a target computing node from the plurality of computing nodes, wherein an amount of idle resources of a processor of a target chip architecture in the target computing node satisfies the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures; and
    a sending module, configured to send a scheduling instruction for the target task to the target computing node, wherein the scheduling instruction instructs the target computing node to compile, through a runtime plug-in of the target task, an intermediate representation of the target task into executable code of the target chip architecture, and to run the executable code on the processor of the target chip architecture, wherein the intermediate representation is chip-architecture-independent code obtained by compiling source code of the target task.
  19. The scheduler according to claim 18, wherein the scheduling requirement information further comprises priorities of the at least two chip architectures, and the determining module is configured to:
    check, in descending order of the priorities of the at least two chip architectures, whether an amount of idle resources of processors of the corresponding chip architecture among the plurality of computing nodes satisfies the resource requirement; and
    if it is detected that an amount of idle resources of processors of a target chip architecture satisfies the resource requirement, determine a computing node comprising a processor of the target chip architecture as the target computing node.
  20. The scheduler according to claim 18 or 19, wherein the receiving module is further configured to receive the intermediate representation of the target task and the runtime plug-in of the target task; and
    the sending module is further configured to send the intermediate representation and the runtime plug-in to the target computing node.
  21. The scheduler according to any one of claims 18 to 20, wherein the sending module is further configured to send an architecture identifier of the target chip architecture to the target computing node.
  22. The scheduler according to any one of claims 18 to 21, wherein the target task is one of a plurality of parallel tasks, and the scheduling requirement information further comprises a parallel scheduling mode of the plurality of parallel tasks;
    the determining module is configured to:
    if the parallel scheduling mode of the plurality of parallel tasks is a synchronous parallel mode, determine the target computing node from the plurality of computing nodes based on the sum of the resource requirements of the plurality of parallel tasks, wherein the sum of the amounts of idle resources of processors of the target chip architecture in the heterogeneous cluster meets the sum of the resource requirements of the plurality of parallel tasks;
    if the parallel scheduling mode of the plurality of parallel tasks is an ideal parallel mode, determine the target computing node from the plurality of computing nodes based on the resource requirement of the target task;
    wherein the synchronous parallel mode means that the plurality of parallel tasks need to be executed synchronously, and the ideal parallel mode means that the plurality of parallel tasks do not need to be executed synchronously.
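For illustration only: a short Python sketch contrasting the two parallel scheduling modes of claim 22. The flat task and cluster model is an assumption of this example; only the branching mirrors the claim, in that synchronous mode gates admission on the summed demand of all parallel tasks while ideal mode considers only the target task's own demand.

    # Hypothetical admission check for one task of a parallel group.
    from typing import Optional

    def schedule_parallel(mode: str,
                          task_demands: list,
                          target_index: int,
                          cluster_idle_total: int,
                          nodes: list) -> Optional[str]:
        if mode == "synchronous":
            # All parallel tasks must run at the same time, so the
            # cluster-wide idle resources of the target architecture
            # must cover the sum of all the tasks' demands.
            if cluster_idle_total < sum(task_demands):
                return None
        elif mode != "ideal":
            raise ValueError(f"unknown parallel scheduling mode: {mode}")
        # In both modes the chosen node only needs to fit this task.
        demand = task_demands[target_index]
        for name, idle in nodes:
            if idle >= demand:
                return name
        return None

    # Example: three tasks of demand 4 each, cluster has 10 idle units.
    print(schedule_parallel("synchronous", [4, 4, 4], 0, 10, [("n1", 8)]))  # None
    print(schedule_parallel("ideal", [4, 4, 4], 0, 10, [("n1", 8)]))        # n1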
  23. A host, wherein the host comprises:
    a compiling module, configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, the intermediate representation being chip-architecture-independent code;
    a first sending module, configured to send the intermediate representation and the runtime plug-in;
    a second sending module, configured to send scheduling requirement information of the target task to a scheduler in a heterogeneous cluster, wherein the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    wherein the heterogeneous cluster further comprises a plurality of computing nodes, at least two of the plurality of computing nodes having different chip architectures; the scheduling requirement information is used to instruct the scheduler to schedule the target task to a target computing node among the at least two computing nodes, the amount of idle resources of a processor of a target chip architecture in the target computing node meets the resource requirement of the target task, the target chip architecture belongs to the at least two chip architectures, and the runtime plug-in is used by the target computing node to compile the intermediate representation into executable code of the target chip architecture.
  24. The host according to claim 23, wherein the first sending module is configured to:
    send the intermediate representation and the runtime plug-in to the scheduler;
    or send the intermediate representation and the runtime plug-in to a file manager in the heterogeneous cluster.
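For illustration only: a Python sketch of the host-side flow in claims 23 and 24. The stub compiler, the Scheduler class, and the message layout are assumptions of this example; the claims fix only what is produced and where it may be sent, not any concrete API.

    # Hypothetical host: compile once to an architecture-independent IR
    # plus a per-task runtime plug-in, publish both, then send the
    # scheduling requirement to the cluster scheduler.
    from typing import Tuple

    def compile_source(source: str) -> Tuple[bytes, bytes]:
        """Stub compiler returning (intermediate representation, runtime plug-in)."""
        return source.encode(), b"runtime-plugin"

    class Scheduler:
        def store(self, ir: bytes, plugin: bytes) -> None:
            print("scheduler received IR and runtime plug-in")

        def submit(self, requirement: dict) -> None:
            print("scheduling requirement:", requirement)

    def submit_task(source: str, demand: int, supported_archs: list,
                    scheduler: Scheduler, file_manager=None) -> None:
        ir, plugin = compile_source(source)
        # Claim 24: the IR and plug-in go either to the scheduler or,
        # alternatively, to a file manager in the cluster.
        sink = file_manager if file_manager is not None else scheduler
        sink.store(ir, plugin)
        # Claim 23: the requirement names the resource demand and the
        # chip architectures the task supports.
        scheduler.submit({"resource_demand": demand,
                          "supported_archs": supported_archs})

    submit_task("fn main() {}", demand=4,
                supported_archs=["gpu", "cpu"], scheduler=Scheduler())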
  25. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and the instructions are executed by a processor to implement the task scheduling method according to any one of claims 1 to 12.
  26. A computer device, wherein the computer device comprises a memory, a processor, and a computer program that is stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the task scheduling method according to any one of claims 1 to 12.
  27. A task scheduling system, wherein the task scheduling system comprises: the host according to claim 23 or 24, the scheduler according to any one of claims 18 to 22, and a plurality of computing nodes;
    at least one of the plurality of computing nodes is the target computing node according to any one of claims 13 to 17.
  28. A task scheduling system, wherein the task scheduling system comprises a host, a scheduler, and a plurality of computing nodes, at least two of the plurality of computing nodes having different chip architectures;
    the host is configured to compile the source code of a target task to obtain an intermediate representation of the target task and a runtime plug-in of the target task, to send the intermediate representation and the runtime plug-in, and to send scheduling requirement information of the target task to the scheduler, wherein the intermediate representation is chip-architecture-independent code, and the scheduling requirement information comprises a resource requirement of the target task and at least two chip architectures supported by the target task;
    the scheduler is configured to determine a target computing node from the plurality of computing nodes based on the scheduling requirement information, and to send a scheduling instruction for the target task to the target computing node, wherein the amount of idle resources of a processor of a target chip architecture in the target computing node meets the resource requirement of the target task, and the target chip architecture belongs to the at least two chip architectures;
    the target computing node is configured to, based on the scheduling instruction, compile the intermediate representation into executable code of the target chip architecture through the runtime plug-in, and run the executable code on a processor of the target chip architecture through the runtime plug-in.
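For illustration only: an end-to-end Python walk-through of the claim 28 pipeline with every component reduced to an in-memory stub. All names and payloads are assumptions of this example; the claim itself is neutral as to language, transport, and representation.

    # Hypothetical pipeline: the host compiles to IR, the scheduler picks
    # a node whose idle resources of a supported architecture cover the
    # demand, and the node compiles and runs the IR via the runtime plug-in.
    def demo() -> None:
        # Host side.
        ir = b"architecture-independent IR"
        requirement = {"demand": 4, "supported_archs": ["gpu", "cpu"]}

        def plugin_compile(ir: bytes, arch: str) -> bytes:
            return ir + b" -> " + arch.encode()   # stand-in for real codegen

        # Scheduler side: the first node of a supported architecture with
        # enough idle resources becomes the target computing node.
        nodes = {"n1": ("cpu", 8), "n2": ("gpu", 2)}
        target = next(name for name, (arch, idle) in nodes.items()
                      if arch in requirement["supported_archs"]
                      and idle >= requirement["demand"])

        # Target computing node side: act on the scheduling instruction.
        arch = nodes[target][0]
        executable = plugin_compile(ir, arch)
        print(f"running on {target} ({arch}):", executable.decode())

    demo()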
PCT/CN2021/142532 2021-02-07 2021-12-29 Task scheduling method, apparatus and system WO2022166480A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110167884.X 2021-02-07
CN202110167884.XA CN114911586A (en) 2021-02-07 2021-02-07 Task scheduling method, device and system

Publications (1)

Publication Number Publication Date
WO2022166480A1 (en)

Family

ID=82740836

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142532 WO2022166480A1 (en) 2021-02-07 2021-12-29 Task scheduling method, apparatus and system

Country Status (2)

Country Link
CN (1) CN114911586A (en)
WO (1) WO2022166480A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971498B (en) * 2024-03-28 2024-05-31 Kylin Software Co., Ltd. Scheduling method for GPU resources in computing cluster, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339419A (en) * 1990-06-25 1994-08-16 Hewlett-Packard Company ANDF compiler using the HPcode-plus compiler intermediate language
US20160371081A1 (en) * 2015-06-16 2016-12-22 Architecture Technology Corporation Dynamic computational acceleration using a heterogeneous hardware infrastructure
CN106415496A (en) * 2014-05-30 2017-02-15 Apple Inc. Unified intermediate representation
CN107111505A (en) * 2015-01-19 2017-08-29 Huawei Technologies Co., Ltd. System and method for performing algorithm on Heterogeneous Parallel Systems
CN110865814A (en) * 2019-10-30 2020-03-06 Nanjing Iluvatar CoreX Technology Co., Ltd. Compiler implementation method and system supporting heterogeneous computing core architecture
CN111045795A (en) * 2018-10-11 2020-04-21 Zhejiang Uniview Technologies Co., Ltd. Resource scheduling method and device
CN112148294A (en) * 2019-06-27 2020-12-29 Intel Corporation Method and apparatus for intentional programming for heterogeneous systems


Also Published As

Publication number Publication date
CN114911586A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
US11941400B2 (en) Methods and apparatus for intentional programming for heterogeneous systems
US10768989B2 (en) Virtual vector processing
EP2281236B1 (en) Just-ahead-of-time compilation
US8281311B2 (en) Executing a distributed software application on a plurality of compute nodes according to a compilation history
US11630798B1 (en) Virtualized multicore systems with extended instruction heterogeneity
US20120036514A1 (en) Method and apparatus for a compiler and related components for stream-based computations for a general-purpose, multiple-core system
US20060136878A1 (en) Method and apparatus for enabling compiler and run-time optimizations for data flow applications in multi-core architectures
US20230333913A1 (en) Methods and apparatus to configure heterogenous components in an accelerator
WO2023124543A1 (en) Data processing method and data processing apparatus for big data
Bezirgiannis et al. ABS: A high-level modeling language for cloud-aware programming
CN109542464A (en) Development deployment system, method and the storage medium of IoT equipment shell script
CN112860396A (en) GPU (graphics processing Unit) scheduling method and system based on distributed deep learning
US8918767B2 (en) Pattern-based compilation of asynchronous consumption
WO2022166480A1 (en) Task scheduling method, apparatus and system
US20220114469A1 (en) Methods and apparatus for parallel quantum computing
US11435989B2 (en) Thread-local return structure for asynchronous state machine
US9442782B2 (en) Systems and methods of interface description language (IDL) compilers
US11513841B2 (en) Method and system for scheduling tasks in a computing system
Plauth et al. CloudCL: distributed heterogeneous computing on cloud scale
Plauth et al. CloudCL: single-paradigm distributed heterogeneous computing for cloud infrastructures
WO2024060256A1 (en) Self-evolving and multi-versioning code
Samman et al. Architecture, on-chip network and programming interface concept for multiprocessor system-on-chip
JP2023533802A (en) shared data structure
EP4363966A1 (en) Compilation system and method
CN116136787A (en) Method and device for improving execution efficiency based on Golang protocol Cheng Qianru PHP script

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21924473

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21924473

Country of ref document: EP

Kind code of ref document: A1