WO2022236834A1 - Method and system for scheduling tasks - Google Patents

Method and system for scheduling tasks

Info

Publication number
WO2022236834A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
dag
node
policy network
embeddings
Application number
PCT/CN2021/093945
Other languages
French (fr)
Inventor
Zhigang Hua
Gan LIU
Feng QI
Shuang Yang
Runzhong WANG
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd.
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd. filed Critical Alipay (Hangzhou) Information Technology Co., Ltd.
Priority to PCT/CN2021/093945 priority Critical patent/WO2022236834A1/en
Priority to CN202180088447.7A priority patent/CN116670684A/en
Publication of WO2022236834A1 publication Critical patent/WO2022236834A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition

Definitions

  • the disclosure generally relates to systems and methods for scheduling tasks, specifically, directed acyclic graph (DAG)-based task scheduling using reinforcement learning (RL).
  • Scheduling computational tasks is a critical problem in many areas of computer science, ranging from programming languages (e.g., compilation) and operating systems (e.g., parallel processing) to data engineering (e.g., distributed batch/streaming computation topology) and machine learning (e.g., training graphs).
  • the overall goal of the DAG-based scheduling problem is to find an optimal scheduling solution (execution order) so that the tasks can be executed with a minimal makespan or a shortest average waiting time.
  • These tasks are usually associated with numerous restrictions to which the scheduling solution must comply. For instance, some tasks may depend on other tasks, while some tasks may have resource constraints. Therefore, the DAG-based scheduling problem combines two well-known NP-hard problems: the minimum makespan problem and the bin packing problem. The former handles interdependence among tasks, and the latter handles one-dimensional or multi-dimensional resource constraints.
  • Existing solutions generally include solving an integer programming problem with branch-and-bound that is usually intractable in practice.
  • Other heuristic approaches, such as Shortest Job First (SJF), Highest Level First, Longest Job Time, Critical Path (CP), and Random Priority, assign priorities to tasks and then execute the tasks when their dependent tasks are finished.
  • These heuristics are problem-independent and incapable of utilizing the dependencies as defined by the DAGs or the resource consumption constraints when scheduling a job. These methods often fail to obtain optimal scheduling solutions.
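For concreteness, a minimal Shortest Job First pass over a dependency graph might look like the following sketch. The task names, the dictionary-based DAG representation, and the omission of resource constraints are simplifications for illustration, not details from the patent:

```python
def sjf_schedule(runtimes, deps):
    """Greedy Shortest Job First over a DAG given as {task: set(prerequisites)}.

    Returns tasks in execution order, always picking the ready task (all
    prerequisites finished) with the smallest runtime. Resource constraints
    are ignored in this sketch.
    """
    done, order = set(), []
    while len(order) < len(runtimes):
        ready = [t for t in runtimes if t not in done and deps[t] <= done]
        nxt = min(ready, key=lambda t: runtimes[t])  # shortest job among ready tasks
        done.add(nxt)
        order.append(nxt)
    return order

# Task "c" depends on "a"; SJF runs the shortest ready task first.
print(sjf_schedule({"a": 3, "b": 1, "c": 2}, {"a": set(), "b": set(), "c": {"a"}}))
```

Note how the heuristic only looks at runtimes of currently ready tasks; it never exploits the global DAG structure, which is the limitation the disclosed approach addresses.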
  • Various embodiments of the present specification may include systems, methods, and non-transitory computer-readable media for DAG-based task scheduling.
  • a method for DAG-based task scheduling comprising: obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks to be scheduled and processed by one or more processors, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; adding the one or more edges to the DAG to obtain an updated DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm for the one or more processors to process.
  • the scheduling the plurality of computing tasks based on the updated DAG comprises: applying the heuristic scheduling algorithm to the updated DAG to determine the scheduling of the plurality of computing tasks.
  • the obtaining a DAG representing a plurality of computing tasks comprises: receiving a plurality of jobs for parallel processing, each job comprising a plurality of tasks; generating a plurality of DAGs corresponding to the plurality of jobs, each DAG representing the plurality of tasks in a corresponding job; generating a new root node; and generating the DAG by connecting the plurality of DAGs to the new root node.
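The aggregation step above can be sketched as follows. The dictionary-based representation (node to set of parents), the `__root__` name, and the assumption that node names are unique across jobs are all illustrative choices, not details from the patent:

```python
def merge_dags(dags):
    """Combine several per-job DAGs (each {node: set(parents)}) into one DAG
    by attaching every node that has no parents to a fresh synthetic root,
    so that a single scheduling pass covers all jobs.

    Assumes node names are unique across the input DAGs.
    """
    merged = {"__root__": set()}
    for dag in dags:
        for node, parents in dag.items():
            # nodes with no parents now depend on the shared synthetic root
            merged[node] = set(parents) if parents else {"__root__"}
    return merged

job1 = {"t1": set(), "t2": {"t1"}}
job2 = {"u1": set()}
print(merge_dags([job1, job2]))
```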
  • the generating embeddings for the plurality of nodes comprises: obtaining feature vectors of the plurality of nodes, a feature vector of each node comprising at least one of the following: a runtime of a computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes.
  • the computing the embeddings of the plurality of nodes based on the GNN comprises: for each node of the plurality of nodes, inputting the feature vector of the each node through a first neural network to obtain a hidden representation of the each node; receiving other hidden representations propagated from neighboring nodes to the each node; and updating the hidden representation of the each node based on the received other hidden representations propagated from the neighboring nodes.
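The propagation idea can be sketched in plain Python. This is an illustrative, untrained aggregation only: a real GNN layer would apply learned weight matrices and a nonlinearity at each step, which are omitted here:

```python
def message_pass(features, neighbors, iters=2):
    """Sketch of message passing: each node's hidden vector is updated by
    adding the sum of its neighbors' vectors, repeated for `iters` rounds.

    `features` maps node -> feature vector (list of floats); `neighbors`
    maps node -> list of nodes that propagate into it.
    """
    hidden = {n: list(v) for n, v in features.items()}
    for _ in range(iters):
        new = {}
        for n, vec in hidden.items():
            agg = [0.0] * len(vec)
            for m in neighbors.get(n, []):  # sum incoming hidden vectors
                agg = [a + b for a, b in zip(agg, hidden[m])]
            new[n] = [x + a for x, a in zip(vec, agg)]
        hidden = new
    return hidden

feats = {"a": [1.0], "b": [2.0]}
print(message_pass(feats, {"a": ["b"], "b": []}, iters=1))
```

After one round, node "a" has absorbed information from its neighbor "b", while "b" (with no incoming neighbors) is unchanged.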
  • the generating embeddings for the plurality of nodes in the DAG comprises: generating the embeddings for the plurality of nodes in the DAG based on a graph neural network (GNN) , wherein the GNN and the policy network are jointly trained based on a plurality of training DAGs using reinforcement learning (RL) .
  • the policy network comprises a first policy network and a second policy network
  • the determining one or more edges comprises: identifying one or more starting nodes from the plurality of nodes in the DAG by inputting the embeddings of the plurality of nodes into the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG by inputting the embeddings of the one or more starting nodes into the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes.
  • the identifying the one or more starting nodes from the plurality of nodes in the DAG based on the first policy network comprises: inputting the embeddings of the plurality of nodes into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as a starting node.
  • identifying the one or more ending nodes from the plurality of nodes in the DAG based on the embeddings of the one or more starting nodes and the second policy network comprises: for each of the one or more starting nodes, inputting the embedding of the starting node and the embeddings of the plurality of nodes in the DAG into the second policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as an ending node that matches with the starting node.
  • the one or more edges are restrained from forming a closed loop in the DAG, and the first and the second policy networks each comprise a softmax layer configured to enforce the restraint.
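A constraint of this kind is commonly enforced by masking invalid choices before the softmax. The sketch below is an illustrative stand-in, not the patent's implementation: any node whose selection would violate the constraint gets probability exactly zero:

```python
import math

def masked_softmax(scores, valid):
    """Softmax over per-node scores where invalid choices (e.g. nodes whose
    selection would close a loop) are masked out, so their probability is
    exactly zero and the remaining probabilities sum to one."""
    exps = [math.exp(s) if ok else 0.0 for s, ok in zip(scores, valid)]
    total = sum(exps)
    return [e / total for e in exps]

# The middle node is invalid and is excluded from the distribution entirely.
print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```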
  • the method further comprises jointly training the GNN and the policy network, the training comprising: at each time step of the training, identifying a new edge to be added to the DAG based on a current policy network; determining a first performance metric by applying the heuristic scheduling algorithm to the DAG; determining a second performance metric by applying the heuristic scheduling algorithm to the DAG with the new edge; determining a reward based on the first performance metric and the second performance metric; and updating parameters of the GNN and the current policy network based on the loss function comprising a plurality of the rewards across a plurality of time steps.
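The per-step reward described above, i.e. the difference between the heuristic makespans with and without the new edge, can be sketched as follows. The `makespan_fn` callable and the toy makespan used in the example are illustrative stand-ins for running the chosen heuristic (SJF, CP, etc.) end to end:

```python
def edge_reward(makespan_fn, dag_edges, new_edge):
    """Reward for adding one edge: makespan of the heuristic schedule on the
    original DAG minus the makespan after adding the edge, so a shorter
    schedule yields a positive reward."""
    before = makespan_fn(dag_edges)
    after = makespan_fn(dag_edges | {new_edge})
    return before - after

# Toy stand-in makespan: pretend each added edge removes one unit of contention.
toy_makespan = lambda edges: 10 - len(edges)
print(edge_reward(toy_makespan, frozenset({("a", "b")}), ("b", "c")))
```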
  • the heuristic scheduling algorithm comprises one of the following: Shortest Job First (SJF), Critical Path (CP), and First-In-First-Out (FIFO).
  • a system for DAG-based task scheduling comprises one or more processors and one or more computer-readable memories coupled to the one or more processors and having instructions stored thereon that are executable by the one or more processors to perform the method of any of the preceding embodiments.
  • a non-transitory computer-readable storage medium is configured with instructions executable by one or more processors to cause the one or more processors to perform the method of any of the preceding embodiments.
  • an apparatus comprises a plurality of modules for performing the method of any of the preceding embodiments.
  • a system for DAG-based task scheduling may comprise a computer system comprising a processor and a non-transitory computer-readable storage medium storing instructions executable by the processor to cause the computer system to perform operations comprising obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; updating the DAG by adding the one or more edges to the DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm.
  • a non-transitory computer-readable storage medium may be configured with instructions executable by one or more processors to cause the one or more processors to perform operations comprising obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; updating the DAG by adding the one or more edges to the DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm.
  • Embodiments disclosed in the specification have one or more technical effects.
  • a trained reinforcement learning (RL) agent is employed to iteratively add directed edges to the DAG.
  • the trained RL agent may guarantee that these directed edges comply with scheduling constraints (e.g., priorities of execution and resource allocation) .
  • the original DAG scheduling problem is dramatically simplified to a proxy problem with an updated DAG, on which a traditional heuristic scheduling algorithm such as SJF or CP can be directly applied to obtain a more efficient scheduling solution with a shorter makespan.
  • the RL agent may include one or more policy networks trained to select the potential directed edges to add to the DAG.
  • the networks may be trained based on a plurality of training DAGs (e.g., training data comprising a plurality of DAGs) and one specific heuristic scheduling algorithm. That is, for different heuristic scheduling algorithms, corresponding RL agents may be trained. This way, the described embodiments can be easily integrated with any existing heuristic algorithm, such as SJF or CP, without extensive modification.
  • the nodes corresponding to the tasks in the DAG may be encoded using a Graph Neural Network (GNN) .
  • the encoding process takes into account the topological structure of the DAG and the scheduling constraints such as the run time and resource requirement of each node corresponding to a task.
  • the feature vectors of the nodes in the DAG may be updated to include information propagated from neighboring nodes, which allows the scheduling to be more accurate and optimized at a global scale.
  • FIG. 1 illustrates an example DAG-based scheduling problem, in accordance with various embodiments.
  • FIG. 2 illustrates an example environment for scheduling computational tasks, in accordance with various embodiments.
  • FIG. 3 illustrates an example workflow for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
  • FIG. 4 illustrates example policy networks trained for adding directed edges to a DAG, in accordance with various embodiments.
  • FIG. 5 illustrates a diagram for constructing a DAG in accordance with some embodiments.
  • FIG. 6 illustrates an example method for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
  • FIG. 7 illustrates an example computer system in which any of the embodiments described herein may be implemented.
  • the approaches disclosed herein include a deep reinforcement learning framework for directed acyclic graph (DAG) scheduling.
  • These approaches are motivated by the observation that unsatisfactory schedules are often generated by existing solutions due entirely to the wrong ordering of a few task nodes (e.g., the nodes in a DAG representing tasks) while the rest is close to optimal. If these “tricky” nodes are correctly ordered, the quality of schedules would be dramatically improved. Moreover, if the correct ordering of these nodes can be given a priori by an oracle, the scheduling tasks would become substantially simpler such that conventional heuristics are able to find near-optimal solutions.
  • One way to achieve this is to add directed edges to break the ties among tricky job nodes, i.e., to explicitly require that one job node needs to be given priority over another when it comes to execution as well as resource allocation.
  • FIG. 1 illustrates an example DAG-based scheduling problem, in accordance with various embodiments, and how adding directed edges can improve the quality of the schedules generated by conventional heuristic algorithms such as shortest job first (SJF) and critical path (CP) .
  • a plurality of tasks for scheduling are represented as nodes in a DAG.
  • Each task may be associated with various attributes.
  • the nodes in FIG. 1 each include two attributes: an amount of resources required for executing a corresponding task, and the run time (e.g., projected execution time) of the corresponding task.
  • the tasks may be associated with fewer, more, or different attributes.
  • the interdependencies among the task nodes are represented as the directed edges in the DAG.
  • both scheduling solutions determined by the heuristic scheduling algorithms SJF and CP yield a makespan of 21.
  • the SJF algorithm prioritizes the tasks with the shortest runtimes
  • CP algorithm ranks nodes by the maximum sum of task runtimes along the path to any of their leaf nodes in a DAG.
  • the “makespan” refers to the distance in time that elapses from the start of work to the end (i.e., from the point of starting the first task to the point of ending the last task) .
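Given a concrete schedule, the makespan can be computed directly. The interval-based schedule format below is an illustrative assumption:

```python
def makespan(schedule):
    """Makespan of a schedule given as {task: (start_time, end_time)}:
    the time that elapses from the earliest start to the latest finish."""
    starts = [s for s, _ in schedule.values()]
    ends = [e for _, e in schedule.values()]
    return max(ends) - min(starts)

# Three tasks; the last one finishes at time 21, so the makespan is 21.
print(makespan({"t1": (0, 5), "t2": (2, 9), "t3": (9, 21)}))
```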
  • the new DAGs may allow SJF and CP to find better scheduling solutions with makespans of 16.
  • the added directed edges simplify DAG scheduling by including additional constraints and convert the original problem to a simpler proxy.
  • in some embodiments, a reinforcement learning (RL) agent (e.g., a software program) may be configured to determine the addition of the directed edges to the DAG based on one or more trained policy networks.
  • the RL agent may convert the original DAG to an updated DAG that allows conventional heuristic scheduling algorithms to generate optimal scheduling solutions.
  • FIG. 2 illustrates an example environment for scheduling computational tasks, in accordance with various embodiments.
  • the environment may comprise a computing system 102 for scheduling computational tasks 107.
  • the computational tasks 107 may include various tasks that can be executed by computing devices such as 105a and 105b. These tasks 107 may be associated with numerous execution requirements and constraints, such as interdependencies (e.g., one or more tasks may be prerequisites of one or more other tasks), resource consumptions (e.g., how much computational resource is required for executing one task), runtime (e.g., execution duration of one task), other suitable attributes, or any combination thereof.
  • Common examples may include tasks submitted from a plurality of clients to a server in which the tasks need to be scheduled to execute (either on the server or be distributed to other computing devices) .
  • the computing devices 105a and 105b may include various devices such as computers, smart devices, and edge devices. Even though the devices 105a and 105b are illustrated as separate from the computing system 102, they may refer to components within the computing system 102.
  • the computing system 102 may include a staging area for caching the received tasks 107.
  • the computing system 102 may determine optimal scheduling solutions for the cached tasks 107. The determination is subject to the scheduling constraints such as the interdependencies among the tasks 107 (e.g., certain tasks cannot be executed until their prerequisite tasks are executed) and resource consumptions of the tasks 107. For instance, the total computational resource consumption of the tasks scheduled to be executed concurrently cannot exceed the available computational resources.
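The capacity constraint described above can be checked per scheduling decision. The tuple-per-resource demand format in this sketch is an illustrative assumption (e.g., one slot per resource dimension such as CPU cores and memory units):

```python
def fits(running, capacity, task_demand):
    """Check whether a task can start now: the summed demands of tasks
    already running plus the new task's demand must stay within capacity,
    checked per resource dimension."""
    used = [sum(d) for d in zip(*running)] if running else [0] * len(capacity)
    return all(u + t <= c for u, t, c in zip(used, task_demand, capacity))

# Two resource dimensions; one task is already running and uses (2, 4).
print(fits([(2, 4)], capacity=(4, 8), task_demand=(2, 4)))  # fits exactly
print(fits([(2, 4)], capacity=(4, 8), task_demand=(3, 1)))  # exceeds dimension 0
```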
  • the components of the computing system 102 in FIG. 2 are illustrative. Depending on the implementation, the computing system 102 may include additional, fewer, or alternative components.
  • the computing system 102 may be implemented in one or more networks (e.g., enterprise networks) , one or more endpoints, one or more servers, or one or more clouds.
  • the computing system 102 may include hardware or software which manages access to a centralized resource or service in a network.
  • a cloud may include a cluster of servers and other devices distributed across a network.
  • the computing system 102 may include a DAG generating component 112, an embedding component 114, a determining component 116, and a scheduling component 118.
  • the computing system 102 may include one or more processors (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller or microprocessor, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information) and one or more memories (e.g., permanent memory, temporary memory, non-transitory computer-readable storage medium) .
  • the one or more memories may be configured with instructions executable by the one or more processors.
  • the processor (s) may be configured to perform various operations by interpreting machine-readable instructions stored in the memory.
  • the DAG generating component 112 of the computing system 102 may be configured to obtain a directed acyclic graph (DAG) representing a plurality of received computing tasks for scheduling.
  • the DAG may include a plurality of nodes corresponding to the plurality of computing tasks and a plurality of directed edges representing the interdependencies among the plurality of computing tasks.
  • each node in the DAG may be associated with a feature vector containing various attributes of the corresponding computational task such as resource consumptions, mandatory starting time and/or ending time, prerequisite (parent) tasks, other suitable attributes, or any combination thereof.
  • the DAG may be constructed by another entity and sent to the computing system 102, or constructed by the DAG generating component 112 of the computing system 102 based on the tasks.
  • the embedding component 114 of the computing system 102 may be configured to generate embeddings for the plurality of nodes in the DAG.
  • the nodes in the DAG may be associated with feature vectors.
  • the embedding component 114 may update the feature vectors of the nodes using a graph neural network (GNN) .
  • the feature vector of each node may embed information from the feature vectors of the node’s neighboring nodes.
  • the neighboring nodes may refer not only to the immediately neighboring nodes but also to remote nodes multiple hops away.
  • the embedding process may include: obtaining feature vectors of the plurality of nodes, the feature vector of each node comprising at least one of the following: a runtime of the computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes.
  • the feature vector of the each node may go through a first neural network to obtain a hidden representation of the each node.
  • the embedding of a node refers to an encoded form of a feature vector of the node, and the encoding/embedding is based on the feature vectors of neighboring nodes of the node.
  • the node may also receive other hidden representations propagated from its neighboring nodes. The hidden representation of this node may then be updated based on the received hidden representations propagated from its neighboring nodes.
  • the determining component 116 of the computing system 102 may be configured to determine one or more edges to be added to the DAG and update the DAG by adding the one or more edges to the DAG.
  • the determining component 116 may include a reinforcement learning agent that follows a policy network.
  • the policy network may be trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm.
  • the policy network may include at least two policy networks. The first policy network is trained to identify starting nodes and the second policy network is trained to identify ending nodes based on the starting nodes identified by the first policy network.
  • the one or more edges may be determined by: identifying one or more starting nodes from the plurality of nodes in the DAG based on the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG based on embeddings of the one or more starting nodes and the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes.
  • the first and second policy networks share a same network structure but have different input parameter configurations.
  • the one or more edges are restrained from forming cycles in the DAG, and the first policy network and the second policy network each comprise a softmax layer configured to enforce the restraint. A detailed description of the policy networks is provided with reference to FIG. 4.
  • the scheduling component 118 may be configured to determine the scheduling of the plurality of tasks 107 based on the updated DAG with the added edges and the heuristic scheduling algorithm. The determination may include applying the heuristic scheduling algorithm to the updated DAG to determine a schedule.
  • the heuristic scheduling algorithm may be the same algorithm that is used during the training of the policy networks in the determining component 116. That is, the policy networks are customized for the heuristic scheduling algorithm, and different heuristic scheduling algorithms may be used to train different policy networks.
  • FIG. 3 illustrates an example workflow 300 for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
  • the example workflow 300 may include two phases, a training phase 302 and an application phase 303.
  • the training phase 302 and the application phase 303 may be carried out by the same or different entities.
  • the training phase 302 may be performed offline.
  • both the training phase 302 and the application phase 303 may be performed online. That is, the data collected from the application phase 303, such as input DAGs, actions taken (e.g., edges added), and rewards (e.g., makespan reduction), may be used as part of the training phase 302.
  • the training phase 302 may include a plurality of steps illustrated in FIG. 3. Depending on the implementation, the training phase 302 may include fewer, more, or alternative steps, and the steps may be performed in a different order or in parallel.
  • the training phase 302 may include a reinforcement learning (RL) agent 380 exploring and learning two networks: a graph neural network (GNN) 390A and a policy network 390B.
  • the GNN 390A may be trained to encode the nodes in a DAG so that the feature vectors of the nodes are embedded with configuration information propagated from neighboring nodes and the topological structure of the DAG.
  • the policy network 390B may be trained to make recommended actions to the RL agent 380 in response to a state.
  • the action refers to adding a directed edge between a starting node and an ending node to the DAG that does not conflict with the scheduling restrictions of the DAG. For example, a conflict occurs when there is an existing directed path between the starting node and the ending node.
  • the selection of the starting node is conditioned on the encoded feature vectors (also called embeddings) of the nodes generated by the GNN 390A
  • the selection of the ending node is conditioned on the encoded feature vectors of the nodes generated by the GNN 390A and the selected starting node.
  • the training of the GNN 390 and the policy network 390B may be based on a plurality of historical DAGs or synthetic DAGs (also referred to as training DAGs) 360 and a specific scheduling algorithm 370.
  • the scheduling algorithm 370 may be a heuristic scheduling algorithm such as SJF, CP, etc., or another suitable algorithm.
  • the scheduling algorithm 370 may be applied to a DAG to determine a scheduling solution (e.g., an execution plan) for the tasks represented by the DAG.
  • the training process 302 may include a reinforcement learning process formulated based on the following configuration.
  • State: a DAG graph G = {n, N_0, E_0}, where n is the number of nodes in the DAG, N_0 is the set of n nodes, and E_0 is the set of directed edges in the DAG.
  • Action: adding a directed edge a to G by sequentially selecting a starting node a_1 and an ending node a_2 in G. The added edge may not conflict with the graph structure of G; a conflict occurs when there is an existing directed edge between a_1 and a_2, or when the addition of the directed edge would form a closed loop within G. The selection of the starting node a_1 is conditioned on G, and the selection of the ending node a_2 is conditioned on G and a_1. Because the two endpoints are selected sequentially rather than jointly, the time complexity of adding an edge in G is reduced from O(n^2) to O(n).
  • Reward: denoting the DAG after an edge is added as G′, the reward is defined as the difference between the makespans of G′ and G under the scheduling algorithm 370. Depending on the implementation, the reward can be easily switched to other measurements such as average waiting time.
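The conflict rule for candidate edges (no duplicate edge, no closed loop) can be checked with a simple reachability test. The adjacency-list format below is an illustrative assumption, and a depth-first search is used here for clarity rather than efficiency:

```python
def edge_is_valid(children, start, end):
    """A candidate edge start->end conflicts if the edge already exists, or
    if `start` is reachable from `end` (adding the edge would then close a
    loop). A DFS reachability check is sufficient for this sketch."""
    if end in children.get(start, ()):  # edge already present
        return False
    stack, seen = [end], set()
    while stack:  # can we reach `start` from `end`?
        node = stack.pop()
        if node == start:
            return False  # adding start->end would create a cycle
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return True

g = {"a": ["b"], "b": ["c"], "c": []}
print(edge_is_valid(g, "a", "c"))  # ok: adds only an ordering constraint
print(edge_is_valid(g, "c", "a"))  # rejected: would form a closed loop
```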
  • the GNN 390A may be structured as an L-layer neural network with a set of parameters for each layer l, where 1 ≤ l ≤ L, and L is an integer greater than 1.
  • the GNN 390A may be used similarly to compute graph embeddings for the nodes in a DAG.
  • the original feature vectors of each node in G may include attributes such as runtime and required computational resources (e.g., CPU time, storage space, number of cores) .
  • these feature vectors may be transformed into a hidden representation e^0 that is then embedded with information propagated from neighboring nodes through a plurality of iterations of message passing.
  • An example process using the GNN 390A to generate embeddings for the nodes in G may include the following steps. Step 1, the edges in G may be reversed to form a new graph G^T. Step 2, at each iteration of message passing in G^T, each node's embedding is propagated to its neighbors and updated as follows:

  e_i^(h+1) = σ(W_1 · e_i^h + Σ_{j ∈ N_i} W_2 · e_j^h), for 0 ≤ h < H

  where σ() stands for the activation function, H is the number of message-passing iterations, h is the current iteration, N_i represents the set of node i's neighbors in G^T (e.g., the children nodes in G), and W_1 and W_2 are learned weight matrices.
  • the output of the GNN 390A may include a deep embedding (also referred to as an updated feature vector) for each node.
  • the entire DAG may also be represented as a global embedding determined based on the embeddings of all the nodes within the DAG. The global embedding of the DAG may be used as one of the inputs to the policy network 390B.
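The patent does not fix the pooling operator for the global embedding; element-wise mean pooling, shown below, is one common choice and is an assumption of this sketch:

```python
def global_embedding(node_embeddings):
    """Summarize a whole DAG as one vector by averaging the node embeddings
    element-wise (mean pooling). Other choices, e.g. sum or max pooling,
    would work the same way structurally."""
    vecs = list(node_embeddings.values())
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

print(global_embedding({"a": [1.0, 2.0], "b": [3.0, 4.0]}))
```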
  • the GNN 380 and the policy network 390B may be jointly trained based on a policy gradient algorithm.
  • the training process may include a plurality of rollouts, denoted as N rollouts. In each rollout, a plurality of edges, denoted as M edges, are added across a plurality of time steps. At the j-th time step of the i-th rollout, a triplet is recorded for training purposes:
  • G_{i,j} is the updated DAG with the added edges; the starting node and the ending node of an added edge a_{i,j} are recorded, respectively; and r_{i,j} represents the reward, computed as the reduction of the makespan (or average waiting time, or another suitable metric) of G_{i,j} after the edge a_{i,j} is added.
  • the cumulative reward at the j th time step may be a discounted sum of incremental rewards from the j th time step until completion of a rollout:
  • γ is the discounting factor, which may be set to 1.0 as a default value.
  • a baseline reward at each time step j may be computed by the following equation, which is subtracted from rewards to reduce variance:
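The discounted return and the per-step baseline just described can be sketched as follows (plain Python floats, for illustration only):

```python
def returns(rewards, gamma=1.0):
    """Cumulative discounted reward at each step j of one rollout:
    sum of incremental rewards from step j until the rollout ends."""
    out = [0.0] * len(rewards)
    acc = 0.0
    for j in range(len(rewards) - 1, -1, -1):
        acc = rewards[j] + gamma * acc   # backward accumulation
        out[j] = acc
    return out

def baselines(all_rollout_rewards, gamma=1.0):
    """Baseline at step j: the mean over the N rollouts of the step-j
    return; subtracting it from each return reduces gradient variance."""
    rets = [returns(r, gamma) for r in all_rollout_rewards]
    m = len(rets[0])
    return [sum(r[j] for r in rets) / len(rets) for j in range(m)]
```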
  • a joint loss function may be defined as:
  • represents the parameters of the GNN 390A
  • represents the parameters of the policy network 390B.
  • the policy network 390B may include two policy networks: a first one is trained for determining starting nodes and a second one is trained for determining ending nodes. Denoting the parameters of the first and second policy networks as θ_1 and θ_2 respectively, the loss function may be represented as:
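The loss equation itself is not reproduced above; a standard REINFORCE-with-baseline form consistent with the surrounding description (the advantage, i.e., return minus baseline, weighting the log-probabilities of the chosen starting and ending nodes) would look like the following hedged sketch:

```python
import math

def joint_loss(log_p1, log_p2, rets, base):
    """Hypothetical reconstruction of the joint policy-gradient loss.
    For each recorded step: advantage (return - baseline) times the
    log-probability of the chosen starting node (first policy) plus
    that of the chosen ending node (second policy), negated so that
    minimizing the loss ascends the expected reward."""
    loss = 0.0
    for lp1, lp2, R, b in zip(log_p1, log_p2, rets, base):
        loss -= (R - b) * (lp1 + lp2)
    return loss / len(rets)
```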
  • the GNN 390A and the policy network 390B may be used in the application phase 303 to determine edges to be added to an incoming DAG representing a plurality of to-be-scheduled tasks.
  • the application phase 303 may start with obtaining a DAG at step 310 to represent the to-be-scheduled tasks.
  • the DAG may be constructed based on the tasks, or received from another entity or device that performed the construction.
  • a plurality of DAGs representing a plurality of multi-task jobs may be received, and the tasks within the jobs may be scheduled altogether.
  • these received DAGs may be first aggregated into one DAG with a new root node.
  • FIG. 5 illustrates a diagram for constructing a DAG in accordance with some embodiments. As shown in FIG. 5, three jobs 510, 520, and 530 may each include a plurality of interdependent tasks that are represented by three DAGs. By adding a new root node and attaching the three DAGs to it, a new DAG may be formed as the output of step 310.
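The aggregation of FIG. 5 can be sketched as follows; each job DAG is assumed (purely for illustration) to be a pair of a node-to-runtime map and an edge list, with node ids unique across jobs:

```python
def aggregate_dags(dags):
    """Merge several job DAGs under one new root node, as in FIG. 5.
    The root is a zero-runtime placeholder; every node of a job DAG
    that has no incoming edge is attached to the root."""
    nodes, edges = {"root": 0.0}, []
    for dag_nodes, dag_edges in dags:
        nodes.update(dag_nodes)
        edges.extend(dag_edges)
        has_pred = {v for _, v in dag_edges}
        edges.extend(("root", u) for u in dag_nodes if u not in has_pred)
    return nodes, edges
```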
  • the DAG output from step 310 may be fed into the GNN 390A to generate embeddings at step 320.
  • the RL agent 380 may recommend directed edges based on the trained policy network 390B at step 330.
  • the trained policy network 390B may include two policy networks for determining the directed edges at step 330, which may include identifying one or more starting nodes from the plurality of nodes in the DAG based on the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG based on embeddings of the one or more starting nodes and the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes.
  • the embeddings of the plurality of nodes may be fed into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes. Each probability represents a recommended chance to select a corresponding node as a starting node.
  • one or more starting nodes may be determined.
  • combinations of the embedding of the starting node and the embeddings of other nodes may be fed into the second policy network to obtain a plurality of probabilities respectively corresponding to the other nodes.
  • the “other nodes” may refer to the plurality of nodes except for the starting node and the nodes that are restrained from having a directed edge from the starting node.
  • each input to the second policy network may include the embedding of the starting node and the embedding of another node, and yield as output a probability that the other node is recommended as the ending node corresponding to the starting node.
  • one ending node may be selected based on the recommended probabilities generated by the second policy network.
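The two-stage selection described above can be sketched as follows. Here `pair_score` is a hypothetical stand-in for the second policy network, and `invalid` masks node pairs that already have an edge or would close a loop; both names are illustrative, not from the specification:

```python
import math

def pick_edge(node_probs, pair_score, invalid):
    """Two-stage edge selection: the first policy's output node_probs
    picks a starting node a1; pair scores for (a1, other) pairs are
    masked for invalid pairs, softmaxed, and the most probable ending
    node a2 is picked."""
    a1 = max(range(len(node_probs)), key=lambda i: node_probs[i])
    scores = [pair_score(a1, j) if j != a1 and not invalid(a1, j)
              else float("-inf")
              for j in range(len(node_probs))]
    exps = [math.exp(s) if s != float("-inf") else 0.0 for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]     # masked softmax over ending nodes
    a2 = max(range(len(probs)), key=lambda i: probs[i])
    return a1, a2, probs
```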
  • the DAG may be updated by adding the directed edges connecting the starting nodes and corresponding ending nodes at step 340.
  • the scheduling algorithm 370 may then be applied to the updated DAG to determine the scheduling solution for the tasks at step 350.
  • the tasks may be executed according to the determined scheduling solution to reach an optimal makespan or average waiting time or another suitable performance measurement.
  • FIG. 4 illustrates example policy networks trained for adding directed edges to a DAG, in accordance with various embodiments.
  • the components or layers in FIG. 4 are for illustrative purposes only. Depending on the implementation, the policy networks in FIG. 4 may include fewer, more, or alternative components or layers.
  • the policy network 410 and the policy network 420 may be collectively referred to as the policy network 390B in FIG. 3 and jointly trained.
  • the policy network 410 may be referred to as a first policy network trained to determine starting nodes in a DAG
  • the policy network 420 may be referred to as a second policy network trained to determine ending nodes in the DAG. The starting nodes and the ending nodes then are used to determine the directed edges to be added to the DAG to improve the performance of the task scheduling algorithm.
  • the inputs to the policy networks 410 and 420 may be different.
  • the inputs to the policy network 410 may include a plurality of pairs, each pair including a global embedding of the DAG and one embedding (e.g., feature vector) of a node in the DAG.
  • the global embedding of the DAG may be the average embedding of all the nodes, and the embedding of a node may be the feature vector of the node after going through the GNN 390A in FIG. 3.
  • the inputs to the policy network 420 may include more information related to the already determined starting nodes. As shown in FIG. 4, the inputs to the policy network 420 may include a plurality of triplets.
  • Each triplet includes the overall embedding of the DAG, the embedding of one starting node, and the embedding of a different node.
  • the different node may refer to a node in the DAG that does not have an existing edge with the starting node and is not restricted from having an edge with the starting node.
  • both policy networks 410 and 420 may include a plurality of neural network layers to extract features from the inputs.
  • a softmax layer may be implemented to enforce one or more edge-addition restrictions, such as that no closed loop is allowed, and that the newly added edge cannot conflict with existing edges (e.g., the dependency indicated by the new edge cannot contradict existing task dependencies).
  • the outputs of the policy network 410 may include a plurality of probabilities respectively corresponding to the plurality of nodes in the DAG. Each probability generated by the policy network 410 may represent a recommended chance of selecting the corresponding node as a starting node for the DAG.
  • the outputs of the policy network 420 may also include a plurality of probabilities. Each probability generated by the policy network 420 may represent a conditional probability of selecting the corresponding node as an ending node for a starting node and the DAG.
  • FIG. 6 illustrates an example method 600 for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
  • the method 600 may be implemented by the computing system 102 shown in FIG. 2, and correspond to the flow shown in FIG. 3. Depending on the implementation, the method 600 may have additional, fewer, or alternative steps.
  • Block 610 includes obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks.
  • the obtaining a DAG representing a plurality of computing tasks comprises: receiving a plurality of jobs for parallel processing, each job comprising a plurality of tasks; generating a plurality of DAGs corresponding to the plurality of jobs, each DAG representing the plurality of tasks in a corresponding job; generating a new root node; and generating the DAG by connecting the plurality of DAGs to the new root node.
  • Block 620 includes generating embeddings for the plurality of nodes in the DAG.
  • the generating embeddings for the plurality of nodes comprises: obtaining feature vectors of the plurality of nodes, a feature vector of each node comprising at least one of the following: a runtime of a computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes.
  • GNN graph neural network
  • the computing the embeddings of the plurality of nodes based on the GNN comprises: for each node of the plurality of nodes, inputting the feature vector of the each node through a first neural network to obtain a hidden representation of the each node; and receiving other hidden representations propagated from neighboring nodes to the each node; and updating the hidden representation of the each node based on the received other hidden representations propagated from the neighboring nodes.
  • the generating embeddings for the plurality of nodes in the DAG comprises: generating the embeddings for the plurality of nodes in the DAG based on a graph neural network (GNN) , wherein the GNN and the policy network are jointly trained based on a plurality of training DAGs using reinforcement learning (RL) .
  • GNN graph neural network
  • RL reinforcement learning
  • Block 630 includes determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm.
  • the policy network comprises a first policy network and a second policy network
  • the determining one or more edges comprises: identifying one or more starting nodes from the plurality of nodes in the DAG by inputting the embeddings of the plurality of nodes into the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG by inputting the embeddings of the one or more starting nodes into the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes.
  • the identifying the one or more starting nodes from the plurality of nodes in the DAG based on the first policy network comprises: inputting the embeddings of the plurality of nodes into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as a starting node.
  • the identifying the one or more ending nodes from the plurality of nodes in the DAG based on the embeddings of the one or more starting nodes and the second policy network comprises: for each of the one or more starting nodes, inputting the embedding of the starting node and the embeddings of the plurality of nodes in the DAG into the second policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as an ending node that matches the starting node.
  • Block 640 includes updating the DAG by adding the one or more edges to the DAG.
  • the one or more edges are restrained from forming a closed loop in the DAG, and the first and the second policy networks each comprise a softmax layer configured to enforce the restraint.
  • Block 650 includes scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm.
  • the scheduling the plurality of computing tasks based on the updated DAG comprises: applying the heuristic scheduling algorithm to the updated DAG to determine the scheduling of the plurality of computing tasks.
  • the heuristic scheduling algorithm comprises one of the following: shortest job first (SJF), critical path (CP), and first-in-first-out (FIFO).
  • the method 600 further comprises jointly training the GNN and the policy network, the training comprising: at each time step of the training, identifying a new edge to be added to the DAG based on a current policy network; determining a first performance metric by applying the heuristic scheduling algorithm to the DAG; determining a second performance metric by applying the heuristic scheduling algorithm to the DAG with the new edge; determining a reward based on the first performance metric and the second performance metric; and updating parameters of the GNN and the current policy network based on the loss function comprising a plurality of the rewards across a plurality of time steps.
  • FIG. 7 illustrates an example computer system in which any of the embodiments described herein may be implemented.
  • the electronic device may be used to implement one or more components of the systems and the methods shown in FIGs. 1-6.
  • the electronic device 700 may comprise a bus 702 or other communication mechanism for communicating information and one or more hardware processors 704 coupled with bus 702 for processing information.
  • Hardware processor (s) 704 may be, for example, one or more general purpose microprocessors.
  • the electronic device 700 may also include a main memory 706, such as a random-access memory (RAM) , cache and/or other dynamic storage devices, coupled to bus 702 for storing information and instructions to be executed by processor (s) 704.
  • Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor (s) 704.
  • Such instructions when stored in storage media accessible to processor (s) 704, may render electronic device 700 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • Main memory 706 may include non-volatile media and/or volatile media.
  • Non-volatile media may include, for example, optical or magnetic disks.
  • Volatile media may include dynamic memory.
  • Common forms of media may include, for example, a RAM, a DRAM, a PROM, an EPROM, a FLASH-EPROM, an NVRAM, any other memory chip or cartridge, or networked versions of the same.
  • the electronic device 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the electronic device may cause or program electronic device 700 to be a special-purpose machine.
  • the techniques herein are performed by electronic device 700 in response to processor (s) 704 executing one or more sequences of one or more instructions contained in main memory 706. Such instructions may be read into main memory 706 from another storage medium, such as storage device 707. Execution of the sequences of instructions contained in main memory 706 may cause processor (s) 704 to perform the process steps described herein.
  • the processes/methods disclosed herein may be implemented by computer program instructions stored in main memory 706. When these instructions are executed by processor (s) 704, they may perform the steps as shown in corresponding figures and described above.
  • hard-wired circuitry may be used in place of or in combination with software instructions.
  • the electronic device 700 also includes a communication interface 710 coupled to bus 702.
  • Communication interface 710 may provide a two-way data communication coupling to one or more network links that are connected to one or more networks.
  • communication interface 710 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicate with a WAN) .
  • LAN local area network
  • Wireless links may also be implemented.
  • processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm) . In other example embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.
  • the software product may be stored in a storage medium, comprising a number of instructions to cause a computing device (which may be a personal computer, a server, a network device, and the like) to execute all or some steps of the methods of the embodiments of the present application.
  • the storage medium may comprise a flash drive, a portable hard drive, ROM, RAM, a magnetic disk, an optical disc, another medium operable to store program code, or any combination thereof.
  • Particular embodiments further provide a system comprising a processor and a non-transitory computer-readable storage medium storing instructions executable by the processor to cause the system to perform operations corresponding to steps in any method of the embodiments disclosed above.
  • Particular embodiments further provide a non-transitory computer-readable storage medium configured with instructions executable by one or more processors to cause the one or more processors to perform operations corresponding to steps in any method of the embodiments disclosed above.
  • Embodiments disclosed herein may be implemented through a cloud platform, a server or a server group (hereinafter collectively the “service system” ) that interacts with a client.
  • the client may be a terminal device, or a client registered by a user at a platform, wherein the terminal device may be a mobile terminal, a personal computer (PC) , and any device that may be installed with a platform application program.
  • PC personal computer


Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for DAG-based task scheduling. The method may include: obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks to be scheduled and processed by one or more processors. The DAG includes a plurality of nodes representing the plurality of computing tasks. The method further includes generating embeddings for the plurality of nodes in the DAG, and determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network. The policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm. The method further includes adding the one or more edges to the DAG to obtain an updated DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm for the one or more processors to process.

Description

METHOD AND SYSTEM FOR SCHEDULING TASKS TECHNICAL FIELD
The disclosure generally relates to systems and methods for scheduling tasks, specifically, directed acyclic graphs (DAGs) -based task scheduling using reinforcement learning (RL) .
BACKGROUND
Scheduling computational tasks, commonly represented by directed acyclic graphs (DAGs) , is a critical problem in many areas of computer science ranging from programming languages (e.g., compilation) , operating systems (e.g., parallel processing) , data engineering (e.g., distributed batch/streaming computation topology) , to machine learning (e.g., training graphs) . The overall goal of the DAG-based scheduling problem is to find an optimal scheduling solution (execution order) so that the tasks can be executed with a minimal makespan or a shortest average waiting time. These tasks are usually associated with numerous restrictions to which the scheduling solution must comply. For instance, some tasks may depend on other tasks, while some tasks may have resource constraints. Therefore, the DAG-based scheduling problem combines two well-known NP-hard problems: the minimum makespan problem and the bin packing problem. The former handles interdependence among tasks, and the latter handles one-dimensional or multi-dimensional resource constraints.
Existing solutions generally include solving an integer programming problem with branch-and-bound that is usually intractable in practice. Other heuristic approaches, such as Shortest Job First (SJF) , Highest Level First, Longest Job Time, Critical Path (CP) , and Random Priority, assign priorities to tasks and then execute the tasks when their dependent tasks are finished. These heuristics are problem-independent and incapable of utilizing the dependencies as defined by the DAGs or the resource consumption constraints when scheduling a job. These methods often fail to obtain optimal scheduling solutions.
SUMMARY
Various embodiments of the present specification may include systems, methods, and non-transitory computer-readable media for DAG-based task scheduling.
According to one aspect, a method for DAG-based task scheduling comprises: obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks to be scheduled and processed by one or more processors, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; adding the one or more edges to the DAG to obtain an updated DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm for the one or more processors to process.
In some embodiments, the scheduling the plurality of computing tasks based on the updated DAG comprises: applying the heuristic scheduling algorithm to the updated DAG to determine the scheduling of the plurality of computing tasks.
In some embodiments, the obtaining a DAG representing a plurality of computing tasks comprises: receiving a plurality of jobs for parallel processing, each job comprising a plurality of tasks; generating a plurality of DAGs corresponding to the plurality of jobs, each DAG representing the plurality of tasks in a corresponding job; generating a new root node; and generating the DAG by connecting the plurality of DAGs to the new root node.
In some embodiments, the generating embeddings for the plurality of nodes comprises: obtaining feature vectors of the plurality of nodes, a feature vector of each node comprising at least one of the following: a runtime of a computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes.
In some embodiments, the computing the embeddings of the plurality of nodes based on the GNN comprises: for each node of the plurality of nodes, inputting the feature vector of the each node through a first neural network to obtain a hidden representation of the each node; and receiving other hidden representations propagated from neighboring nodes to the each node; and updating the hidden representation of the each node based on the received other hidden representations propagated from the neighboring nodes.
In some embodiments, the generating embeddings for the plurality of nodes in the DAG comprises: generating the embeddings for the plurality of nodes in the DAG based on a graph neural network (GNN) , wherein the GNN and the policy network are jointly trained based on a plurality of training DAGs using reinforcement learning (RL) .
In some embodiments, the policy network comprises a first policy network and a second policy network, and the determining one or more edges comprises: identifying one or more starting nodes from the plurality of nodes in the DAG by inputting the embeddings of the plurality of nodes into the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG by inputting the embeddings of the one or more starting nodes into the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes.
In some embodiments, the identifying the one or more starting nodes from the plurality of nodes in the DAG based on the first policy network comprises: inputting the embeddings of the plurality of nodes into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as a starting node.
In some embodiments, identifying the one or more ending nodes from the plurality of nodes in the DAG based on the embeddings of the one or more starting nodes and the second policy network comprises: for each of the one or more starting nodes, inputting the embedding of the starting node and the embeddings of the plurality of nodes in the DAG into the second policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as an ending node that matches the starting node.
In some embodiments, the one or more edges are restrained from forming a closed loop in the DAG, and the first and the second policy networks each comprise a softmax layer configured to enforce the restraint.
In some embodiments, the method further comprises jointly training the GNN and the policy network, the training comprising: at each time step of the training, identifying a new edge to be added to the DAG based on a current policy network; determining a first  performance metric by applying the heuristic scheduling algorithm to the DAG; determining a second performance metric by applying the heuristic scheduling algorithm to the DAG with the new edge; determining a reward based on the first performance metric and the second performance metric; and updating parameters of the GNN and the current policy network based on the loss function comprising a plurality of the rewards across a plurality of time steps.
In some embodiments, the heuristic scheduling algorithm comprises one of the following: Shortest Job First (SJF) , Critical Path (CP) , and First-In-First-Out (FIFO) .
According to other embodiments, a system for DAG-based task scheduling comprises one or more processors and one or more computer-readable memories coupled to the one or more processors and having instructions stored thereon that are executable by the one or more processors to perform the method of any of the preceding embodiments.
According to yet other embodiments, a non-transitory computer-readable storage medium is configured with instructions executable by one or more processors to cause the one or more processors to perform the method of any of the preceding embodiments.
According to still other embodiments, an apparatus comprises a plurality of modules for performing the method of any of the preceding embodiments.
According to another aspect, a system for DAG-based task scheduling may comprise a computer system comprising a processor and a non-transitory computer-readable storage medium storing instructions executable by the processor to cause the computer system to perform operations comprising obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; updating the DAG by adding the one or more edges to the DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm.
According to yet another aspect, a non-transitory computer-readable storage medium may be configured with instructions executable by one or more processors to cause the one or more processors to perform operations comprising obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks; generating embeddings for the plurality of nodes in the DAG; determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm; updating the DAG by adding the one or more edges to the DAG; and scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm.
Embodiments disclosed in the specification have one or more technical effects. In some embodiments, for a DAG representing a plurality of tasks to be scheduled, a trained reinforcement learning (RL) agent is employed to iteratively add directed edges to the DAG. The trained RL agent may guarantee that these directed edges comply with scheduling constraints (e.g., priorities of execution and resource allocation). By doing so, the original DAG scheduling problem is dramatically simplified to a proxy problem with an updated DAG, on which a traditional heuristic scheduling algorithm such as SJF or CP can be directly applied to obtain a more efficient scheduling solution with a shorter makespan. In some embodiments, the RL agent may include one or more policy networks trained to select the potential directed edges to add to the DAG. The networks may be trained based on a plurality of training DAGs (e.g., training data comprising a plurality of DAGs) and one specific heuristic scheduling algorithm. That is, for different heuristic scheduling algorithms, corresponding RL agents may be trained. This way, the described embodiments can be easily integrated with any existing algorithm, such as SJF or CP, without extensive modification. In some embodiments, in order to accurately represent the tasks to be scheduled in the DAG, the nodes corresponding to the tasks in the DAG may be encoded using a Graph Neural Network (GNN). The encoding process takes into account the topological structure of the DAG and the scheduling constraints, such as the runtime and resource requirements of each node corresponding to a task. With the GNN, the feature vectors of the nodes in the DAG may be updated to include information propagated from neighboring nodes, which allows the scheduling to be more accurate and optimized at a global scale.
These and other features of the systems, methods, and non-transitory computer-readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an example DAG-based scheduling problem, in accordance with various embodiments.
FIG. 2 illustrates an example environment for scheduling computational tasks, in accordance with various embodiments.
FIG. 3 illustrates an example workflow for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
FIG. 4 illustrates example policy networks trained for adding directed edges to a DAG, in accordance with various embodiments.
FIG. 5 illustrates a diagram for constructing a DAG in accordance with some embodiments.
FIG. 6 illustrates an example method for DAG-based scheduling with reinforcement learning, in accordance with various embodiments.
FIG. 7 illustrates an example computer system in which any of the embodiments described herein may be implemented.
DETAILED DESCRIPTION
The approaches disclosed herein include a deep reinforcement learning framework for directed acyclic graph (DAG) scheduling. These approaches are motivated by the observation that unsatisfactory schedules are often generated by existing solutions due entirely to the wrong ordering of a few task nodes (e.g., the nodes in a DAG representing tasks), while the rest of the schedule is close to optimal. If these “tricky” nodes are correctly ordered, the quality of the schedules would be dramatically improved. Moreover, if the correct ordering of these nodes could be given a priori by an oracle, the scheduling tasks would become substantially simpler, such that conventional heuristics would be able to find near-optimal solutions. One way to achieve this is to add directed edges to break the ties among tricky job nodes, i.e., to explicitly require that one job node be given priority over another when it comes to execution as well as resource allocation.
FIG. 1 illustrates an example DAG-based scheduling problem, in accordance with various embodiments, and how adding directed edges can improve the quality of the schedules generated by conventional heuristic algorithms such as shortest job first (SJF) and critical path (CP). In FIG. 1, a plurality of tasks for scheduling are represented as nodes in a DAG. Each task may be associated with various attributes. For illustrative purposes, the nodes in FIG. 1 each include two attributes: an amount of resources required for executing a corresponding task, and the run time (e.g., projected execution time) of the corresponding task. Depending on the use case, the tasks may be associated with fewer, more, or different attributes. These tasks may also be interdependent, e.g., some tasks are prerequisites of other tasks. The interdependencies among the task nodes are represented as the directed edges in the DAG. As shown, without adding any directed edge, both scheduling solutions determined by the heuristic scheduling algorithms SJF and CP yield a makespan of 21. The SJF algorithm prioritizes the tasks with the shortest runtimes, and the CP algorithm ranks nodes by the maximum sum of task runtimes along the path to any of their leaf nodes in a DAG. Here, the “makespan” refers to the distance in time that elapses from the start of work to the end (i.e., from the point of starting the first task to the point of ending the last task). However, after adding directed edges that enforce dependencies among tasks, the new DAGs may allow SJF and CP to find better scheduling solutions with makespans of 16. The added directed edges simplify DAG scheduling by introducing additional constraints, converting the original problem to a simpler proxy.
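By way of illustration, the SJF heuristic on a resource-constrained task DAG may be sketched as follows. This is a minimal sketch under assumed data structures (the function name, the list/set representation, and the numeric values below are illustrative, not the example of FIG. 1):

```python
import heapq

def sjf_makespan(runtimes, resources, deps, capacity):
    # Shortest-job-first list scheduling on a task DAG.
    # runtimes[i] / resources[i]: runtime and resource need of task i;
    # deps[i]: set of prerequisite tasks of i; capacity: total resources.
    n = len(runtimes)
    unmet = [set(deps[i]) for i in range(n)]
    ready = sorted((i for i in range(n) if not unmet[i]),
                   key=lambda t: runtimes[t])
    running = []                        # min-heap of (finish_time, task)
    used, now, done = 0, 0.0, 0
    while done < n:
        # Greedily start the shortest ready tasks that fit the capacity.
        waiting = []
        for t in ready:
            if used + resources[t] <= capacity:
                heapq.heappush(running, (now + runtimes[t], t))
                used += resources[t]
            else:
                waiting.append(t)
        ready = waiting
        # Advance the clock to the next completion and release resources.
        now, t = heapq.heappop(running)
        used -= resources[t]
        done += 1
        for j in range(n):              # dependencies newly satisfied
            if t in unmet[j]:
                unmet[j].discard(t)
                if not unmet[j]:
                    ready.append(j)
        ready.sort(key=lambda t: runtimes[t])
    return now
```

A CP variant would differ only in the sort key, ranking each ready task by the maximum sum of runtimes along its downstream path instead of its own runtime.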
In the embodiments described herein, a reinforcement learning (RL) agent (e.g., a software program) may be configured to determine the addition of the directed edges to the DAG based on one or more trained policy networks. The RL agent may convert the original DAG to an updated DAG that allows conventional heuristic scheduling algorithms to generate optimal scheduling solutions.
FIG. 2 illustrates an example environment for scheduling computational tasks, in accordance with various embodiments. As shown in FIG. 2, the environment may comprise a computing system 102 for scheduling computational tasks 107. The computational tasks 107 may include various tasks that can be executed by computing devices such as 105a and 105b. These tasks 107 may be associated with numerous execution requirements and constraints, such as interdependencies (e.g., one or more tasks may be prerequisites of one or more other tasks), resource consumptions (e.g., how much computational resources are required for executing one task), runtime (e.g., execution duration of one task), other suitable attributes, or any combination thereof. Common examples may include tasks submitted from a plurality of clients to a server, in which the tasks need to be scheduled for execution (either on the server or distributed to other computing devices). In some embodiments, the computing devices 105a and 105b may include various devices such as computers, smart devices, and edge devices. Even though the devices 105a and 105b are illustrated as separate from the computing system 102, they may refer to components within the computing system 102.
In some embodiments, the computing system 102 may include a staging area for caching the received tasks 107. The computing system 102 may determine optimal scheduling solutions for the cached tasks 107. The determination is subject to the scheduling constraints such as the interdependencies among the tasks 107 (e.g., certain tasks cannot be executed until their prerequisite tasks are executed) and resource consumptions of the tasks 107. For instance, the total computational resource consumption of the tasks scheduled to be executed concurrently cannot exceed the available computational resources.
The components of the computing system 102 in FIG. 2 are illustrative. Depending on the implementation, the computing system 102 may include additional, fewer, or alternative components. The computing system 102 may be implemented in one or more networks (e.g., enterprise networks), one or more endpoints, one or more servers, or one or more clouds. The computing system 102 may include hardware or software which manages access to a centralized resource or service in a network. A cloud may include a cluster of servers and other devices distributed across a network.
In some embodiments, the computing system 102 may include a DAG generating component 112, an embedding component 114, a determining component 116, and a scheduling component 118. The computing system 102 may include one or more processors (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller or microprocessor, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information) and one or more memories (e.g., permanent memory, temporary memory, non-transitory computer-readable storage medium) . The one or more memories may be configured with instructions executable by the one or more processors. The processor (s) may be configured to perform various operations by interpreting machine-readable instructions stored in the memory.
In some embodiments, the DAG generating component 112 of the computing system 102 may be configured to obtain a directed acyclic graph (DAG) representing a plurality of received computing tasks for scheduling. The DAG may include a plurality of nodes corresponding to the plurality of computing tasks and a plurality of directed edges representing the interdependencies among the plurality of computing tasks. In some embodiments, each node in the DAG may be associated with a feature vector containing various attributes of the corresponding computational task such as resource consumptions, mandatory starting time and/or ending time, prerequisite (parent) tasks, other suitable attributes, or any combination thereof. The DAG may be constructed by another entity and sent to the computing system 102, or constructed by the DAG generating component 112 of the computing system 102 based on the tasks.
In some embodiments, the embedding component 114 of the computing system 102 may be configured to generate embeddings for the plurality of nodes in the DAG. As mentioned above, the nodes in the DAG may be associated with feature vectors. The embedding component 114 may update the feature vectors of the nodes using a graph neural network (GNN). The feature vector of each node may embed information from the feature vectors of the node’s neighboring nodes. Here, the neighboring nodes may refer not only to the immediately neighboring nodes but also to remote neighboring nodes. In some embodiments, the embedding process may include: obtaining feature vectors of the plurality of nodes, the feature vector of each node comprising at least one of the following: a runtime of the computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes. For each node of the plurality of nodes, the feature vector of the node may go through a first neural network to obtain a hidden representation of the node. Here, the embedding of a node refers to an encoded form of the feature vector of the node, where the encoding is based on the feature vectors of the node’s neighboring nodes. Meanwhile, the node may also receive other hidden representations propagated from its neighboring nodes. The hidden representation of the node may then be updated based on the received hidden representations propagated from its neighboring nodes.
In some embodiments, the determining component 116 of the computing system 102 may be configured to determine one or more edges to be added to the DAG and update the DAG by adding the one or more edges to the DAG. The determining component 116 may include a reinforcement learning agent that follows a policy network. The policy network may be trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm. In some embodiments, the policy network may include at least two policy networks. The first policy network is trained to identify starting nodes, and the second policy network is trained to identify ending nodes based on the starting nodes identified by the first policy network. That is, the one or more edges may be determined by: identifying one or more starting nodes from the plurality of nodes in the DAG based on the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG based on embeddings of the one or more starting nodes and the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes. In some embodiments, the first and second policy networks share a same network structure but have different input parameter configurations. In some embodiments, the one or more edges are restrained from forming closed loops in the DAG, and the first policy network and the second policy network each comprise a softmax layer configured to enforce the restraint. A detailed description of the policy networks is provided with reference to FIG. 4.
In some embodiments, the scheduling component 118 may be configured to determine the scheduling of the plurality of tasks 107 based on the updated DAG with the added edges and the heuristic scheduling algorithm. The determination may include applying the heuristic scheduling algorithm to the updated DAG to determine a schedule. In some embodiments, the heuristic scheduling algorithm may be the same algorithm that is used during the training of the policy networks in the determining component 116. This means that the policy networks are customized for the heuristic scheduling algorithm, and different heuristic scheduling algorithms may be used to train different policy networks.
FIG. 3 illustrates an example workflow 300 for DAG-based scheduling with reinforcement learning, in accordance with various embodiments. As shown, the example workflow 300 may include two phases: a training phase 302 and an application phase 303. The training phase 302 and the application phase 303 may be carried out by the same or different entities. In some embodiments, the training phase 302 may be performed offline. In other embodiments, both the training phase 302 and the application phase 303 may be performed online. That is, the data collected from the application phase 303, such as the input DAGs, the actions taken (e.g., edges added), and the rewards (e.g., makespan reduction), may be used as part of the training phase 302.
In some embodiments, the training phase 302 may include a plurality of steps illustrated in FIG. 3. Depending on the implementation, the training phase 302 may include fewer, more, or alternative steps, and the steps may be performed in a different order or in parallel. In some embodiments, the training phase 302 may include a reinforcement learning (RL) agent 380 exploring and learning two networks: a graph neural network (GNN) 390A and a policy network 390B. The GNN 390A may be trained to encode the nodes in a DAG so that the feature vectors of the nodes are embedded with configuration information propagated from neighboring nodes and with the topological structure of the DAG. The policy network 390B may be trained to recommend actions to the RL agent 380 in response to a state. Here, an action refers to adding, to the DAG, a directed edge between a starting node and an ending node that does not conflict with the scheduling restrictions of the DAG. For example, a conflict occurs when there is an existing directed path between the starting node and the ending node. In some embodiments, according to the policy network 390B, the selection of the starting node is conditioned on the encoded feature vectors (also called embeddings) of the nodes generated by the GNN 390A, and the selection of the ending node is conditioned on the encoded feature vectors of the nodes generated by the GNN 390A and the selected starting node.
In some embodiments, the training of the GNN 390A and the policy network 390B may be based on a plurality of historical DAGs or synthetic DAGs (also referred to as training DAGs) 360 and a specific scheduling algorithm 370. The scheduling algorithm 370 may be a heuristic scheduling algorithm such as SJF, CP, etc., or another suitable algorithm. The scheduling algorithm 370 may be applied to a DAG to determine a scheduling solution (e.g., an execution plan) for the tasks represented by the DAG.
In some embodiments, the training process 302 may include a reinforcement learning process formulated based on the following configuration.
State: a DAG graph G = {n, N_0, E_0}, where n is the number of nodes in the DAG, N_0 is the set of n nodes, and E_0 is the set of directed edges in the DAG.
Action: adding a directed edge a in G by sequentially selecting a starting node a_1 and an ending node a_2 in G. The added edge may not conflict with the graph structure of G; a conflict occurs when there is an existing directed edge between a_1 and a_2, or when the addition of the directed edge would form a closed loop within G. The selection of the starting node a_1 is conditioned on G, and the selection of the ending node a_2 is conditioned on G and a_1. By sequentially selecting the starting and ending nodes, the time complexity of adding an edge in G is reduced from O(n^2) to O(n).
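By way of illustration, the conflict condition may be checked as follows. This is an illustrative sketch; the edge-list representation of G and the function name are assumptions:

```python
def creates_conflict(n, edges, a1, a2):
    # True if adding directed edge a1 -> a2 conflicts with DAG G:
    # either the edge already exists, or a1 is reachable from a2, so
    # the new edge would close a loop.
    if a1 == a2 or (a1, a2) in edges:
        return True
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
    stack, seen = [a2], set()
    while stack:                       # DFS from a2
        u = stack.pop()
        if u == a1:                    # a1 reachable from a2 -> cycle
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return False
```

Such a check only needs to be run for the single candidate pair (a_1, a_2), which is consistent with the O(n) cost of sequential selection noted above.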
Transition: after adding the directed edge, G is transitioned to G′.
Reward: in some embodiments, the reward is defined as the difference between the makespans of G′ and G under the scheduling algorithm 370. Depending on the implementation, the reward can be easily switched to other measurements, such as average waiting time.
In some embodiments, the GNN 390A may be structured as an L-layer neural network with parameters

{W^(l)}, 1 ≤ l ≤ L,

where L is an integer greater than 1. During the training phase 302 and the application phase 303, the GNN 390A may be used in the same manner to compute graph embeddings for the nodes in a DAG. For example, given a DAG G, the original feature vector of each node in G may include attributes such as runtime and required computational resources (e.g., CPU time, storage space, number of cores). By using the GNN 390A, these feature vectors may be transformed into a hidden representation e_0 embedded with information propagated from neighboring nodes through a plurality of iterations of message passing.
An example process using the GNN 390A to generate embeddings for the nodes in G may include the following steps. Step 1, the edges in G may be reversed to form a new graph G^T. Step 2, at each iteration of message passing in G^T, each node’s embedding is propagated to its neighbors and updated as follows:

e_i^(h) = σ( W^(h) · Σ_{j ∈ N_i} e_j^(h−1) ), 1 ≤ h ≤ H,

where σ(·) stands for the activation function, H is the number of message passing iterations and h is the current iteration, W^(h) is the weight matrix for the h-th iteration, and N_i represents the set of neighbors of node i in G^T (e.g., the children nodes in G). Hence the parameters of the GNN network may be denoted as:

φ = { W^(h) | 1 ≤ h ≤ H }.
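The message-passing update described above may be sketched numerically as follows. This is an illustrative sketch only: the use of ReLU as the activation σ, and the inclusion of each node’s own embedding in the aggregation, are assumptions of the sketch rather than details of the disclosed embodiments:

```python
import numpy as np

def gnn_embeddings(features, neighbors, weights):
    # Message passing over G^T: at each iteration h, node i aggregates the
    # embeddings of its neighbors N_i (its children in G), multiplies by the
    # iteration's weight matrix W^(h), then applies the activation.
    # features: (n, d) initial vectors; weights: H matrices of shape (d, d).
    e = np.asarray(features, dtype=float)
    for W in weights:                            # h = 1 .. H
        agg = np.zeros_like(e)
        for i, nbrs in enumerate(neighbors):
            agg[i] = e[i] + sum(e[j] for j in nbrs)  # self term is an assumption
        e = np.maximum(agg @ W.T, 0.0)           # ReLU as activation
    return e
```

After H iterations, each row of the returned array is a deep embedding of one node, enriched with information from nodes up to H hops away.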
In some embodiments, the output of the GNN 390A may include a deep embedding (also referred to as an updated feature vector) for each node. In some embodiments, the entire DAG may also be represented as a global embedding determined based on the embeddings of all the nodes within the DAG. The global embedding of the DAG may be used as one of the inputs to the policy network 390B.
In some embodiments, the GNN 390A and the policy network 390B may be jointly trained based on a policy gradient algorithm. The training process may include a plurality of rollouts, denoted as N rollouts. In each rollout, a plurality of edges, denoted as M edges, are added across a plurality of time steps. At the j-th time step of the i-th rollout, a triplet is recorded for training purposes:

(G_{i,j}, a_{i,j}, r_{i,j}), with a_{i,j} = (a_{i,j}^1, a_{i,j}^2),

where G_{i,j} is the updated DAG graph with added edges, a_{i,j}^1 and a_{i,j}^2 refer to the starting node and the ending node of the added edge a_{i,j}, respectively, and r_{i,j} represents the reward computed as the reduction of the makespan (or average waiting time, or another suitable metric) of G_{i,j} after the edge a_{i,j} is added. In some embodiments, to encourage long-term reward, the cumulative reward at the j-th time step may be a discounted sum of the incremental rewards from the j-th time step until completion of the rollout:

R_{i,j} = Σ_{k=j}^{M} γ^(k−j) · r_{i,k},

where γ is the discounting factor, which may be set to 1.0 as a default value.
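The discounted cumulative reward described above may be computed in a single backward pass over one rollout, as sketched below (illustrative; γ corresponds to the `gamma` parameter):

```python
def cumulative_rewards(rewards, gamma=1.0):
    # R_j = sum_{k >= j} gamma^(k - j) * r_k for one rollout, computed
    # backwards so each step reuses the running discounted total.
    total, out = 0.0, []
    for r in reversed(rewards):
        total = r + gamma * total
        out.append(total)
    return out[::-1]
```

With the default γ = 1.0, R_j is simply the sum of all remaining incremental rewards in the rollout.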
For every N rollouts, a baseline reward at each time step j may be computed by the following equation, which is subtracted from the rewards to reduce variance:

b_j = (1/N) Σ_{i=1}^{N} R_{i,j}.
The parameters of the GNN 390A and the policy network 390B may be updated after every N rollouts. A joint loss function may be defined as:

L(φ, θ) = − Σ_{i=1}^{N} Σ_{j=1}^{M} (R_{i,j} − b_j) · log p_{φ,θ}(a_{i,j} | G_{i,j}),

where φ represents the parameters of the GNN 390A, and θ represents the parameters of the policy network 390B.
In some embodiments, the policy network 390B may include two policy networks: a first one trained for determining starting nodes and a second one trained for determining ending nodes. Denoting the parameters of the first and second policy networks as θ_1 and θ_2, respectively, the loss function may be represented as:

L(φ, θ_1, θ_2) = − Σ_{i=1}^{N} Σ_{j=1}^{M} (R_{i,j} − b_j) · [ log p_{φ,θ_1}(a_{i,j}^1 | G_{i,j}) + log p_{φ,θ_2}(a_{i,j}^2 | G_{i,j}, a_{i,j}^1) ].
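A sketch of such a baseline-subtracted policy gradient loss follows. The array layout and the REINFORCE-style form are assumptions of this sketch, not details taken from the disclosed embodiments:

```python
import numpy as np

def policy_gradient_loss(returns, log_probs):
    # returns:   (N, M) cumulative rewards R_{i,j} over N rollouts, M steps.
    # log_probs: (N, M) log-probabilities of the chosen actions; for the
    # two-network variant, the sum of starting- and ending-node log-probs.
    baseline = returns.mean(axis=0)          # b_j, averaged over rollouts
    advantage = returns - baseline           # R_{i,j} - b_j
    return -(advantage * log_probs).sum()
```

Minimizing this quantity increases the log-probability of actions whose return exceeds the per-time-step baseline, and decreases it otherwise.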
After the training phase 302, the GNN 390A and the policy network 390B may be used in the application phase 303 to determine edges to be added to an incoming DAG representing a plurality of to-be-scheduled tasks.
In some embodiments, the application phase 303 may start with obtaining a DAG at step 310 to represent the to-be-scheduled tasks. The DAG may be constructed based on the tasks, or received from another entity or device that performed the construction. In some embodiments, a plurality of DAGs representing a plurality of multi-task jobs may be received, and the tasks within the jobs may be scheduled altogether. In some embodiments, these received DAGs may first be aggregated into one DAG with a new root node. FIG. 5 illustrates a diagram for constructing a DAG in accordance with some embodiments. As shown in FIG. 5, three jobs 510, 520, and 530 may each include a plurality of interdependent tasks that are represented by three DAGs. By adding a new root node and attaching the three DAGs to the new root node, a new DAG may be formed as the output of step 310.
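The aggregation shown in FIG. 5 may be sketched as follows. The edge-list representation and the helper name are assumptions for illustration:

```python
def merge_job_dags(job_dags):
    # Attach several job DAGs to one new root node. Each job DAG is
    # (nodes, edges) with node ids local to the job; ids are offset so
    # they do not collide, and the new root receives id 0.
    nodes, edges = [0], []
    offset = 1
    for job_nodes, job_edges in job_dags:
        relabel = {v: offset + k for k, v in enumerate(job_nodes)}
        nodes.extend(relabel.values())
        edges.extend((relabel[u], relabel[v]) for u, v in job_edges)
        # connect the root to each entry node (no incoming edges) of the job
        targets = {v for _, v in job_edges}
        edges.extend((0, relabel[v]) for v in job_nodes if v not in targets)
        offset += len(job_nodes)
    return nodes, edges
```

The result is a single DAG whose root precedes every job, so all tasks can be scheduled together.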
Referring to FIG. 3, the DAG output from step 310 may be fed into the GNN 390A to generate embeddings at step 320. Based on the embeddings of the nodes in the DAG, the RL agent 380 may recommend directed edges based on the trained policy network 390B at step 330. In some embodiments, the trained policy network 390B may include two policy networks for determining the directed edges at step 330, which may include: identifying one or more starting nodes from the plurality of nodes in the DAG based on the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG based on embeddings of the one or more starting nodes and the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes. For example, the embeddings of the plurality of nodes may be fed into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes. Each probability represents a recommended chance to select the corresponding node as a starting node. Based on the recommended probabilities, one or more starting nodes may be determined. Next, for each starting node, combinations of the embedding of the starting node and the embeddings of other nodes may be fed into the second policy network to obtain a plurality of probabilities respectively corresponding to the other nodes. Here, the “other nodes” may refer to the plurality of nodes except for the starting node and the nodes that are restrained from having a directed edge from the starting node. For example, each input to the second policy network may include the embedding of the starting node and the embedding of another node, and yield as output the probability that the other node is recommended as the ending node corresponding to the starting node.
Subsequently, for each of the starting nodes, one ending node may be selected based on the recommended probabilities generated by the second policy network.
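The two-stage selection described above may be sketched as follows. The score functions `start_scores_fn` and `end_scores_fn` are hypothetical stand-ins for the trained first and second policy networks, each assumed to return one raw score per node:

```python
import numpy as np

def select_edge(embeddings, start_scores_fn, end_scores_fn, rng):
    # Sequential selection: sample a starting node from the first policy
    # network's distribution, then an ending node conditioned on it.
    def softmax(s):
        z = np.exp(s - s.max())
        return z / z.sum()
    p_start = softmax(start_scores_fn(embeddings))
    a1 = rng.choice(len(embeddings), p=p_start)
    p_end = softmax(end_scores_fn(embeddings, embeddings[a1]))
    a2 = rng.choice(len(embeddings), p=p_end)
    return a1, a2
```

Sampling (rather than taking the argmax) supports exploration during training; at application time, the highest-probability nodes may be taken instead.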
After the starting nodes and the ending nodes are selected at step 330, the DAG may be updated by adding the directed edges connecting the starting nodes and corresponding ending nodes at step 340. The scheduling algorithm 370 may then be applied to the updated DAG to determine the scheduling solution for the tasks at step 350. The tasks may be executed according to the determined scheduling solution to reach an optimal makespan or average waiting time or another suitable performance measurement.
FIG. 4 illustrates example policy networks trained for adding directed edges to a DAG, in accordance with various embodiments. The components or layers in FIG. 4 are for illustrative purposes only. Depending on the implementation, the policy networks in FIG. 4 may include fewer, more, or alternative components or layers. In some embodiments, the policy network 410 and the policy network 420 may be collectively referred to as the policy network 390B in FIG. 3 and jointly trained. As described in FIG. 3, the policy network 410 may be referred to as a first policy network trained to determine starting nodes in a DAG, and the policy network 420 may be referred to as a second policy network trained to determine ending nodes in the DAG. The starting nodes and the ending nodes are then used to determine the directed edges to be added to the DAG to improve the performance of the task scheduling algorithm.
In some embodiments, the inputs to the  policy networks  410 and 420 may be different. For example, the inputs to the policy network 410 may include a plurality of pairs, each pair including a global embedding of the DAG and one embedding (e.g., feature vector) of a node in the DAG. In some embodiments, the global embedding of the DAG may be the average embedding of all the nodes, and the embedding of a node may be the feature vector of the node after going through the GNN 390A in FIG. 3. The inputs to the policy network 420, on the other hand, may include more information related to the already determined starting nodes. As shown in FIG. 4, the inputs to the policy network 420 may include a plurality of triplets. Each triplet includes the overall embedding of the DAG, the embedding of one starting node, and the embedding of a different node. Here, the different node may refer to a node in the DAG that does not have an existing edge with the starting node and is not restricted from having an edge with the starting node.
In some embodiments, both policy networks 410 and 420 may include a plurality of neural network layers to extract features from the inputs. At the end of the policy networks 410 and 420, a softmax layer may be implemented to enforce one or more edge-addition restrictions, such as that no closed loop is allowed, and that the newly added edge cannot conflict with existing edges (e.g., the dependency indicated by the new edge cannot contradict existing task dependencies).
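The masking behavior of such a softmax layer may be sketched as follows (an illustrative sketch; how the validity mask is derived from the DAG is assumed to be handled elsewhere):

```python
import numpy as np

def masked_softmax(scores, valid):
    # valid[i] is False when selecting node i would close a loop or
    # duplicate an existing edge; masked nodes get zero probability.
    s = np.where(valid, scores, -np.inf)
    s = s - s[valid].max()              # shift for numerical stability
    probs = np.where(valid, np.exp(s), 0.0)
    return probs / probs.sum()
```

Because invalid nodes receive exactly zero probability, sampling from the output can never propose an edge that violates the restrictions.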
In some embodiments, the outputs of the policy network 410 may include a plurality of probabilities respectively corresponding to the plurality of nodes in the DAG. Each probability generated by the policy network 410 may represent a recommended chance of selecting the corresponding node as a starting node for the DAG. The outputs of the policy network 420 may also include a plurality of probabilities. Each probability generated by the policy network 420 may represent a conditional probability of selecting the corresponding node as an ending node for a starting node and the DAG.
FIG. 6 illustrates an example method 600 for DAG-based scheduling with reinforcement learning, in accordance with various embodiments. The method 600 may be implemented by the computing system 102 shown in FIG. 2, and correspond to the flow  shown in FIG. 3. Depending on the implementation, the method 600 may have additional, fewer, or alternative steps.
Block 610 includes obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks for scheduling, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks. In some embodiments, the obtaining a DAG representing a plurality of computing tasks comprises: receiving a plurality of jobs for parallel processing, each job comprising a plurality of tasks; generating a plurality of DAGs corresponding to the plurality of jobs, each DAG representing the plurality of tasks in a corresponding job; generating a new root node; and generating the DAG by connecting the plurality of DAGs to the new root node.
Block 620 includes generating embeddings for the plurality of nodes in the DAG. In some embodiments, the generating embeddings for the plurality of nodes comprises: obtaining feature vectors of the plurality of nodes, a feature vector of each node comprising at least one of the following: a runtime of a computing task corresponding to the node or an amount of resources required for running the computing task; and computing the embeddings of the plurality of nodes based on a graph neural network (GNN) and the feature vectors of the plurality of nodes. In some embodiments, the computing the embeddings of the plurality of nodes based on the GNN comprises: for each node of the plurality of nodes, passing the feature vector of the node through a first neural network to obtain a hidden representation of the node; receiving other hidden representations propagated from neighboring nodes to the node; and updating the hidden representation of the node based on the received other hidden representations propagated from the neighboring nodes. In some embodiments, the generating embeddings for the plurality of nodes in the DAG comprises: generating the embeddings for the plurality of nodes in the DAG based on a graph neural network (GNN), wherein the GNN and the policy network are jointly trained based on a plurality of training DAGs using reinforcement learning (RL).
Block 630 includes determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm. In some embodiments, the policy network comprises a first policy network and a second policy network, and the determining one or more edges comprises: identifying one or more starting nodes from the plurality of nodes in the DAG by inputting the embeddings of the plurality of nodes into the first policy network; identifying one or more ending nodes from the plurality of nodes in the DAG by inputting the embeddings of the one or more starting nodes into the second policy network; and determining one or more edges connecting the one or more starting nodes and the one or more ending nodes. In some embodiments, the identifying the one or more starting nodes from the plurality of nodes in the DAG based on the first policy network comprises: inputting the embeddings of the plurality of nodes into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as a starting node. In some embodiments, the identifying the one or more ending nodes from the plurality of nodes in the DAG based on the embeddings of the one or more starting nodes and the second policy network comprises: for each of the one or more starting nodes, inputting the embedding of the starting node and the embeddings of the plurality of nodes in the DAG into the second policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as an ending node that matches the starting node.
Block 640 includes updating the DAG by adding the one or more edges to the DAG. In some embodiments, the one or more edges are restrained from forming a closed loop in the DAG, and the first policy network and the second policy network each comprise a softmax layer configured to enforce the restraint.
Block 650 includes scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm. In some embodiments, the scheduling the plurality of computing tasks based on the updated DAG comprises: applying the heuristic scheduling algorithm to the updated DAG to determine the scheduling of the plurality of computing tasks. In some embodiments, the heuristic scheduling algorithm comprises one of the following: shortest job first (SJF), critical path (CP), and first-in-first-out (FIFO).
In some embodiments, the method 600 further comprises jointly training the GNN and the policy network, the training comprising: at each time step of the training, identifying a new edge to be added to the DAG based on a current policy network; determining a first performance metric by applying the heuristic scheduling algorithm to the DAG; determining a second performance metric by applying the heuristic scheduling algorithm to the DAG with the new edge; determining a reward based on the first performance metric and the second performance metric; and updating parameters of the GNN and the current policy network based on the loss function comprising a plurality of the rewards across a plurality of time steps.
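The per-step reward in the joint training described above can be sketched as the improvement in the scheduling metric obtained by adding the proposed edge. The toy metric below is purely illustrative (the real metric would come from running the heuristic scheduler), and the log-probability weighting and GNN parameter updates of a full policy-gradient step are omitted:

```python
def training_step(dag_edges, new_edge, heuristic):
    """One time step of the joint training loop: the reward is the
    metric improvement gained by adding the proposed edge."""
    first = heuristic(dag_edges)                 # metric without the new edge
    second = heuristic(dag_edges + [new_edge])   # metric with the new edge
    return first - second                        # positive when the edge helps

# Illustrative stand-in for "apply the heuristic scheduling algorithm
# and measure performance" (an assumption, not the disclosed metric):
toy_metric = lambda edges: 10 - len(edges)

rewards = [training_step([(0, 1)], (1, 2), toy_metric)]
loss = -sum(rewards)  # REINFORCE-style surrogate; in the full method the
                      # loss aggregates rewards across many time steps
```

A positive reward means the heuristic scheduler performed better on the DAG with the new edge, so gradient updates push the policy network toward proposing similar edges.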
FIG. 7 illustrates an example computer system in which any of the embodiments described herein may be implemented. The electronic device may be used to implement one or more components of the systems and the methods shown in FIGs. 1-6. The electronic device 700 may comprise a bus 702 or other communication mechanism for communicating information and one or more hardware processors 704 coupled with bus 702 for processing information. Hardware processor(s) 704 may be, for example, one or more general purpose microprocessors.
The electronic device 700 may also include a main memory 706, such as a random-access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 702 for storing information and instructions to be executed by processor(s) 704. Main memory 706 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor(s) 704. Such instructions, when stored in storage media accessible to processor(s) 704, may render electronic device 700 into a special-purpose machine that is customized to perform the operations specified in the instructions. Main memory 706 may include non-volatile media and/or volatile media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Common forms of media may include, for example, a RAM, a DRAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, or networked versions of the same.
The electronic device 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the electronic device may cause or program electronic device 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by electronic device 700 in response to processor(s) 704 executing one or more sequences of one or more instructions contained in main memory 706. Such instructions may be read into main memory 706 from another storage medium, such as storage device 707. Execution of the sequences of instructions contained in main memory 706 may cause processor(s) 704 to perform the process steps described herein. For example, the processes/methods disclosed herein may be implemented by computer program instructions stored in main memory 706. When these instructions are executed by processor(s) 704, they may perform the steps as shown in corresponding figures and described above. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The electronic device 700 also includes a communication interface 710 coupled to bus 702. Communication interface 710 may provide a two-way data communication coupling to one or more network links that are connected to one or more networks. For example, communication interface 710 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN). Wireless links may also be implemented.
The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented engines may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm) . In other example embodiments, the processors or processor-implemented engines may be distributed across a number of geographic locations.
Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may be implemented partially or wholly in application-specific circuitry.
When the functions disclosed herein are implemented in the form of software functional units and sold or used as independent products, they can be stored in a processor executable non-volatile computer readable storage medium. Particular technical solutions disclosed herein (in whole or in part) or aspects that contribute to current technologies may be embodied in the form of a software product. The software product may be stored in a storage medium, comprising a number of instructions to cause a computing device (which may be a personal computer, a server, a network device, and the like) to execute all or some steps of the methods of the embodiments of the present application. The storage medium may comprise a flash drive, a portable hard drive, ROM, RAM, a magnetic disk, an optical disc, another medium operable to store program code, or any combination thereof.
Particular embodiments further provide a system comprising a processor and a non-transitory computer-readable storage medium storing instructions executable by the processor to cause the system to perform operations corresponding to steps in any method of the  embodiments disclosed above. Particular embodiments further provide a non-transitory computer-readable storage medium configured with instructions executable by one or more processors to cause the one or more processors to perform operations corresponding to steps in any method of the embodiments disclosed above.
Embodiments disclosed herein may be implemented through a cloud platform, a server or a server group (hereinafter collectively the “service system” ) that interacts with a client. The client may be a terminal device, or a client registered by a user at a platform, wherein the terminal device may be a mobile terminal, a personal computer (PC) , and any device that may be installed with a platform application program.
The various features and processes described above may be used independently of one another or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The exemplary systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
As used herein, “or” is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A, B, or C” means “A, B, A and B, A and C, B and C, or A, B, and C, ” unless expressly indicated otherwise or indicated otherwise by context.
The term “include” or “comprise” is used to indicate the existence of the subsequently declared features, but it does not exclude the addition of other features. Conditional language, such as, among others, “can, ” “could, ” “might, ” or “may, ” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.
Although an overview of the subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or concept if more than one is, in fact, disclosed.
The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims (15)

  1. A computer-implemented method, comprising:
    obtaining a directed acyclic graph (DAG) representing a plurality of computing tasks to be scheduled and processed by one or more processors, wherein the DAG includes a plurality of nodes representing the plurality of computing tasks;
    generating embeddings for the plurality of nodes in the DAG;
    determining one or more edges to be added to the DAG based on the embeddings of the plurality of nodes and a policy network, wherein the policy network is trained based on a plurality of training DAGs and a loss function associated with a heuristic scheduling algorithm;
    adding the one or more edges to the DAG to obtain an updated DAG; and
    scheduling the plurality of computing tasks based on the updated DAG and the heuristic scheduling algorithm for the one or more processors to process.
  2. The method of claim 1, wherein the scheduling the plurality of computing tasks comprises:
    applying the heuristic scheduling algorithm to the updated DAG to determine the scheduling of the plurality of computing tasks.
  3. The method of claim 1, wherein the obtaining the DAG representing the plurality of computing tasks comprises:
    receiving a plurality of jobs for parallel processing, each job comprising a plurality of tasks;
    generating a plurality of DAGs corresponding to the plurality of jobs, each DAG representing the plurality of tasks in a corresponding job;
    generating a new root node; and
    generating the DAG by connecting the plurality of DAGs to the new root node.
  4. The method of claim 1, wherein the generating the embeddings for the plurality of nodes comprises:
    obtaining feature vectors of the plurality of nodes, a feature vector of each node comprising at least one of the following: a runtime of a computing task corresponding to the node or an amount of resources required for running the computing task; and
    computing the embeddings of the plurality of nodes based on a Graph Neural Network (GNN) and the feature vectors of the plurality of nodes.
  5. The method of claim 4, wherein the computing the embeddings of the plurality of nodes based on the GNN comprises:
    for each node of the plurality of nodes, inputting the feature vector of the each node through a first neural network to obtain a hidden representation of the each node;
    receiving one or more other hidden representations propagated from neighboring nodes to the each node; and
    updating the hidden representation of the each node based on the one or more other hidden representations propagated from the neighboring nodes.
  6. The method of claim 4, wherein the GNN is jointly trained with the policy network based on a plurality of training DAGs using Reinforcement Learning (RL).
  7. The method of claim 1, wherein the policy network comprises a first policy network and a second policy network, and the determining the one or more edges comprises:
    identifying one or more starting nodes from the plurality of nodes in the DAG by inputting the embeddings of the plurality of nodes into the first policy network;
    identifying one or more ending nodes from the plurality of nodes in the DAG by inputting the embeddings of the one or more starting nodes into the second policy network; and
    determining the one or more edges by connecting the one or more starting nodes and the one or more ending nodes.
  8. The method of claim 7, wherein the identifying the one or more starting nodes from the plurality of nodes in the DAG based on the first policy network comprises:
    inputting the embeddings of the plurality of nodes into the first policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as a starting node.
  9. The method of claim 7, wherein the identifying the one or more ending nodes from the plurality of nodes in the DAG based on the embeddings of the one or more starting nodes and the second policy network comprises:
    for each of the one or more starting nodes, inputting the embedding of the starting node and the embeddings of the plurality of nodes in the DAG into the second policy network to obtain a plurality of probabilities respectively corresponding to the plurality of nodes, wherein each probability represents a recommended chance to select a corresponding node as an ending node that matches the starting node.
  10. The method of claim 7, wherein the one or more edges are restrained from forming a closed loop in the DAG, and the first and the second policy networks each comprise a softmax layer configured to enforce the restraint.
  11. The method of claim 6, wherein the method further comprises jointly training the GNN and the policy network, the training comprising:
    at each time step of the training, identifying a new edge to be added to the DAG based on a current policy network;
    determining a first performance metric by applying the heuristic scheduling algorithm to the DAG;
    determining a second performance metric by applying the heuristic scheduling algorithm to the DAG with the new edge;
    determining a reward based on the first performance metric and the second performance metric; and
    updating parameters of the GNN and the current policy network based on the loss function comprising a plurality of the rewards across a plurality of time steps.
  12. The method of claim 1, wherein the heuristic scheduling algorithm comprises one of the following: Shortest Job First (SJF), Critical Path (CP), or First-In-First-Out (FIFO).
  13. A system comprising one or more processors and one or more non-transitory computer-readable memories storing instructions that, when executed by the one or more processors, cause the system to perform the method of any one of claims 1 to 12.
  14. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 12.
  15. An apparatus comprising a plurality of modules for performing the method of any one of claims 1 to 12.
PCT/CN2021/093945 2021-05-14 2021-05-14 Method and system for scheduling tasks WO2022236834A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/093945 WO2022236834A1 (en) 2021-05-14 2021-05-14 Method and system for scheduling tasks
CN202180088447.7A CN116670684A (en) 2021-05-14 2021-05-14 Method and system for scheduling tasks

Publications (1)

Publication Number Publication Date
WO2022236834A1 true WO2022236834A1 (en) 2022-11-17

Family

ID=84027961

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453379B (en) * 2023-12-25 2024-04-05 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150268992A1 (en) * 2014-03-21 2015-09-24 Oracle International Corporation Runtime handling of task dependencies using dependence graphs
CN110069341A (en) * 2019-04-10 2019-07-30 中国科学技术大学 What binding function configured on demand has the dispatching method of dependence task in edge calculations
CN110402431A (en) * 2017-03-23 2019-11-01 亚马逊科技公司 Event driven scheduling is carried out using directed acyclic graph
CN112328380A (en) * 2020-11-10 2021-02-05 武汉理工大学 Task scheduling method and device based on heterogeneous computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHANG SHUANG SHUANG ET AL.: "Response Time Analysis of Typed DAG Tasks on Heterogeneous Multi-cores", vol. 43, no. 6, 15 June 2020 (2020-06-15), pages 1052-1068, XP093004948 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116302481A (en) * 2023-01-06 2023-06-23 上海交通大学 Resource allocation method and system based on sparse knowledge graph link prediction
CN116302481B (en) * 2023-01-06 2024-05-14 上海交通大学 Resource allocation method and system based on sparse knowledge graph link prediction
CN117555306A (en) * 2024-01-11 2024-02-13 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system
CN117555306B (en) * 2024-01-11 2024-04-05 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system

Also Published As

Publication number Publication date
CN116670684A (en) 2023-08-29

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21941403; country of ref document: EP; kind code of ref document: A1.

WWE (WIPO information: entry into national phase). Ref document number: 202180088447.7; country of ref document: CN.

NENP (Non-entry into the national phase). Ref country code: DE.

122 (EP): PCT application non-entry in European phase. Ref document number: 21941403; country of ref document: EP; kind code of ref document: A1.