CN113127169A - Efficient link scheduling method for dynamic workflow in data center network - Google Patents
- Publication number
- CN113127169A CN113127169A CN202110373804.6A CN202110373804A CN113127169A CN 113127169 A CN113127169 A CN 113127169A CN 202110373804 A CN202110373804 A CN 202110373804A CN 113127169 A CN113127169 A CN 113127169A
- Authority
- CN
- China
- Prior art keywords
- node
- coflow
- scheduling
- nodes
- tasks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/484—Precedence
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an efficient link scheduling method for dynamic workflows in a data center network, which comprises the following steps. S1: process the n arriving coflows simultaneously with a directed acyclic graph neural network. S2: form J incomplete job DAGs, input them into the directed acyclic graph neural network as one large, unconnected directed acyclic graph, and output an embedding vector for each node. S3: take the n embedding vectors obtained in step S2 as the input of the policy network in deep reinforcement learning, obtain a score for each node, and compute each node's weighted score value. S4: based on the current partial DAGs of the different jobs, find all nodes whose current in-degree is 0, compute each node's probability from its weighted score value via a softmax operation, and arrange the nodes by probability into a coflow scheduling priority list. S5: perform priority scheduling of tasks based on the coflow scheduling priority list.
Description
Technical Field
The invention relates to the field of high-performance computing, and in particular to an efficient link scheduling method for dynamic workflows in a data center network.
Background
Modern parallel computing platforms (e.g., Hadoop, Spark, Dryad) support processing large data sets in data centers. A job typically consists of multiple computation and communication phases. The computation phases involve local operations on the servers, while the communication phases transfer data between servers across the data center network to start the next computation phase. These intermediate communication phases have a large impact on application latency. Coflow is an abstraction proposed to model such a communication pattern: it represents a set of intermediate parallel data flows transmitted between servers to start the next phase.
Minimizing the average coflow completion time can improve the latency of jobs that have a single communication phase. For multi-stage jobs, however, minimizing the average coflow completion time may not be the right metric, and may even lead to worse performance, because it ignores the dependencies between the coflows within a job: Starts-After and Finishes-Before. In a multi-stage setting, each job consists of multiple coflows and is typically represented by a DAG (directed acyclic graph) that captures the (Starts-After) dependencies between the coflows.
Although coflow scheduling for single-stage jobs has been studied extensively, coflow scheduling for multi-stage jobs and the dependencies between coflows have largely been ignored. The coflow scheduling problem for multi-stage jobs has been proven NP-hard. Its difficulty stems from many processing challenges and inherent factors, including how to handle different job DAGs, how to effectively extract the feature information of a job DAG (node information, edge information, dependency relations, and so on), and factors such as different numbers of coflows in different jobs and different numbers of parallel flows within a single coflow. The prior art mainly has the following limitations:
Existing work, including heuristic and approximation algorithms, concentrates on coflow scheduling for single-phase jobs. For the coflow scheduling problem of multi-phase jobs, the authors of Aalo briefly discuss a straightforward heuristic to reduce the completion time of multi-phase jobs. Such manually tuned heuristic solutions simplify the problem by relaxing some conditions and guarantee only a rough approximation of the optimal solution to the NP-hard problem.
Therefore, an adaptive scheduling model requiring no manual guidance can be constructed, which dynamically schedules dependent coflows from different jobs by interacting directly with the environment, optimizes the sum of weighted job completion times, and improves operating efficiency. The weights can capture the different priorities of different jobs; in the special case where all weights are equal, the problem is equivalent to minimizing the average job completion time.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention provides an efficient link scheduling method for dynamic workflows in a data center network, which can dynamically schedule dependent coflows from different jobs without relying on extensive manual tuning, so that the sum of weighted job completion times is minimized.
In order to solve the above technical problems, the technical solution of the invention is as follows. An efficient link scheduling method for dynamic workflows in a data center network comprises the following steps:
s1: processing n arriving coflows simultaneously by using a directed acyclic graph neural network, forming a job by a plurality of tasks of the dependency relationship, and expressing by adopting DAG;
s2: forming J incomplete operation DAGs, inputting the J incomplete operation DAGs into a directed acyclic graph neural network as a large unconnected directed acyclic graph, and outputting to obtain an embedding vector of each node;
s3: obtaining n embedding vectors through step S2, taking the n embedding vectors as the input of a strategy network in the deep reinforcement learning, obtaining the score of each node, and calculating to obtain a weighted score value of each node;
s4: according to partial DAG graphs of different jobs at present, finding out all nodes with the income degree of 0 at present, calculating the probability of each node based on the weighted score value through a softmax operation, and obtaining a coflow scheduling priority list according to the probability arrangement of the nodes; temporarily storing the nodes with the current income degree not being 0 in a flow waiting list;
s5: and performing a priority scheduling task based on the coflow scheduling priority list, updating the coflow scheduling priority list and the coflow waiting list after scheduling one coflow, and feeding back the performance of the reward evaluation action by the environment until n coflow schedules are completed.
Further, the directed acyclic graph neural network computes the global information and node features of the job DAG with the following update formulas:

h_v^l = F^l( h_v^(l-1), G^l( { h_u^l : u ∈ P(v) }, h_v^(l-1) ) ),    h_G = R( { h_v^L : v ∈ T } )

where h_v^l is the representation of node v at layer l, and h_G is the representation of the entire DAG graph; P(v) is the set of direct predecessor nodes of node v; T is the set of nodes without a direct successor; and G^l, F^l and R are all parameterized neural networks.
Still further, in step S2, the J incomplete job DAGs are constructed as follows:
Construct a waiting queue W with capacity m and a pending queue D with capacity n, where m > n. As tasks arrive they are arranged in the waiting queue W in order; when the number of tasks is greater than or equal to n, the first n tasks are taken out and placed in the pending queue D. Because the n tasks come from different jobs and their arrival order satisfies the respective dependency relations, they form J incomplete job DAGs.
Furthermore, the weighted score value of each node is obtained by multiplying each node's score by the task weight of its corresponding job; each job corresponds to one task weight, and the task weights within the same job are identical.
Still further, in step S5, specifically:
s1: performing a priority scheduling task based on the coflow scheduling priority list; after each coflow is scheduled, removing corresponding nodes and edges in the DAG, updating a node set with the degree of entry of 0, and further updating a coflow scheduling priority list and a coflow waiting list;
s2: and continuously repeating the operation until the n flow schedules are finished, and feeding back the rewarded evaluation action to the environment.
Still further, in step S5, after the coflow scheduling priority list has been executed, the pending queue D is updated, that is, the first n tasks are again selected from the waiting queue W and placed in the pending queue D, and the process returns to step S2 to continue.
Still further, in step S5, the quality of an action is evaluated by computing the sum of weighted job completion times as the reward, and the agent is optimized through the reward so that the sum of weighted job completion times is minimized.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
1. While a GNN could extract the features of a job DAG, the invention instead uses a directed acyclic graph neural network, which fully exploits the special structure of a DAG and can produce a more favorable vector representation of the graph. Moreover, the directed acyclic graph neural network can process the nodes directly in order and output embedding vectors that obey the dependency order.
2. Processing a complete set of job DAGs at once would make the action space too large, require a long time for a single training run, and leave the input size of the policy network unfixed. The invention assumes that the coflow arrival order within each job follows its dependency relations, so only the n tasks in the pending queue D need to be processed at a time; training is therefore faster, the action space is small, and the input size of the policy network is fixed.
3. Considering the different importance of jobs, the optimization goal is the weighted sum of job completion times. To optimize this objective, the invention takes the weights into account on top of the policy network's output scores, so the input to softmax is also a weighted score value, which makes it easier for the reinforcement learning agent to learn the optimal action.
Drawings
Fig. 1 is a schematic model diagram of the dynamic coflow scheduling method according to this embodiment.
Fig. 2 is a multi-stage job DAG provided by this embodiment, in which nodes represent computation stages and edges represent the communication stages between nodes.
Fig. 3 is a schematic diagram of the directed acyclic graph neural network provided in this embodiment processing different job DAGs.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and are used for illustration only, and should not be construed as limiting the patent. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a method for efficient link scheduling of dynamic workflows in a data center network includes the following steps:
s1: processing n arriving coflows simultaneously by using a directed acyclic graph neural network, forming a job by a plurality of tasks of the dependency relationship, and expressing by adopting DAG;
s2: forming J incomplete operation DAGs, inputting the J incomplete operation DAGs into a directed acyclic graph neural network as a large unconnected directed acyclic graph, and outputting to obtain an embedding vector of each node;
s3: obtaining n embedding vectors through step S2, taking the n embedding vectors as the input of a strategy network in the deep reinforcement learning, obtaining the score of each node, and calculating to obtain a weighted score value of each node;
s4: according to partial DAG graphs of different jobs at present, finding out all nodes with the income degree of 0 at present, calculating the probability of each node based on the weighted score value through a softmax operation, and obtaining a coflow scheduling priority list according to the probability arrangement of the nodes; temporarily storing the nodes with the current income degree not being 0 in a flow waiting list;
s5: and performing a priority scheduling task based on the coflow scheduling priority list, updating the coflow scheduling priority list and the coflow waiting list after scheduling one coflow, and feeding back the performance of the reward evaluation action by the environment until n coflow schedules are completed.
This embodiment does not employ an ordinary Graph Neural Network (GNN). In general, the most common GNN architectures aggregate information from neighbors via message passing. Such architectures can efficiently process undirected graphs, but for job DAGs containing dependencies they may fail to effectively extract the intrinsic features (the dependency, or partial-order, information) that our neural network requires. Therefore, to achieve higher predictive capability, a directed acyclic graph neural network (DAGNN) is used to integrate this information into the representation.
In a specific embodiment, the directed acyclic graph neural network computes the global information and node features of the job DAG with the following update formulas:

h_v^l = F^l( h_v^(l-1), G^l( { h_u^l : u ∈ P(v) }, h_v^(l-1) ) ),    h_G = R( { h_v^L : v ∈ T } )

where h_v^l is the representation of node v at layer l, and h_G is the representation of the entire DAG graph; P(v) is the set of direct predecessor nodes of node v; T is the set of nodes without a direct successor; and G^l, F^l and R are all parameterized neural networks.
It can be seen that the directed acyclic graph neural network always uses the information of the current layer, i.e., the latest information, to update the node representations. Because it aggregates only over the predecessor nodes of the current node, without needing the successors, it can use same-layer information when updating the current node. These differences reflect the special structure specific to DAGs, and this structure is well suited to producing a more favorable vector representation of the graph. The main idea of the directed acyclic graph neural network is to process the nodes in the partial order defined by the DAG.
In contrast to the undirected graphs processed by ordinary GNNs: (1) the directed acyclic graph neural network updates the representation of node v directly with information of the current layer instead of the previous layer; (2) it aggregates only the predecessor nodes of node v, not all of its neighbors.
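The two differences above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the mean aggregator and the linear combiner below are simple stand-ins for the parameterized networks G^l and F^l, and the node names and values are invented for the example.

```python
# Sketch of a DAGNN-style layer: nodes are visited in topological order and
# each node aggregates the CURRENT-layer representations of its predecessors
# P(v), while its own previous-layer state h_prev[v] is combined in.
from collections import defaultdict

def topo_order(nodes, edges):
    """Kahn's algorithm: return nodes in a dependency-respecting order."""
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

def dagnn_layer(nodes, edges, h_prev):
    """One layer: h[v] uses same-layer h[u] of each predecessor u in P(v)."""
    pred = defaultdict(list)
    for u, v in edges:
        pred[v].append(u)
    h = {}
    for v in topo_order(nodes, edges):
        # stand-in for G^l: mean over current-layer predecessor states
        agg = sum(h[u] for u in pred[v]) / len(pred[v]) if pred[v] else 0.0
        # stand-in for F^l: combine previous-layer self state with aggregate
        h[v] = 0.5 * h_prev[v] + 0.5 * agg
    return h

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
h0 = {"a": 1.0, "b": 0.0, "c": 0.0}
h1 = dagnn_layer(nodes, edges, h0)   # {'a': 0.5, 'b': 0.25, 'c': 0.1875}
```

Because node c reads the freshly computed same-layer states of a and b, dependency information propagates through the whole DAG within a single layer, which is exactly the property the text contrasts against ordinary message-passing GNNs.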
In a specific embodiment, for step S2: the common prior-art approach for a single job DAG is to input the topology of that DAG and the flow information of each coflow into a GNN to obtain an embedding vector for each node. This embodiment, however, handles multiple different job DAGs, and the arrival times of the different jobs and coflows are random. Assume the arrival order of the coflows within each job follows its dependencies. A job set contains J jobs, each consisting of several dependent tasks (coflows). A waiting queue W of capacity m and a pending queue D of capacity n are constructed, where m is much greater than n. As tasks arrive they are placed in the waiting queue W in order; when the number of tasks reaches n, the first n tasks are taken out and put into the pending queue D. These n tasks (coflows) come from different jobs, but their order satisfies the respective dependency relations and they form J incomplete job DAGs. The J incomplete job DAGs (which together can be regarded as one large unconnected directed acyclic graph) are then fed as input to the DAGNN, which outputs an embedding vector for each node.
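The waiting-queue / pending-queue mechanism can be sketched as follows. This is a hedged illustration under the stated assumption that coflows arrive in dependency order; the job ids, coflow ids, and helper names are invented for the example.

```python
# Sketch of the W/D queue mechanism: tasks (job_id, coflow_id) accumulate in
# the waiting queue W; once n have arrived, the first n move to the pending
# queue D, whose contents define the J incomplete (partial) job DAGs.
from collections import deque

def fill_pending(waiting, n):
    """Move the first n tasks from waiting queue W to pending queue D."""
    if len(waiting) < n:
        return []                       # not enough arrivals yet
    return [waiting.popleft() for _ in range(n)]

def partial_dags(pending):
    """Group pending tasks by job id -> the J incomplete job DAGs."""
    dags = {}
    for job_id, coflow_id in pending:
        dags.setdefault(job_id, []).append(coflow_id)
    return dags

W = deque([(1, "c1"), (2, "c1"), (1, "c2"), (3, "c1"), (2, "c2")])
D = fill_pending(W, n=4)                # first 4 tasks, arrival order kept
groups = partial_dags(D)                # J = 3 partial job DAGs
```

Because arrival order within each job respects its dependencies, the per-job lists in `groups` are already in a valid scheduling order, which is what lets the DAGNN treat the union of partial DAGs as one large unconnected graph.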
In a specific embodiment, the n embedding vectors obtained in step S2 and output by the DAGNN are used as the input of the policy network (Policy-Network) in deep reinforcement learning, and the score of each node is obtained through training by the deep reinforcement learning agent. In short, the policy network maps each embedding vector to a scalar value (the score).
Because each job corresponds to one task weight, and the task weights within the same job are identical, this embodiment multiplies each node's score by its job's weight to obtain the weighted score value of each node.
In a specific embodiment, step S5, specifically:
s1: performing a priority scheduling task based on the coflow scheduling priority list; after each coflow is scheduled, removing corresponding nodes and edges in the DAG, updating a node set with the degree of entry of 0, and further updating a coflow scheduling priority list and a coflow waiting list;
s2: and continuously repeating the operation until the n flow schedules are finished, and feeding back the rewarded evaluation action to the environment.
In a specific embodiment, the priority scheduling of tasks (coflows) is performed based on the coflow scheduling priority list; each time a coflow is scheduled, the coflow scheduling priority list and the coflow waiting list are updated, until all n coflows have been scheduled and the environment feeds back a reward evaluating the actions. The two lists must be updated because the n coflows are divided between them according to whether their in-degree is 0, and each time a coflow is processed it is deleted from the job DAG; a node whose in-degree was originally not 0 may then have in-degree 0, so the coflow scheduling priority list and the coflow waiting list need to be updated.
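The list-maintenance step described above can be sketched as follows; the DAG, in-degree table, and function name are illustrative assumptions.

```python
# Sketch: after scheduling one coflow, remove its node and outgoing edges
# from the DAG; any successor whose in-degree drops to 0 moves from the
# coflow waiting list to the coflow scheduling priority list ("ready").
def schedule_one(dag, indeg, ready, waiting, chosen):
    """dag: node -> list of successors; indeg: node -> current in-degree."""
    ready.remove(chosen)
    for succ in dag.pop(chosen, []):    # delete chosen node's out-edges
        indeg[succ] -= 1
        if indeg[succ] == 0:
            waiting.remove(succ)
            ready.append(succ)          # now eligible for priority scheduling

dag = {"a": ["b", "c"], "b": ["c"]}
indeg = {"a": 0, "b": 1, "c": 2}
ready, waiting = ["a"], ["b", "c"]
schedule_one(dag, indeg, ready, waiting, "a")
# "b" becomes ready (its in-degree fell to 0); "c" still waits on "b"
```

Repeating this until both lists are empty reproduces the loop of sub-steps s1/s2 above: nodes migrate from the waiting list to the priority list exactly when their dependencies have been scheduled.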
In this embodiment, the data center network is abstracted as one giant non-blocking switch, and tasks compete only for bandwidth resources at its ports. Each coflow is converted into a demand matrix, each entry of which represents the size of the flow to be transmitted from a given ingress port to a given egress port. Each partial job DAG can be treated as a complete DAG.
This embodiment therefore derives the reward for the current overall schedule from the sum of the weighted job completion times. The reward evaluates the quality of an action and guides the agent in the desired direction; by continuously interacting directly with the environment, the agent finally learns the optimal policy. The agent performs actions (the action is the priority list) that act on the environment, the state of the environment changes, and the environment feeds back a reward evaluating the current action; the goal of this embodiment is to maximize the cumulative sum of rewards. At the start of training the selected actions will certainly be poor, but as training proceeds the agent gradually learns a good policy from the environment's feedback. Once trained, the agent can execute an optimal policy whenever J job DAGs arrive, so that the sum of weighted job completion times is minimized.
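The reward signal can be sketched in one line: the negative of the weighted sum of job completion times, so that maximizing cumulative reward minimizes the weighted objective. The completion times and weights below are invented for illustration.

```python
# Sketch of the reward: -sum_j w_j * C_j over the finished jobs, so a
# schedule with smaller weighted completion times earns a larger reward.
def reward(completion_time, job_weight):
    return -sum(job_weight[j] * completion_time[j] for j in completion_time)

completion_time = {"J1": 10.0, "J2": 4.0}   # seconds (assumed values)
job_weight = {"J1": 1.0, "J2": 2.0}
r = reward(completion_time, job_weight)     # -(1*10 + 2*4) = -18.0
```

Negating the objective is the standard trick for casting a minimization goal as reinforcement-learning reward maximization; with all weights equal it reduces to minimizing the average job completion time, as the Disclosure notes.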
In this embodiment, after the coflow scheduling priority list has been executed, the pending queue D is updated (a state transition), that is, the first n tasks are again selected from the waiting queue W and placed in the pending queue D, and the process returns to step S2 to continue.
The optimization objective of this embodiment is the weighted sum of job completion times: different jobs have different importance and are assigned different weights. On top of the scores output by the policy network for each node, the influence of the weights is taken into account, and the input to softmax is likewise a weighted score value, keeping the procedure consistent with the optimization objective of minimizing the sum of weighted job completion times.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
1. An efficient link scheduling method for dynamic workflows in a data center network, characterized by comprising the following steps:
s1: processing n arriving coflows simultaneously by using a directed acyclic graph neural network, forming a job by a plurality of tasks of the dependency relationship, and expressing by adopting DAG;
s2: forming J incomplete operation DAGs, inputting the J incomplete operation DAGs into a directed acyclic graph neural network as a large unconnected directed acyclic graph, and outputting to obtain an embedding vector of each node;
s3: obtaining n embedding vectors through step S2, taking the n embedding vectors as the input of a strategy network in the deep reinforcement learning, obtaining the score of each node, and calculating to obtain a weighted score value of each node;
s4: according to partial DAG graphs of different jobs at present, finding out all nodes with the income degree of 0 at present, calculating the probability of each node based on the weighted score value through a softmax operation, and obtaining a coflow scheduling priority list according to the probability arrangement of the nodes; temporarily storing the nodes with the current income degree not being 0 in a flow waiting list;
s5: performing a priority scheduling task based on the coflow scheduling priority list; and updating the coflow scheduling priority list and the coflow waiting list after scheduling one coflow, and feeding back the performance of the reward evaluation action by the environment until the scheduling of the n coflows is completed.
2. The method for efficient link scheduling of dynamic workflows in a data center network of claim 1, wherein the directed acyclic graph neural network computes the global information and node features of a job DAG with the following formulas:

h_v^l = F^l( h_v^(l-1), G^l( { h_u^l : u ∈ P(v) }, h_v^(l-1) ) ),    h_G = R( { h_v^L : v ∈ T } )
3. The method for efficient link scheduling of dynamic workflows in a data center network of claim 2, wherein in step S2, the J incomplete job DAGs are constructed as follows:
constructing a waiting queue W with capacity m and a pending queue D with capacity n, where m > n; as tasks arrive they are arranged in the waiting queue W in order, and when the number of tasks is greater than or equal to n, the first n tasks are taken out and placed in the pending queue D; because the n tasks come from different jobs and their arrival order satisfies the respective dependency relations, they form the J incomplete job DAGs.
4. The method of claim 3, wherein the weighted score value of each node is obtained by multiplying each node's score by the task weight of its corresponding job; each job corresponds to one task weight, and the task weights within the same job are identical.
5. The method of claim 4, wherein step S5 specifically comprises:
s1: performing priority scheduling of tasks based on the coflow scheduling priority list; after each coflow is scheduled, removing the corresponding node and edges from the DAG, updating the set of nodes with in-degree 0, and accordingly updating the coflow scheduling priority list and the coflow waiting list;
s2: repeating this operation until all n coflows have been scheduled, with the environment feeding back a reward evaluating the actions.
6. The method of claim 5, wherein in step S5, after the coflow scheduling priority list has been executed, the pending queue D is updated, that is, the first n tasks are again selected from the waiting queue W and placed in the pending queue D, and the process returns to step S2 to continue.
7. The method of claim 6, wherein in step S5, the quality of an action is evaluated by computing the sum of weighted job completion times as the reward, and the agent is optimized through the reward so that the sum of weighted job completion times is minimized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110373804.6A CN113127169B (en) | 2021-04-07 | 2021-04-07 | Efficient link scheduling method for dynamic workflow in data center network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110373804.6A CN113127169B (en) | 2021-04-07 | 2021-04-07 | Efficient link scheduling method for dynamic workflow in data center network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113127169A true CN113127169A (en) | 2021-07-16 |
CN113127169B CN113127169B (en) | 2023-05-02 |
Family
ID=76775168
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110373804.6A Active CN113127169B (en) | 2021-04-07 | 2021-04-07 | Efficient link scheduling method for dynamic workflow in data center network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113127169B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113518012A (en) * | 2021-09-10 | 2021-10-19 | 之江实验室 | Distributed cooperative flow simulation environment construction method and system |
CN114691342A (en) * | 2022-05-31 | 2022-07-01 | 蓝象智联(杭州)科技有限公司 | Method and device for realizing priority scheduling of federated learning algorithm component and storage medium |
CN114756358A (en) * | 2022-06-15 | 2022-07-15 | 苏州浪潮智能科技有限公司 | DAG task scheduling method, device, equipment and storage medium |
CN116996443A (en) * | 2023-09-25 | 2023-11-03 | 之江实验室 | Network collaborative traffic scheduling method and system combining GNN and SAC models |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267452A (en) * | 2008-02-27 | 2008-09-17 | 华为技术有限公司 | A conversion method and application server for WEB service mixing scheme |
US20190089645A1 (en) * | 2015-08-25 | 2019-03-21 | Shanghai Jiao Tong University | Dynamic Network Flows Scheduling Scheme in Data Center |
CN111131080A (en) * | 2019-12-26 | 2020-05-08 | 电子科技大学 | Distributed deep learning flow scheduling method, system and equipment |
CN111756653A (en) * | 2020-06-04 | 2020-10-09 | 北京理工大学 | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network |
- 2021
- 2021-04-07: CN application CN202110373804.6A, patent CN113127169B, status: Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267452A (en) * | 2008-02-27 | 2008-09-17 | Huawei Technologies Co., Ltd. | A conversion method and application server for WEB service mixing scheme |
US20190089645A1 (en) * | 2015-08-25 | 2019-03-21 | Shanghai Jiao Tong University | Dynamic Network Flows Scheduling Scheme in Data Center |
CN111131080A (en) * | 2019-12-26 | 2020-05-08 | University of Electronic Science and Technology of China | Distributed deep learning flow scheduling method, system and equipment |
CN111756653A (en) * | 2020-06-04 | 2020-10-09 | Beijing Institute of Technology | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network |
Non-Patent Citations (3)
Title |
---|
Penghao Sun, et al.: "DeepWeave: Accelerating Job Completion Time with Deep Reinforcement Learning-based Coflow Scheduling", Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20) * |
Zheng Ying, et al.: "A Survey of Applications of Deep Reinforcement Learning in Typical Network Systems", Radio Communications Technology (《无线电通信技术》) * |
Ma Teng, et al.: "Coflow Scheduling Mechanism for Data Center Networks Based on Deep Reinforcement Learning", Acta Electronica Sinica (《电子学报》) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113518012A (en) * | 2021-09-10 | 2021-10-19 | 之江实验室 | Distributed cooperative flow simulation environment construction method and system |
CN113518012B (en) * | 2021-09-10 | 2021-12-10 | 之江实验室 | Distributed cooperative flow simulation environment construction method and system |
CN114691342A (en) * | 2022-05-31 | 2022-07-01 | 蓝象智联(杭州)科技有限公司 | Method and device for realizing priority scheduling of federated learning algorithm component and storage medium |
CN114756358A (en) * | 2022-06-15 | 2022-07-15 | 苏州浪潮智能科技有限公司 | DAG task scheduling method, device, equipment and storage medium |
CN114756358B (en) * | 2022-06-15 | 2022-11-04 | 苏州浪潮智能科技有限公司 | DAG task scheduling method, device, equipment and storage medium |
CN116996443A (en) * | 2023-09-25 | 2023-11-03 | 之江实验室 | Network collaborative traffic scheduling method and system combining GNN and SAC models |
CN116996443B (en) * | 2023-09-25 | 2024-01-23 | 之江实验室 | Network collaborative traffic scheduling method and system combining GNN and SAC models |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113127169B (en) | Efficient link scheduling method for dynamic workflow in data center network | |
CN111756812B (en) | Energy consumption perception edge cloud cooperation dynamic unloading scheduling method | |
CN109039942B (en) | Network load balancing system and balancing method based on deep reinforcement learning | |
CN111756653B (en) | Multi-coflow scheduling method based on deep reinforcement learning of graph neural network | |
Wang et al. | An adaptive artificial bee colony with reinforcement learning for distributed three-stage assembly scheduling with maintenance | |
CN108880663A (en) | Incorporate network resource allocation method based on improved adaptive GA-IAGA | |
CN111030835B (en) | Task scheduling model of TTFC network and message scheduling table generation method | |
CN113098714B (en) | Low-delay network slicing method based on reinforcement learning | |
CN114253735B (en) | Task processing method and device and related equipment | |
CN108111335A (en) | A kind of method and system dispatched and link virtual network function | |
CN114443249A (en) | Container cluster resource scheduling method and system based on deep reinforcement learning | |
CN113190342B (en) | Method and system architecture for multi-application fine-grained offloading of cloud-edge collaborative networks | |
CN115150335B (en) | Optimal flow segmentation method and system based on deep reinforcement learning | |
Fan et al. | Associated task scheduling based on dynamic finish time prediction for cloud computing | |
CN116112488A (en) | Fine-grained task unloading and resource allocation method for MEC network | |
CN115756646A (en) | Industrial internet-based edge computing task unloading optimization method | |
CN112506644B (en) | Task scheduling method and system based on cloud edge-side hybrid computing mode system | |
CN109298932B (en) | OpenFlow-based resource scheduling method, scheduler and system | |
Zhang et al. | Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach | |
CN112506658A (en) | Dynamic resource allocation and task scheduling method in service chain | |
CN115114030B (en) | On-line multi-workflow scheduling method based on reinforcement learning | |
CN116909717B (en) | Task scheduling method | |
EP4202682A1 (en) | Deadlock-free scheduling of a task graph on a multi-core processor | |
Laili et al. | Multi operators-based partial connected parallel evolutionary algorithm | |
Wang et al. | A Scalable Deep Reinforcement Learning Model for Online Scheduling Coflows of Multi-Stage Jobs for High Performance Computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||