CN116450308A - Multi-strategy learning-based adaptive DAG task scheduling method - Google Patents
Multi-strategy learning-based adaptive DAG task scheduling method
Info
- Publication number: CN116450308A
- Application number: CN202211596732.2A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F2209/483—Multiproc
- G06F2209/484—Precedence
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention mainly aims to solve the problem of low search efficiency in existing task scheduling algorithms, and discloses a multi-strategy learning-based adaptive DAG task scheduling method comprising the following stages: a state updating stage; a reward updating stage; an action selection stage; and a simulation stage. The simulation stage is repeated until the iteration count limit or time limit is reached, and the minimum makespan value is finally returned. The invention effectively balances the relation between exploration and exploitation, thereby finding a better makespan value faster, reducing search time cost, and improving the search efficiency of the algorithm; the method is universal, is suitable for new applications and new hardware systems, and improves system efficiency.
Description
Technical Field
The invention relates to the technical field of task scheduling systems, in particular to a self-adaptive DAG task scheduling method based on multi-strategy learning.
Background
In distributed heterogeneous computing systems, various computing resources are interconnected by high-speed networks to support compute-intensive parallel and distributed applications. Efficient task scheduling is critical to improving system performance. How to schedule parallel computing tasks for efficient execution in heterogeneous computing systems is a hotspot problem in systems research. Parallel computing tasks in application fields such as big data and artificial intelligence generally represent the data dependencies and parallel relations among tasks with a DAG (directed acyclic graph) task graph model. DAG task scheduling in heterogeneous computing systems is a classical problem in computer architecture research.
DAG task scheduling in heterogeneous computing systems is NP-complete, and practical scheduling systems are more complex still. Many heuristic algorithms have been proposed, such as list scheduling algorithms, genetic- and evolution-based random search algorithms, and task-replication-based algorithms. Most of these methods are heuristic and lack versatility across application scenarios. As the software and hardware environment iterates, traditional heuristic scheduling methods that depend on expert experience are difficult to apply universally to novel application scenarios, so they cannot fully exploit system efficiency on new applications and new hardware systems. In particular, the prior art cannot balance the relation between exploration and exploitation, so a better makespan value cannot be found quickly and the search time cost increases.
Disclosure of Invention
The invention mainly aims to solve the problem of low searching efficiency of the existing task scheduling algorithm, and provides a multi-strategy learning-based adaptive DAG task scheduling method, which effectively balances the relation between exploration and utilization, thereby accelerating the finding of a better makespan value, reducing the searching time cost and improving the algorithm searching efficiency; the method has universality, is suitable for new application and new hardware systems, and improves the system efficiency.
In order to achieve the above object, the present invention adopts the following technical scheme.
An adaptive DAG task scheduling method based on multi-strategy learning comprises the following steps:
step S1: a state updating stage: initializing the ready queue to contain only the entry task, selecting a scheduled task from the ready queue, and updating the state of the task;
step S2: a reward updating stage: scheduling the execution of a task in the ready queue, updating the reward value of the task after execution finishes, and back-propagating the access count and reward value to the entry node;
step S3: an action selection stage: starting from the scheduled task, calculating the A(n_i) value of every candidate next task n_i, and putting the task with the largest A(n_i) value into the ready queue; the task with the maximum A(n_i) value computed by the formula serves as the next node to be scheduled;
step S4: a simulation stage: repeating the steps S1-S3 until the exit task is scheduled, finally obtaining a makespan value;
step S5: repeating the step S4 until the iteration count limit or the time limit is reached, finally returning the minimum makespan value;
the invention provides a self-adaptive DAG task scheduling method based on multi-strategy learning, which effectively balances the relation between exploration and utilization, and finds out a better makespan value in the actual DAG scheduling process, thereby effectively reducing the searching time cost and improving the algorithm searching efficiency; the method has universality, is suitable for new applications and new hardware systems, and can fully exert the system efficiency.
Preferably, the step S1 includes: setting a task node status flag S(n_i) and a task node access count N(n_i) on the basis of the DAG model, each state update comprising: S(n_i)=1, N(n_i)=N(n_i)+1. The change logic of the status flag S(n_i) and the access count N(n_i) is simple and effective, and the execution efficiency is high.
Preferably, the specific process of the step S1 is as follows: the task node status flags S(n_i) are all initialized to 0, and S(n_i) may be set to 1 only when S(n_j)=1 for all j ∈ pred(i), where j denotes a predecessor node of i; the task node access counts N(n_i) are all initialized to 0 and updated as N(n_i)=N(n_i)+1. The constraint on setting S(n_i) to 1 preserves the precedence relationships among the DAG tasks.
Preferably, in the step S2, the processor is selected by the insertion-based policy of the HEFT algorithm. This processor selection policy makes full use of the processors' idle time and avoids wasting processor resources, further shortening the total schedule length.
Preferably, the step S2 further includes: setting a task node cumulative reward Q(n_i) on the basis of the DAG model; the cumulative rewards Q(n_i) are all initialized to 0 and updated as Q(n_i)=Q(n_i)+EST(n_i), where EST(n_i) denotes the earliest start time of task n_i; the access count N(n_i) is updated as N(n_i)=N(n_i)+1; if the scheduled task is the exit task, the current round of scheduling ends. The value of the cumulative reward Q(n_i) affects A(n_i) and thereby the selection of the next scheduled task; the cumulative reward Q(n_i) designed by the invention ensures that tasks n_i with higher utilization value are scheduled for execution first, shortening the total schedule length.
Preferably, in the step S3, the A(n_i) value is calculated as:
A(n_i) = V(n_i) + E(n_i)
where V(n_i) denotes the utilization (exploitation) value part and E(n_i) denotes the exploration value part; c is a constant parameter used mainly to balance the weight between exploration and utilization; rank_u(n_i) denotes the upward rank of task node n_i, i.e. the critical-path length from task n_i to the exit task; the average upward rank is taken over the n DAG tasks, n being the number of DAG tasks; the upward average reward value of task node n_i is the average reward value of the tasks on the critical path from n_i to the exit task; N(n_i) denotes the access count of the current task node n_i, and N(n_j) denotes the access count of the parent node n_j of the current task node. The larger the utilization value part V(n_i), the more valuable the current node. If the access count of the current node is small, the exploration value part E(n_i) increases, indicating that the current node is more worth exploring. A well-set key hyperparameter c balances exploration and utilization, improving algorithm performance while searching as much of the state space as possible.
Preferably, in step S4, after a simulation ends and a makespan value is obtained, the task node status flags S(n_i) are all reset to 0 for the next round of simulated scheduling. After the status flags S(n_i) are reset to 0, the access counts N(n_i) and cumulative rewards Q(n_i) retained from the round guide the execution of the next round of scheduling; after multiple rounds of iterative scheduling, the algorithm obtains a better makespan value.
Therefore, the invention has the advantages that:
(1) The relation between exploration and utilization is effectively balanced, a better makespan value is found faster in the actual DAG scheduling process, the search time cost is effectively reduced, and the search efficiency of the algorithm is improved;
(2) The method has universality, is suitable for new applications and new hardware systems, and can fully exert the system efficiency.
Drawings
FIG. 1 is a flow chart of an adaptive DAG task scheduling method based on multi-strategy learning in an embodiment of the invention.
1. State updating stage; 2. Reward updating stage; 3. Action selection stage; 4. Simulation stage.
Detailed Description
The invention is further described below with reference to the drawings and detailed description.
An adaptive DAG task scheduling method based on multi-strategy learning, as shown in figure 1, comprises the following steps:
Step S1: a state updating stage: initializing the ready queue to contain only the entry task, selecting a scheduled task from the ready queue, and updating the state of the task;
setting task node status flags S (n) on the basis of DAG model i ) Task node access count N (N) i ) Each update state includes: s (n) i )=1,N(n i )=N(n i ) +1. Specifically, task node status flags S (n i ) Initialization is all 0,S (n) i ) The setting of 1 needs to satisfy:j is epsilon pred (i), where j represents the precursor node of i; task node access count N (N) i ) Initialization is all 0, update mode is N (N i )=N(n i )+1。
Step S2: a reward updating stage: scheduling the execution of a task in the ready queue, the processor being selected by the insertion-based policy of the HEFT algorithm; after execution finishes, updating the reward value of the task and back-propagating the access count and reward value to the entry node;
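The insertion-based policy referenced from HEFT can be sketched as follows (Python; `slots`, a sorted list of a processor's busy intervals, is a hypothetical representation, since the patent cites the policy without restating it):

```python
def earliest_insertion_start(slots, ready_time, duration):
    """Insertion-based policy (as in HEFT): scan a processor's busy
    intervals (sorted (start, end) pairs) for the earliest idle gap, at
    or after ready_time, wide enough to hold the task."""
    t = ready_time
    for start, end in slots:
        if start - t >= duration:   # the gap before this interval fits the task
            return t
        t = max(t, end)             # otherwise wait until the interval ends
    return t                        # schedule after the last busy interval
```

Using idle gaps between already-scheduled tasks, rather than always appending at the end, is what avoids wasted processor time and shortens the total schedule length.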
setting task node jackpot Q (n) based on DAG model i ) Task node cumulative rewards Q (n) i ) Initialization is all 0, update mode is Q (n i )=Q(n i )+EST(n i ) Wherein EST (n) i ) Representing task n i Is the earliest start time of (2); task node access count N (N) i ) The update mode of (a) is N (N) i )=N(n i ) +1; if the scheduled task is an exit task, the end of the round of scheduling is indicated.
Step S3: an action selection stage: starting from the scheduled task, calculating the A(n_i) value of every candidate next task n_i, and putting the task with the largest A(n_i) value into the ready queue; the task with the maximum A(n_i) value computed by the formula serves as the next node to be scheduled;
The A(n_i) value is calculated as:
A(n_i) = V(n_i) + E(n_i)
where V(n_i) denotes the utilization (exploitation) value part and E(n_i) denotes the exploration value part; c is a constant parameter used mainly to balance the weight between exploration and utilization; rank_u(n_i) denotes the upward rank of task node n_i, i.e. the critical-path length from task n_i to the exit task; the average upward rank is taken over the n DAG tasks, n being the number of DAG tasks; the upward average reward value of task node n_i is the average reward value of the tasks on the critical path from n_i to the exit task; N(n_i) denotes the access count of the current task node n_i, and N(n_j) denotes the access count of the parent node n_j of the current task node.
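The exact sub-expressions for V(n_i) and E(n_i) appear as formula images in the original filing; the sketch below (Python) uses a rank-weighted exploitation term plus a UCT-style exploration term that matches the surrounding description — the precise combination, the function `a_value`, and the default value of c are all assumptions:

```python
import math

def a_value(i, j, rank_u, rank_avg, q_bar, N, c=1.4):
    """A(n_i) = V(n_i) + E(n_i): V grows with the node's upward rank
    rank_u(n_i) (normalized by the average rank) and its upward average
    reward q_bar; E grows when the access count N(n_i) is small relative
    to that of the parent node n_j, steering the search toward
    under-explored nodes."""
    if N[i] == 0:
        return float("inf")                       # explore unvisited nodes first
    v = (rank_u[i] / rank_avg) * q_bar[i]         # utilization value part V(n_i)
    e = c * math.sqrt(math.log(N[j]) / N[i])      # exploration value part E(n_i)
    return v + e
```

With equal exploitation values, a node visited once scores higher than one visited five times, which is the intended balance: rarely visited nodes are explored, frequently rewarded ones are exploited.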
Step S4: a simulation stage: repeating the steps S1-S3 until the exit task is scheduled, finally obtaining a makespan value; after the simulation ends and the makespan value is obtained, the task node status flags S(n_i) are all reset to 0 for the next round of simulated scheduling.
Step S5: repeating the step S4 until the iteration count limit or the time limit is reached, finally returning the minimum makespan value.
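Putting steps S1-S5 together, the overall search loop can be sketched as follows (Python; random tie-breaking stands in for the A(n_i) selection rule and `makespan` is a caller-supplied evaluation — both are simplifying assumptions):

```python
import random

def schedule_once(tasks, pred, succ, entry, exit_task, select):
    """One simulation round (steps S1-S4): repeatedly pick a ready task
    via the selection rule until the exit task is scheduled; returns the
    schedule order (a stand-in for a full processor assignment)."""
    S = {t: 0 for t in tasks}       # status flags, all initialized to 0
    ready = [entry]                 # ready queue starts with the entry task only
    order = []
    while ready:
        t = select(ready)
        ready.remove(t)
        S[t] = 1                    # state update (step S1)
        order.append(t)
        if t == exit_task:          # scheduling the exit task ends the round
            break
        for s in succ[t]:           # a successor becomes ready once all
            if all(S[p] == 1 for p in pred[s]):  # its predecessors are done
                ready.append(s)
    return order

def search(tasks, pred, succ, entry, exit_task, makespan, iters=100):
    """Step S5: repeat the simulation until the iteration limit is met
    and return the minimum makespan value found."""
    best = float("inf")
    for _ in range(iters):
        order = schedule_once(tasks, pred, succ, entry, exit_task,
                              select=lambda r: random.choice(r))
        best = min(best, makespan(order))
    return best
```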
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. An adaptive DAG task scheduling method based on multi-strategy learning, characterized by comprising the following steps:
step S1: a state updating stage: initializing the ready queue to contain only the entry task, selecting a scheduled task from the ready queue, and updating the state of the task;
step S2: a reward updating stage: scheduling the execution of a task in the ready queue, updating the reward value of the task after execution finishes, and back-propagating the access count and reward value to the entry node;
step S3: an action selection stage: starting from the scheduled task, calculating the A(n_i) value of every candidate next task n_i, and putting the task with the largest A(n_i) value into the ready queue;
step S4: a simulation stage: repeating the steps S1-S3 until the exit task is scheduled, finally obtaining a makespan value;
step S5: repeating the step S4 until the iteration count limit or the time limit is reached, finally returning the minimum makespan value.
2. The adaptive DAG task scheduling method based on multi-strategy learning according to claim 1, wherein the step S1 comprises: setting a task node status flag S(n_i) and a task node access count N(n_i) on the basis of the DAG model, each state update comprising: S(n_i)=1, N(n_i)=N(n_i)+1.
3. The adaptive DAG task scheduling method based on multi-strategy learning according to claim 2, wherein the specific process of the step S1 is as follows: the task node status flags S(n_i) are all initialized to 0, and S(n_i) may be set to 1 only when S(n_j)=1 for all j ∈ pred(i), where j denotes a predecessor node of i; the task node access counts N(n_i) are all initialized to 0 and updated as N(n_i)=N(n_i)+1.
4. The adaptive DAG task scheduling method based on multi-policy learning according to claim 1, wherein in step S2, the selection of the processor adopts an insertion-based policy in the HEFT algorithm.
5. The adaptive DAG task scheduling method based on multi-strategy learning according to claim 1, wherein the step S2 further comprises: setting a task node cumulative reward Q(n_i) on the basis of the DAG model; the cumulative rewards Q(n_i) are all initialized to 0 and updated as Q(n_i)=Q(n_i)+EST(n_i), where EST(n_i) denotes the earliest start time of task n_i; the access count N(n_i) is updated as N(n_i)=N(n_i)+1; if the scheduled task is the exit task, the current round of scheduling ends.
6. The adaptive DAG task scheduling method based on multi-strategy learning according to claim 1, wherein in step S3, the A(n_i) value is calculated as:
A(n_i) = V(n_i) + E(n_i)
where V(n_i) denotes the utilization value part and E(n_i) denotes the exploration value part; c is a constant parameter used mainly to balance the weight between exploration and utilization; rank_u(n_i) denotes the upward rank of task node n_i, i.e. the critical-path length from task n_i to the exit task; the average upward rank is taken over the n DAG tasks, n being the number of DAG tasks; the upward average reward value of task node n_i is the average reward value of the tasks on the critical path from n_i to the exit task; N(n_i) denotes the access count of the current task node n_i, and N(n_j) denotes the access count of the parent node n_j of the current task node.
7. The adaptive DAG task scheduling method based on multi-strategy learning according to any one of claims 1 to 6, wherein in step S4, after the simulation ends and the makespan value is obtained, the task node status flags S(n_i) are all reset to 0 for the next round of simulated scheduling.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211596732.2A | 2022-12-12 | 2022-12-12 | Multi-strategy learning-based adaptive DAG task scheduling method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116450308A | 2023-07-18 |
Family
ID=87122567
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211596732.2A | Multi-strategy learning-based adaptive DAG task scheduling method | 2022-12-12 | 2022-12-12 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN116450308A (en), Pending |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |