WO2023241000A1 - DAG task scheduling method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2023241000A1
WO2023241000A1 · PCT/CN2022/142437 · CN2022142437W
Authority
WO
WIPO (PCT)
Prior art keywords
dag
task
dag task
scheduling
network model
Prior art date
Application number
PCT/CN2022/142437
Other languages
French (fr)
Chinese (zh)
Inventor
胡克坤
鲁璐
赵坤
董刚
赵雅倩
李仁刚
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司
Publication of WO2023241000A1 publication Critical patent/WO2023241000A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the field of task scheduling technology, and in particular to a DAG task scheduling method, device, equipment and storage medium.
  • the purpose of this application is to provide a DAG task scheduling method, device, equipment and medium that can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks.
  • the specific plan is as follows:
  • the DAG task scheduling model is used to determine the scheduling order of subtasks within the DAG task to be executed, and the parallel computing system is used to execute the DAG task to be executed according to the scheduling order.
  • in some embodiments, before constructing the network model in the order of directed graph neural network and sequential decoder, the method also includes:
  • a directed graph neural network is constructed in the order of input layer, K-layer graph convolution layer, and output layer.
  • in some embodiments, before constructing the network model in the order of directed graph neural network and sequential decoder, the method also includes:
  • the objective function of the network model is defined with the minimum task scheduling length as the goal, including:
  • the DAG task data set is obtained, including:
  • DAG task parameters include the number of task layers, the number of child nodes of the target node, the probability of generating child nodes of the target node, the probability of adding a connecting edge between two adjacent task layers, and the computational load of each subtask;
  • a corresponding information matrix is generated for each DAG task in the DAG task data set, including:
  • the information matrix corresponding to the DAG task is obtained.
  • the information matrix is used to train the network model
  • reinforcement learning is used to update the model parameters of the network model according to the objective function, including:
  • the subtasks within the DAG task are prioritized according to their vector representations, based on the attention mechanism and the context of the DAG task;
  • reinforcement learning is used to update the model parameters of the network model until the network model converges.
  • this application discloses a DAG task scheduling device, including:
  • the network building module is used to build the network model in the order of directed graph neural network and sequential decoder, and define the objective function of the network model with the minimum task scheduling length as the goal;
  • the data set acquisition module is used to obtain the DAG task data set and generate the corresponding information matrix for each DAG task in the DAG task data set;
  • the training module is used to train the network model using the information matrix, and update the model parameters of the network model using reinforcement learning according to the objective function to obtain the trained DAG task scheduling model;
  • the scheduling sequence determination module is used to use the DAG task scheduling model to determine the scheduling sequence of subtasks within the DAG task to be executed, and to use the parallel computing system to execute the DAG task to be executed according to the scheduling sequence.
  • the DAG task scheduling device also includes:
  • the graph convolution layer building unit is used to construct a graph convolution layer for DAG task feature learning based on aggregation functions and non-linear activation functions;
  • the directed graph neural network construction unit is used to construct the directed graph neural network in the order of the input layer, K-layer graph convolution layer, and output layer.
  • the DAG task scheduling device also includes:
  • the vector expression definition unit is used to use the priority allocation status of subtasks within the DAG task as a variable to define a vector expression of the context environment for the DAG task;
  • the sequential decoder building unit is used to build a sequential decoder for prioritization based on the vector expression of the attention mechanism and the context environment to obtain the decoder.
  • network building modules include:
  • the scheduling length deceleration rate evaluation index construction unit is used to generate the scheduling length deceleration rate evaluation index of the DAG task, with the task scheduling lengths corresponding to the priority orderings of the DAG task at different time steps and the lower limit of the task scheduling length as independent variables; the lower limit of the task scheduling length is determined based on the path length of the critical path of the DAG task;
  • the reward function construction unit is used to construct the reward function based on the policy gradient algorithm and the scheduling length deceleration rate evaluation index;
  • the objective function building unit is used to build the objective function of the network model based on the reward function.
  • the data set acquisition module includes:
  • a parameter configuration unit is used to configure DAG task parameters; the DAG task parameters include the number of task layers, the number of child nodes of the target node, the generation probability of child nodes of the target node, the probability of adding connecting edges between two adjacent task layers, and the computational load of each subtask;
  • a task generation unit is used to generate a DAG task according to the DAG task parameters to obtain a DAG task data set.
  • the data set acquisition module includes:
  • the node feature matrix generation unit is used to generate the node feature matrix based on the characteristics of each sub-task in the DAG task in the DAG task data set;
  • the adjacency matrix generation unit is used to generate an adjacency matrix based on the connection relationship between different subtasks in the DAG task data set;
  • the information matrix determination unit is used to obtain the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
  • this application discloses an electronic device, including:
  • a memory used to store a computer program;
  • a processor is used to execute a computer program to implement the aforementioned DAG task scheduling method.
  • the present application discloses a non-volatile readable storage medium for storing a computer program; when the computer program is executed by a processor, the aforementioned DAG task scheduling method is implemented.
  • the network model is constructed in the order of directed graph neural network and sequential decoder, and the objective function of the network model is defined with the minimum task scheduling length as the goal; the DAG task data set is obtained, and a corresponding information matrix is generated for each DAG task in the DAG task data set; the information matrix is used to train the network model, and reinforcement learning is used to update the model parameters of the network model according to the objective function to obtain the trained DAG task scheduling model; the DAG task scheduling model is used to determine the scheduling order of subtasks within the DAG task to be executed, and the parallel computing system is used to execute the DAG task to be executed according to the scheduling order.
  • a DAG task scheduling model is obtained based on directed graph neural network and reinforcement learning.
  • the directed graph neural network can automatically identify rich features related to subtasks within the DAG task, and the sequential decoder can use these features to prioritize subtasks.
  • the reinforcement learning optimization model is used to achieve the scheduling goal of minimizing the DAG task scheduling length, which can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks; moreover, reinforcement learning avoids the difficulty of collecting enough supervision labels for the optimal priority allocation of DAG tasks.
  • Figure 1 is a flow chart of a DAG task scheduling method provided by this application.
  • Figure 3 is a specific directed graph neural network structure diagram provided by this application.
  • Figure 4 is a flow chart of a specific DAG task scheduling model training method provided by this application.
  • Figure 5 is a schematic structural diagram of a DAG task scheduling device provided by this application.
  • in this method, the network model is first constructed in the order of Directed Graph Neural Network (DGNN) and sequential decoder, and the objective function of the network model is defined with the minimum task scheduling length as the goal.
  • the directed graph neural network is used to identify the task characteristics of the subtasks within the DAG task and to output the embedded representation, that is, a vector representation, corresponding to each subtask; the task characteristics include execution time and dependencies.
  • the sequential decoder sorts the priorities of all subtasks according to the embedded representations output by the directed graph neural network, and outputs the priority ordering of the subtasks.
  • the parallel computing system can be denoted ARC = (P, V, L, B), where P = {p_i} is the set of processing nodes; V = {v_i} is the set of node computing speeds, v_i being the computing speed of processing node p_i; L = {l_ij | p_i, p_j ∈ P} is the set of communication links between processing nodes; and B = {b_ij | l_ij ∈ L} is the set of communication link bandwidths, with b_ij representing the bandwidth of communication link l_ij.
  • in the DAG task graph, a directed edge represents the communication and data dependency relationship from the subtask t_i connected by the edge to its other subtask t_j: t_j can start execution only after receiving the calculation result of t_i.
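To make the task model above concrete, the following is a minimal Python sketch (illustrative only, not part of the disclosed method; all names are hypothetical): subtasks are vertices, directed edges encode data dependencies, and a priority ordering is valid only if every subtask is ranked after all of its predecessors.

```python
from dataclasses import dataclass, field

@dataclass
class DAGTask:
    n: int                                    # number of subtasks
    edges: set = field(default_factory=set)   # (i, j): t_j depends on t_i
    load: dict = field(default_factory=dict)  # computational load per subtask

    def predecessors(self, j):
        """Direct predecessors Pred(t_j) of subtask t_j."""
        return [i for (i, k) in self.edges if k == j]

    def is_valid_order(self, order):
        """Check that a priority ordering respects every dependency edge."""
        pos = {t: r for r, t in enumerate(order)}
        return all(pos[i] < pos[j] for (i, j) in self.edges)

# Diamond-shaped DAG: t_0 -> {t_1, t_2} -> t_3
dag = DAGTask(n=4, edges={(0, 1), (0, 2), (1, 3), (2, 3)},
              load={0: 2.0, 1: 3.0, 2: 1.0, 3: 2.0})
print(dag.is_valid_order([0, 1, 2, 3]))  # True
print(dag.is_valid_order([1, 0, 2, 3]))  # False: t_1 ranked before t_0
```

The dependency check mirrors the rule stated above: t_j must start execution only after the result of every t_i it depends on.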
  • in some embodiments, before constructing the network model in the order of directed graph neural network and sequential decoder, the method may also include: constructing a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function; and constructing the directed graph neural network in the order of input layer, K graph convolution layers, and output layer. The directed graph neural network is used to learn the vector representation of each subtask within the DAG task.
  • the aggregate function aggregates messages from the immediate predecessors of subtask t_i, and the update function performs a nonlinear transformation on the aggregated messages;
  • Pred(t_i) is the set of direct predecessor subtasks of t_i.
  • as for the aggregate function, a variety of methods can be used, such as taking the maximum value or the average; in this application it is implemented with the attention mechanism, where α_ij represents the attention coefficient of subtask t_j to t_i and is learned through training.
  • as for the update function, it can be any nonlinear activation function; without loss of generality, the ReLU function is used here.
  • the output layer directly outputs the vertex embedded representation learned by the Kth graph convolution layer.
  • the graph convolution layer constructed from aggregation functions and nonlinear activation functions can better adapt to the directed characteristics of the DAG task graph, extract the dependencies between subtasks, and identify both the characteristics of the subtasks themselves and their dependencies on other subtasks within the DAG task, so as to learn the embedded representations of subtask nodes more effectively and provide richer features for subsequent subtask prioritization, thereby improving the accuracy of the prioritization.
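As a hedged sketch of the layer just described (not the patent's exact implementation), one graph convolution step can aggregate messages from each subtask's direct predecessors with attention weights and then apply a ReLU update; here `W` and the attention scores are random placeholders standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_conv_layer(H, A, W):
    """H: (n, d) subtask embeddings; A[i, j] = 1 iff edge t_i -> t_j; W: (d, d)."""
    n, d = H.shape
    H_out = np.zeros((n, d))
    for j in range(n):
        preds = np.nonzero(A[:, j])[0]            # Pred(t_j)
        if len(preds) == 0:
            msg = H[j]                            # entry tasks keep their own features
        else:
            scores = H[preds] @ H[j]              # unnormalised attention scores
            alpha = np.exp(scores) / np.exp(scores).sum()  # softmax -> alpha_ij
            msg = alpha @ H[preds]                # attention-weighted aggregation
        H_out[j] = np.maximum(0.0, msg @ W)       # ReLU update function
    return H_out

A = np.zeros((3, 3)); A[0, 1] = A[0, 2] = 1       # t_0 -> t_1, t_0 -> t_2
H = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 4))
H1 = graph_conv_layer(H, A, W)
print(H1.shape)          # (3, 4)
print((H1 >= 0).all())   # True: ReLU output is non-negative
```

Stacking K such layers, as the embodiment describes, lets each subtask's embedding absorb features from predecessors up to K hops away.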
  • in some embodiments, before constructing the network model in the order of directed graph neural network and sequential decoder, the method may also include: using the priority allocation status of subtasks within the DAG task as a variable to define a vector expression of the context environment for the DAG task; and building a sequential decoder for prioritization based on the attention mechanism and the vector expression of the context. The decoder is a sequential decoder used to sort all subtasks of the DAG task, specifically based on the embedded representations of all n subtask nodes of the DAG task learned by the directed graph neural network.
  • the sequential decoder can formally describe the DAG task priority allocation as a probability distribution defined by the following formula:
  • in the formula, θ is the network parameter to be optimized.
  • during decoding, the sequential decoder first samples the subtask node with the highest priority from the probability distribution p(π_1 | G), then samples the subtask node with the second-highest priority from p(π_2 | π_1, G), and so on until all subtask nodes are sampled.
  • π_τ ← argmax(p(π_τ | π_1, π_2, ..., π_{τ-1}, G))  (5)
  • here softmax is the normalized exponential function, W_θ is the feature transformation matrix to be trained, σ(·) is the nonlinear activation function, att is the attention function, and U_τ is the set of subtasks that have not yet been assigned priorities at time step τ.
  • the sets O_τ and U_τ are updated in real time according to the priority allocation status; cont is the context of the decoder's real-time subtask selection. It can be understood that in formula (6), the vector representation of each subtask is compared with the vector representation of the context, and the attention mechanism is then used to allocate the subtask weights. The vector representation of the context environment is calculated as follows:
  • W represents a linear transformation; "[;]" represents the tensor concatenation operator; cont_O and cont_U are the embedded representations corresponding to O_τ and U_τ respectively, which are calculated by the following formula:
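The decoding loop above can be sketched as follows (a hedged illustration, not the patent's trained model): at each time step the not-yet-ranked subtasks U_τ are scored against a context vector built from the ranked set O_τ and unranked set U_τ, ranked tasks are masked out, and the greedy argmax of Eq. (5) picks the next-highest-priority subtask. The simple mean-based context stands in for the trained W[cont_O; cont_U] transformation.

```python
import numpy as np

def decode_priorities(H):
    """H: (n, d) subtask embeddings -> priority-ordered list of subtask indices."""
    n, d = H.shape
    ranked, unranked = [], list(range(n))       # O_tau and U_tau
    while unranked:
        cont_O = H[ranked].mean(axis=0) if ranked else np.zeros(d)
        cont_U = H[unranked].mean(axis=0)
        context = cont_O + cont_U               # stand-in for W[cont_O; cont_U]
        scores = np.full(n, -np.inf)            # mask: only U_tau is scored
        for t in unranked:
            scores[t] = H[t] @ context
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                    # softmax over unranked subtasks
        nxt = int(np.argmax(probs))             # greedy pick, Eq. (5) style
        ranked.append(nxt)
        unranked.remove(nxt)
    return ranked

H = np.eye(4)
order = decode_priorities(H)
print(sorted(order))  # [0, 1, 2, 3]: every subtask receives exactly one rank
```

Sampling from `probs` instead of taking the argmax recovers the stochastic decoding used during training.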
  • MDP: Markov Decision Process.
  • λ_critical represents the critical path length of the DAG task.
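Since the lower limit of the task scheduling length is determined from the critical path, here is an illustrative computation of the load-weighted critical path length (assuming communication costs are ignored; function names are hypothetical):

```python
def critical_path_length(load, edges):
    """Longest load-weighted chain; `load` is a list, `edges` a set of (i, j)."""
    n = len(load)
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    # Kahn's algorithm yields a topological order of the subtasks
    ready = [v for v in range(n) if indeg[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    # earliest finish time of each subtask along its heaviest dependency chain
    finish = [0.0] * n
    for v in order:
        finish[v] = max((finish[u] for u, w in edges if w == v), default=0.0) + load[v]
    return max(finish)

# Diamond DAG 0 -> {1, 2} -> 3 with loads 2, 3, 1, 2: critical path 0 -> 1 -> 3
print(critical_path_length([2, 3, 1, 2], {(0, 1), (0, 2), (1, 3), (2, 3)}))  # 7.0
```

No schedule on any number of processors can finish faster than this bound, which is what makes it usable as the denominator of the deceleration-rate evaluation index.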
  • the DAG task data set used for model training is obtained, and then the information matrix of each DAG task in the DAG task data set is extracted, including the node feature matrix and the adjacency matrix.
  • in some embodiments, generating a corresponding information matrix for each DAG task in the DAG task data set may include: generating a node feature matrix based on the characteristics of each subtask of the DAG tasks in the DAG task data set, where the node feature matrix represents the computing load of each subtask, its normalized computing load, and its in-degree and out-degree; generating an adjacency matrix based on the connection relationships between different subtasks in the DAG task data set; and obtaining the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
  • in some embodiments, obtaining the DAG task data set may include: configuring DAG task parameters, where the DAG task parameters include the number of task layers, the number of child nodes of the target node, the generation probability of child nodes of the target node, the probability of adding connecting edges between two adjacent task layers, and the computing load of each subtask; and generating DAG tasks according to the DAG task parameters to obtain the DAG task data set. Due to the lack of publicly available large-scale DAG task data sets, in this embodiment the DAG tasks are first generated, which can be done using a parallel task generation model based on the DAG task parameters.
  • specifically, the nested fork-join task model is used to synthesize a DAG task; this model is controlled by four parameters, namely n_depth, n_child, p_fork and p_pert.
  • n_depth represents the number of DAG task layers (the depth); n_child represents the number of child nodes of a given node; p_fork represents the probability of generating child nodes for a given node; and p_pert represents the probability of randomly adding connecting edges between two adjacent task layers.
  • that is, the number of child nodes under the target node is determined through a uniform distribution bounded by n_child.
  • this process starts from the entry subtask node and is repeated n_depth times, thereby creating a DAG task with n_depth levels.
  • next, connecting edges are randomly added between the nodes of the k-th and (k+1)-th layers of the DAG task with probability p_pert; the larger the value of p_pert, the higher the degree of parallelism of the generated DAG task.
  • finally, edges are added from the last-layer nodes to the exit node, and a computational load is assigned to each subtask.
  • the computing load of each subtask obeys a normal distribution with parameters μ (μ > 0) and σ, where μ represents the average computing load of the subtasks and σ represents the standard deviation of the subtask computing loads.
  • of course, other distributions can also be assumed; it is sufficient to ensure that the computing load of each subtask is positive, and no limitation is imposed here.
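The generation procedure above can be sketched as follows. This is a simplified illustration under stated assumptions (layer wiring, the load clamp, and all names are this sketch's own choices, not the patent's exact procedure): each node forks with probability p_fork into a uniform number of children up to n_child, extra edges are added between adjacent layers with probability p_pert, and all layers join into a single exit task with normally distributed positive loads.

```python
import random

def generate_dag(n_depth, n_child, p_fork, p_pert, mu=5.0, sigma=1.0, seed=0):
    rng = random.Random(seed)
    layers, edges = [[0]], set()
    load = {0: max(0.1, rng.gauss(mu, sigma))}        # clamp keeps loads positive
    nxt = 1
    for _ in range(n_depth - 1):
        new_layer = []
        for parent in layers[-1]:
            if rng.random() < p_fork:                 # fork with probability p_fork
                for _ in range(rng.randint(1, n_child)):   # uniform child count
                    edges.add((parent, nxt))
                    load[nxt] = max(0.1, rng.gauss(mu, sigma))
                    new_layer.append(nxt)
                    nxt += 1
        if not new_layer:
            break
        # random extra edges between adjacent layers with probability p_pert
        for u in layers[-1]:
            for v in new_layer:
                if (u, v) not in edges and rng.random() < p_pert:
                    edges.add((u, v))
        layers.append(new_layer)
    exit_node = nxt
    load[exit_node] = max(0.1, rng.gauss(mu, sigma))
    for u in layers[-1]:
        edges.add((u, exit_node))                     # join into one exit task
    return load, edges

load, edges = generate_dag(n_depth=3, n_child=2, p_fork=1.0, p_pert=0.5)
print(all(w > 0 for w in load.values()))  # True: every load is positive
```

Varying the four parameters yields DAG tasks of different depths, widths, and degrees of parallelism for the training set.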
  • for each constructed DAG task, the characteristics of each subtask node are extracted to construct the node feature matrix X, and the adjacency matrix A is constructed according to the interconnection relationships between nodes.
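A minimal sketch of this matrix construction, assuming the four node features named earlier (load, normalized load, in-degree, out-degree; the function name is hypothetical):

```python
import numpy as np

def information_matrix(load, edges):
    """Build the node feature matrix X and adjacency matrix A of one DAG task."""
    n = len(load)
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = 1.0                       # directed edge t_i -> t_j
    w = np.array([load[i] for i in range(n)], dtype=float)
    X = np.stack([w,
                  w / w.sum(),              # normalized computational load
                  A.sum(axis=0),            # in-degree of each subtask
                  A.sum(axis=1)],           # out-degree of each subtask
                 axis=1)
    return X, A

X, A = information_matrix({0: 2.0, 1: 3.0, 2: 1.0, 3: 2.0},
                          {(0, 1), (0, 2), (1, 3), (2, 3)})
print(X.shape, A.sum())  # (4, 4) 4.0
```

Together, (X, A) form the information matrix fed to the directed graph neural network in the next step.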
  • Step S13: Use the information matrix to train the network model, and use reinforcement learning to update the model parameters of the network model according to the objective function to obtain the trained DAG task scheduling model.
  • the model parameters are first initialized.
  • according to a specific strategy, such as normal-distribution random initialization, Xavier initialization or He initialization, the parameters W of each layer of the directed graph neural network are initialized, and the model parameters of the sequential decoder are initialized.
  • the above-mentioned DAG task data set is divided into a training set and a test set.
  • partitioning methods such as cross-validation, the hold-out method or the leave-one-out method can be used, in which the training set is used to train the above network model and the test set is used to test the trained network model.
  • the information matrix is used to train the network model, and reinforcement learning is used to update the model parameters of the network model according to the objective function; this may include:
  • S130: Input the information matrix into the network model, and use the directed graph neural network to output the vector representation of each subtask based on the characteristics of the subtasks and the dependency relationships between them;
  • S131: Use the sequential decoder to prioritize the subtasks within the DAG task according to the vector representations of the subtasks, based on the attention mechanism and the context of the DAG task;
  • specifically, the node feature matrix and adjacency matrix contained in the information matrix are used as the input of the network model and forward propagation is performed: the vector representations of all subtasks are obtained through the directed graph neural network, and the sequential decoder outputs the subtask priority ranking. The subtasks can then be scheduled for execution in sequence through the DAG task simulation scheduler, the corresponding scheduling length is calculated, and the model objective function value is computed according to formula (12); back propagation then corrects the network parameter values of each layer according to a chosen strategy, such as stochastic gradient descent or the Adam algorithm. In this way, the reinforcement learning algorithm is used to minimize the DAG task scheduling length, and the network model is continuously optimized by rewarding DAG task priority rankings with shorter scheduling lengths; the resulting scheduling length is shorter, the parallel computing efficiency is higher, and the difficulty of collecting enough supervision labels for the optimal priority allocation of DAG tasks is effectively avoided.
  • the network model is trained by finding the gradient of the objective function J defined by formula (11) with respect to the parameter θ, as shown in formula (12), where ∇ denotes the gradient operator.
  • formula (12) can be estimated using the Monte Carlo stochastic gradient descent method:
  • B represents the batch of DAG task samples randomly drawn from the data set.
  • when the network model converges, the scheduling plan obtained at that point is the optimal scheduling plan; that is, the gradient of the objective function is estimated based on the Monte Carlo stochastic gradient descent method. In this embodiment, deep reinforcement learning for the DAG task is thus implemented based on the directed graph neural network and the objective function.
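The structure of this Monte Carlo policy-gradient estimate can be illustrated with a deliberately tiny stand-in policy (a Plackett-Luce ranking model with one score per subtask rather than the patent's full network; the reward here is a toy placeholder): sample a batch of orderings, subtract a baseline from each reward, and average the reward-weighted log-probability gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_order(theta):
    """Sample a priority ordering and the gradient of its log-probability."""
    remaining = list(range(len(theta)))
    order, grad = [], np.zeros_like(theta)
    while remaining:
        logits = theta[remaining]
        p = np.exp(logits - logits.max()); p /= p.sum()
        k = rng.choice(len(remaining), p=p)
        chosen = remaining[k]
        grad[chosen] += 1.0
        for idx, t in enumerate(remaining):
            grad[t] -= p[idx]              # d/dtheta of log softmax at this step
        order.append(chosen)
        remaining.pop(k)
    return order, grad

def reinforce_step(theta, reward_fn, batch=16, lr=0.1):
    """One Monte Carlo gradient-ascent step on the expected reward."""
    rewards, grads = [], []
    for _ in range(batch):
        order, g = sample_order(theta)
        rewards.append(reward_fn(order)); grads.append(g)
    baseline = np.mean(rewards)            # variance-reduction baseline
    g_hat = sum((r - baseline) * g for r, g in zip(rewards, grads)) / batch
    return theta + lr * g_hat

theta = np.zeros(3)
reward = lambda order: -float(order.index(0))  # toy: rank subtask 0 first
for _ in range(200):
    theta = reinforce_step(theta, reward)
print(theta.argmax())  # 0: the policy learns to rank subtask 0 highest
```

In the disclosed method the reward would instead be derived from the negative scheduling length reported by the simulator, so shorter schedules are reinforced.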
  • the DAG task scheduling length can be obtained by scheduling all subtasks in sequence through the DAG task scheduling simulator so that they execute in parallel on the parallel computing system ARC, and recording the completion time of the exit task.
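A hedged sketch of such a simulator (uniform processors and zero communication cost are simplifying assumptions of this sketch, not the patent's ARC model): subtasks are dispatched in the given priority order onto m identical processors, each starting at the later of its processor's free time and its predecessors' finish times; the completion time of the last task is the scheduling length.

```python
def schedule_length(order, load, edges, m=2):
    """List-schedule `order` (which must respect dependencies) on m processors."""
    proc_free = [0.0] * m                  # next free time of each processor
    finish = {}
    preds = {t: [i for i, j in edges if j == t] for t in order}
    for t in order:
        ready = max((finish[p] for p in preds[t]), default=0.0)
        k = min(range(m), key=lambda i: proc_free[i])  # earliest-free processor
        start = max(ready, proc_free[k])
        finish[t] = start + load[t]
        proc_free[k] = finish[t]
    return max(finish.values())

# Diamond DAG 0 -> {1, 2} -> 3 on 2 processors
L = {0: 2.0, 1: 3.0, 2: 1.0, 3: 2.0}
E = {(0, 1), (0, 2), (1, 3), (2, 3)}
print(schedule_length([0, 1, 2, 3], L, E))  # 7.0
```

With two processors this ordering reaches the critical-path bound of 7.0, while the same ordering on a single processor takes 8.0, which is the gap the scheduling-length reward is meant to shrink.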
  • Step S14: Use the DAG task scheduling model to determine the scheduling order of subtasks within the DAG task to be executed, and use the parallel computing system to execute the DAG task to be executed according to the scheduling order.
  • this embodiment is based on deep reinforcement learning and directed graph neural network to prioritize tasks, thereby determining the scheduling order of tasks, reducing task execution time, and improving task execution efficiency.
  • This embodiment also proposes a DAG task scheduling system based on deep reinforcement learning and directed graph neural network.
  • the system consists of an input module, a directed graph neural network, a sequential decoder, a scheduling length calculation module and a model parameter update module.
  • the input module is responsible for reading the node feature matrix and the adjacency matrix of the DAG task; the directed graph neural network learns an embedded representation for each subtask, which is decoded by the sequential decoder into the priority ordering of all subtasks; the scheduling length calculation module schedules the subtasks to execute on the parallel computing system according to this ordering, and the scheduling length is used as a feedback signal to update the model parameters with a reinforcement learning algorithm.
  • the above DAG task scheduling system based on deep reinforcement learning and directed graph neural network takes the DAG task as input, generates an embedded representation for each subtask of the DAG task through the directed graph neural network, uses the sequential decoder to prioritize all subtasks, and calculates the task scheduling length (completion time) corresponding to this ranking. The system aims to minimize the scheduling length of DAG tasks, and the calculated scheduling length is used as a reward signal to update the model through a reinforcement learning algorithm.
  • in summary, the network model is constructed in the order of directed graph neural network and sequential decoder, and the objective function of the network model is defined with the minimum task scheduling length as the goal; the DAG task data set is obtained, and a corresponding information matrix is generated for each DAG task in the DAG task data set; the information matrix is used to train the network model, and reinforcement learning is used to update the model parameters of the network model according to the objective function to obtain the trained DAG task scheduling model; and the DAG task scheduling model is used to determine the scheduling order of subtasks within the DAG task to be executed, with the parallel computing system executing the DAG task to be executed according to this scheduling order.
  • a DAG task scheduling model is obtained based on directed graph neural network and reinforcement learning.
  • the directed graph neural network can automatically identify rich features related to subtasks within the DAG task, and the sequential decoder can use these features to prioritize subtasks.
  • using the reinforcement learning optimization model to achieve the scheduling goal of minimizing the DAG task scheduling length can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks; moreover, reinforcement learning avoids the difficulty of collecting enough supervision labels for the optimal priority allocation of DAG tasks.
  • the embodiment of the present application also discloses a DAG task scheduling device, as shown in Figure 5.
  • the device includes:
  • the network building module 11 is used to build a network model in the order of a directed graph neural network and a sequential decoder, and define the objective function of the network model with the minimum task scheduling length as the goal;
  • the DAG task scheduling device may include:
  • the vector expression definition unit is used to use the priority allocation status of subtasks within the DAG task as a variable to define a vector expression of the context environment for the DAG task;
  • the network building module 11 may specifically include:
  • the scheduling length deceleration rate evaluation index construction unit is used to generate the scheduling length deceleration rate evaluation index of the DAG task, with the task scheduling lengths corresponding to the priority orderings of the DAG task at different time steps and the lower limit of the task scheduling length as independent variables; the lower limit of the task scheduling length is determined based on the path length of the critical path of the DAG task;
  • the objective function building unit is used to build the objective function of the network model based on the reward function.
  • the data set acquisition module 12 may include:
  • a task generation unit is used to generate a DAG task according to the DAG task parameters to obtain a DAG task data set.
  • the data set acquisition module 12 may include:
  • the node feature matrix generation unit is used to generate the node feature matrix based on the characteristics of each sub-task in the DAG task in the DAG task data set;
  • the adjacency matrix generation unit is used to generate an adjacency matrix based on the connection relationship between different subtasks in the DAG task data set;
  • the information matrix determination unit is used to obtain the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
  • the embodiment of the present application also discloses an electronic device, as shown in FIG. 6 .
  • the content in the figure cannot be considered as any limitation on the scope of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application.
  • the electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input-output interface 25 and a communication bus 26.
  • the memory 22 is used to store computer programs, and the computer programs are loaded and executed by the processor 21 to implement relevant steps in the DAG task scheduling method disclosed in any of the foregoing embodiments.
  • the power supply 23 is used to provide working voltage for each hardware device on the electronic device 20;
  • the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, which is not specifically limited here;
  • the input-output interface 25 is used to obtain external input data or to output data to the external world, and its specific interface type can be selected according to specific application needs, which is not specifically limited here.
  • the operating system 221 is used to manage and control each hardware device and the computer program 222 on the electronic device 20, so that the processor 21 can realize the calculation and processing of the massive data 223 in the memory 22; it can be Windows Server, Netware, Unix, Linux, etc.
  • the computer program 222 may further include computer programs that can be used to complete other specific tasks.
  • the storage medium may be a RAM (random access memory), a ROM (read-only memory), an electrically programmable ROM, an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.


Abstract

A DAG task scheduling method and apparatus, a device, and a storage medium. The method comprises: constructing a network model according to the sequence of a directed graph neural network and a sequential decoder, and defining an objective function of the network model by taking the minimum task scheduling length as an objective; obtaining a DAG task data set, and generating a corresponding information matrix for each DAG task in the DAG task data set; training the network model by using the information matrix, and updating model parameters of the network model by using reinforcement learning according to the objective function to obtain a trained DAG task scheduling model; and determining, by using the DAG task scheduling model, the scheduling sequence of sub-tasks in a DAG task to be executed, and executing, by using a parallel computing system according to the scheduling sequence, the DAG task to be executed. The method can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks.

Description

A DAG task scheduling method, apparatus, device, and storage medium

Cross-reference to related applications

This application claims priority to Chinese patent application No. 202210671115.8, filed with the China National Intellectual Property Administration on June 15, 2022 and entitled "A DAG task scheduling method, device, equipment and storage medium", the entire content of which is incorporated into this application by reference.

Technical field

The present application relates to the technical field of task scheduling, and in particular to a DAG task scheduling method, apparatus, device, and storage medium.

Background
At present, driven by demands for high performance and complex functionality, parallel computing systems are increasingly used to execute real-time applications, such as autonomous driving tasks with complex functional components for perception, planning, and control that impose extremely high performance and real-time requirements. DAG (Directed Acyclic Graph) tasks are often used to represent the complex dependencies between the multiple task components (subtasks) of such real-time applications and to formally describe the fine-grained parallel task scheduling problem, i.e., the DAG task scheduling problem. Since a non-preemptive task model avoids task migration and switching overhead, priority-based non-preemptive scheduling of DAG tasks has received widespread attention. This problem studies how to schedule a given DAG task non-preemptively onto a parallel computing system so as to minimize its processing time, and is a typical NP-complete (Non-deterministic Polynomial complete) problem. In the existing technology, long-term parallel computing practice has accumulated many excellent heuristic scheduling algorithms, such as list scheduling algorithms and clustering scheduling algorithms. However, due to the nature of heuristic strategies, these algorithms cannot establish basic design principles for DAG task schedulers, for example how to use DAG task execution times and DAG task graph topological features to assign priorities to each subtask under different DAG task sizes and configurations, and their scheduling performance is not ideal.
Summary

In view of this, the purpose of this application is to provide a DAG task scheduling method, apparatus, device, and medium that can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks. The specific scheme is as follows:

In a first aspect, this application discloses a DAG task scheduling method, including:

constructing a network model in the order of a directed graph neural network followed by a sequential decoder, and defining an objective function of the network model with the goal of minimizing the task scheduling length;

obtaining a DAG task data set, and generating a corresponding information matrix for each DAG task in the DAG task data set;

training the network model using the information matrices, and updating the model parameters of the network model using reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model;

using the DAG task scheduling model to determine the scheduling order of the subtasks within a DAG task to be executed, and executing the DAG task to be executed on a parallel computing system according to the scheduling order.
In some embodiments of this application, before constructing the network model in the order of directed graph neural network and sequential decoder, the method further includes:

constructing a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function;

constructing the directed graph neural network in the order of an input layer, K graph convolution layers, and an output layer.

In some embodiments of this application, before constructing the network model in the order of directed graph neural network and sequential decoder, the method further includes:

defining a vector expression of the context environment for the DAG task, with the priority assignment states of the subtasks within the DAG task as variables;

constructing a sequential decoder for priority ordering based on an attention mechanism and the vector expression of the context environment.
In some embodiments of this application, defining the objective function of the network model with the goal of minimizing the task scheduling length includes:

generating a scheduling length deceleration rate evaluation index for the DAG task, with the task scheduling lengths corresponding to the priority orderings of the DAG task at different time steps and the lower bound of the task scheduling length as independent variables; the lower bound of the task scheduling length is determined according to the path length of the critical path of the DAG task;

constructing a reward function based on a policy gradient algorithm and the scheduling length deceleration rate evaluation index;

constructing the objective function of the network model based on the reward function.
In some embodiments of this application, obtaining the DAG task data set includes:

configuring DAG task parameters, where the DAG task parameters include the number of task layers, the number of child nodes of a target node, the generation probability of child nodes of the target node, the probability of adding a connecting edge between two adjacent task layers, and the computing load of each subtask;

generating DAG tasks according to the DAG task parameters to obtain the DAG task data set.
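A minimal sketch of such a layered DAG task generator, assuming the configured parameters map to the number of layers, the maximum number of children per node, the child generation probability, and the extra inter-layer edge probability (the concrete generator used in the application may differ):

```python
import random

def generate_layered_dag(num_layers, max_children, child_prob, edge_prob,
                         load_range=(1, 10), seed=None):
    """Randomly generate one layered DAG task.

    Returns (loads, edges): loads maps node id -> computing load; edges is a
    set of (src, dst) pairs that always point from an earlier to a later layer.
    """
    rng = random.Random(seed)
    loads = {0: rng.randint(*load_range)}       # layer 0: single entry subtask
    edges, layers, next_id = set(), [[0]], 1
    for _ in range(1, num_layers):
        layer = []
        for parent in layers[-1]:
            for _ in range(max_children):
                if rng.random() < child_prob:   # child node generation probability
                    loads[next_id] = rng.randint(*load_range)
                    edges.add((parent, next_id))
                    layer.append(next_id)
                    next_id += 1
        if not layer:                           # keep every layer non-empty
            loads[next_id] = rng.randint(*load_range)
            edges.add((layers[-1][0], next_id))
            layer.append(next_id)
            next_id += 1
        for u in layers[-1]:                    # extra edges between adjacent layers
            for v in layer:
                if rng.random() < edge_prob:
                    edges.add((u, v))
        layers.append(layer)
    return loads, edges
```

Because every edge points from a smaller to a larger node id, the generated graph is acyclic by construction.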
In some embodiments of this application, generating a corresponding information matrix for each DAG task in the DAG task data set includes:

generating a node feature matrix according to the features of each subtask of a DAG task in the DAG task data set;

generating an adjacency matrix according to the connection relationships between different subtasks in the DAG task data set;

obtaining the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
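The node feature matrix, the adjacency matrix, and the resulting information matrix can be sketched as follows; for brevity this sketch uses only the computing load, in-degree, and out-degree as node features, while the application also uses path-length features:

```python
import numpy as np

def information_matrix(loads, edges):
    """Build the node feature matrix X and adjacency matrix A for one DAG task.

    loads: dict {node id: computing load}, ids 0..n-1; edges: set of (u, v).
    """
    n = len(loads)
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0                      # directed edge t_u -> t_v
    indeg = A.sum(axis=0)                  # in-degree of each subtask
    outdeg = A.sum(axis=1)                 # out-degree of each subtask
    load_col = np.array([loads[i] for i in range(n)], dtype=float)
    X = np.stack([load_col, indeg, outdeg], axis=1)
    return X, A                            # the pair (X, A) forms the information matrix
```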
In some embodiments of this application, training the network model using the information matrices and updating the model parameters of the network model using reinforcement learning according to the objective function includes:

inputting the information matrix into the network model, and using the directed graph neural network to output a vector representation of each subtask according to the features of the subtasks and the dependencies between subtasks;

using the sequential decoder to prioritize the subtasks within the DAG task according to the vector representations of the subtasks, based on the attention mechanism and the context environment of the DAG task;

using a DAG task scheduling simulator to calculate the task scheduling length of the DAG task according to the priority ordering;

updating the model parameters of the network model using reinforcement learning according to the task scheduling length and the objective function, until the network model converges.
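The DAG task scheduling simulator used during training can be sketched as non-preemptive list scheduling: a subtask becomes ready once all of its predecessors have finished, and the highest-priority ready subtask is placed on the earliest-available processing node. This is a simplified sketch that ignores communication costs between processing nodes:

```python
def simulate_schedule(loads, edges, priority, num_procs, speeds=None):
    """Non-preemptive list scheduling; returns the makespan (schedule length).

    priority: node ids ordered from highest to lowest priority.
    """
    speeds = speeds or [1.0] * num_procs
    preds = {t: set() for t in loads}
    succs = {t: set() for t in loads}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    rank = {t: r for r, t in enumerate(priority)}
    proc_free = [0.0] * num_procs            # time each processor becomes idle
    finish = {}                              # finish time of each subtask
    done = set()
    ready = sorted((t for t in loads if not preds[t]), key=rank.get)
    while ready:
        t = ready.pop(0)                     # highest-priority ready subtask
        p = min(range(num_procs), key=lambda i: proc_free[i])
        start = max(proc_free[p], max((finish[u] for u in preds[t]), default=0.0))
        finish[t] = start + loads[t] / speeds[p]
        proc_free[p] = finish[t]
        done.add(t)
        for v in succs[t]:
            if preds[v] <= done and v not in ready:
                ready.append(v)
        ready.sort(key=rank.get)
    return max(finish.values())
```

The resulting makespan is what the reinforcement learning loop would convert into a reward for updating the model parameters.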
In a second aspect, this application discloses a DAG task scheduling apparatus, including:

a network construction module, configured to construct a network model in the order of a directed graph neural network followed by a sequential decoder, and define an objective function of the network model with the goal of minimizing the task scheduling length;

a data set acquisition module, configured to obtain a DAG task data set and generate a corresponding information matrix for each DAG task in the DAG task data set;

a training module, configured to train the network model using the information matrices, and update the model parameters of the network model using reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model;

a scheduling order determination module, configured to use the DAG task scheduling model to determine the scheduling order of the subtasks within a DAG task to be executed, and execute the DAG task to be executed on a parallel computing system according to the scheduling order.
In some embodiments of this application, the DAG task scheduling apparatus further includes:

a graph convolution layer construction unit, configured to construct a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function;

a directed graph neural network construction unit, configured to construct the directed graph neural network in the order of an input layer, K graph convolution layers, and an output layer.

In some embodiments of this application, the DAG task scheduling apparatus further includes:

a vector expression definition unit, configured to define a vector expression of the context environment for the DAG task, with the priority assignment states of the subtasks within the DAG task as variables;

a sequential decoder construction unit, configured to construct a sequential decoder for priority ordering based on an attention mechanism and the vector expression of the context environment.
In some embodiments of this application, the network construction module includes:

a scheduling length deceleration rate evaluation index construction unit, configured to generate the scheduling length deceleration rate evaluation index of the DAG task, with the task scheduling lengths corresponding to the priority orderings of the DAG task at different time steps and the lower bound of the task scheduling length as independent variables; the lower bound of the task scheduling length is determined according to the path length of the critical path of the DAG task;

a reward function construction unit, configured to construct a reward function based on a policy gradient algorithm and the scheduling length deceleration rate evaluation index;

an objective function construction unit, configured to construct the objective function of the network model based on the reward function.
In some embodiments of this application, the data set acquisition module includes:

a task parameter configuration unit, configured to configure DAG task parameters, where the DAG task parameters include the number of task layers, the number of child nodes of a target node, the generation probability of child nodes of the target node, the probability of adding a connecting edge between two adjacent task layers, and the computing load of each subtask;

a task generation unit, configured to generate DAG tasks according to the DAG task parameters to obtain the DAG task data set.

In some embodiments of this application, the data set acquisition module includes:

a node feature matrix generation unit, configured to generate a node feature matrix according to the features of each subtask of a DAG task in the DAG task data set;

an adjacency matrix generation unit, configured to generate an adjacency matrix according to the connection relationships between different subtasks in the DAG task data set;

an information matrix determination unit, configured to obtain the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
In a third aspect, this application discloses an electronic device, including:

a memory for storing a computer program;

a processor for executing the computer program to implement the aforementioned DAG task scheduling method.

In a fourth aspect, this application discloses a non-volatile readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the aforementioned DAG task scheduling method.
In this application, a network model is constructed in the order of a directed graph neural network followed by a sequential decoder, and an objective function of the network model is defined with the goal of minimizing the task scheduling length; a DAG task data set is obtained, and a corresponding information matrix is generated for each DAG task in the DAG task data set; the network model is trained using the information matrices, and the model parameters of the network model are updated using reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model; the DAG task scheduling model is used to determine the scheduling order of the subtasks within a DAG task to be executed, and the DAG task to be executed is executed on a parallel computing system according to the scheduling order. In this application, the DAG task scheduling model is obtained based on a directed graph neural network and reinforcement learning: the directed graph neural network can automatically identify rich features related to the subtasks within a DAG task, and the sequential decoder can use these features to prioritize the subtasks. At the same time, reinforcement learning is used to optimize the model toward the scheduling goal of minimizing the DAG task scheduling length, which can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks; moreover, using reinforcement learning overcomes the difficulty of collecting enough supervised labels for the optimal priority assignment of DAG tasks.
Brief description of the drawings

In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Figure 1 is a flow chart of a DAG task scheduling method provided by this application;

Figure 2 is a structure diagram of a specific DAG task scheduling system provided by this application;

Figure 3 is a structure diagram of a specific directed graph neural network provided by this application;

Figure 4 is a flow chart of a specific DAG task scheduling model training method provided by this application;

Figure 5 is a schematic structural diagram of a DAG task scheduling apparatus provided by this application;

Figure 6 is a structure diagram of an electronic device provided by this application.
Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.

In the existing technology, long-term parallel computing practice has accumulated many excellent heuristic scheduling algorithms, such as list scheduling algorithms and clustering scheduling algorithms. However, due to the nature of heuristic strategies, these algorithms cannot establish basic design principles for DAG task schedulers, for example how to use DAG task execution times and DAG task graph topological features to assign priorities to each subtask under different DAG task sizes and configurations, and their scheduling performance is not ideal. To overcome these technical problems, this application proposes a DAG task scheduling method that can shorten the DAG task scheduling length and improve the parallel execution efficiency of DAG tasks.
The embodiment of this application discloses a DAG task scheduling method, as shown in Figure 1. The method may include the following steps:

Step S11: Construct a network model in the order of a directed graph neural network followed by a sequential decoder, and define an objective function of the network model with the goal of minimizing the task scheduling length.

In this embodiment, a network model is first constructed in the order of a directed graph neural network (DGNN, Directed Graph Neural Network) followed by a sequential decoder, and the objective function of the network model is defined with the goal of minimizing the task scheduling length. The directed graph neural network is used to identify the task features of the subtasks within a DAG task, including execution times and dependencies, and to output an embedded representation (i.e., a vector representation) for each subtask; the sequential decoder is used to order the priorities of all subtasks according to the embedded representations output by the directed graph neural network, and outputs the priority ordering of the subtasks. The objective function is used to guide the learning of the network model so that, given an input DAG task, the trained network model can achieve the minimum task scheduling length for that DAG task. The network model also includes a DAG task scheduling simulator, which is used to calculate the scheduling length of a DAG task on a given parallel computing system.
Before describing this embodiment in detail, the basic concepts involved are first explained. A parallel computing system can generally be described as a four-tuple ARC = (P, L, V, B), where: P = {p_i | i = 1, 2, …, m} is the set of processing nodes; L = {l_ij | p_i, p_j ∈ P} is the set of communication links between processing nodes; V = {v_i | i = 1, 2, …, m} is the set of computing speeds of the processing nodes, where v_i denotes the computing speed of p_i and v_1 ≤ v_2 ≤ … ≤ v_m; B = {b_ij | l_ij ∈ L} is the set of communication link bandwidths, where b_ij denotes the bandwidth of communication link l_ij.
A DAG task refers to multiple subtasks with complex dependencies that can be executed in parallel on a parallel computing system; it is commonly represented by a weighted directed acyclic graph, denoted DAG = (T, E, C). Here T = {t_i | i = 1, 2, 3, …, n} is the node set, each node represents a subtask, and n is the total number of subtasks. E = {e_ij | i, j = 1, 2, 3, …, n} is the set of directed edges; a directed edge e_ij = (t_i, t_j) represents the communication and data dependency from the subtask t_i it connects to the other subtask t_j, meaning t_j can start executing only after receiving the computation result of t_i. C = {c_i | i = 1, 2, …, n} is the set of computing loads, where c_i denotes the computing load of subtask t_i; denoting the sum of the computing loads of all subtasks by w, we have

w = Σ_{i=1}^{n} c_i.

Let Pred(t_i) and Succ(t_i) be the sets of direct predecessor and direct successor subtasks of t_i, respectively. The sets of edges connecting t_i with all subtasks in Pred(t_i) and Succ(t_i) are called the incoming edge set E_in(t_i) and the outgoing edge set E_out(t_i) of t_i, respectively; denoting the in-degree and out-degree of t_i by deg_in(t_i) and deg_out(t_i), we have deg_in(t_i) = |E_in(t_i)| and deg_out(t_i) = |E_out(t_i)|. If Pred(t_i) = ∅, then t_i is called an entry subtask and is denoted t_entry; if Succ(t_i) = ∅, then t_i is called an exit subtask and is denoted t_exit. A path λ = [t_1, t_2, …, t_k] is a finite sequence of subtask nodes in which every two consecutive nodes are connected by a directed edge, i.e., (t_j, t_{j+1}) ∈ E for j = 1, …, k-1. If a path contains both an entry subtask and an exit subtask, it is called a complete path. The path length Λ(λ) of λ is the sum of the computing loads of all subtasks on the path, i.e.,

Λ(λ) = Σ_{t_i ∈ λ} c_i.

The complete path with the longest path length is called the critical path. In addition to the node-level features of task load, in-degree, and out-degree, the original feature x_i of each subtask node v_i also includes the critical path length and the non-critical path length; the non-critical path length can be obtained by subtracting the critical path length from the total computing load of the DAG task. For example, Figure 2 shows a specific DAG task scheduling system with a DAG task containing 9 nodes and a unique entry and exit; the string inside each node indicates the subtask ID and its computing load. In practice, a DAG task may have multiple entries and exits; it can be turned into a DAG task with a unique entry and exit by adding a virtual entry subtask or exit subtask and the corresponding connecting edges. Unless otherwise specified, this embodiment refers to the latter type of DAG task, and the number of subtasks n is much larger than the number of nodes m of the parallel computing system.
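The critical path length and the per-subtask raw features described above can be computed as in the following sketch (the function and field names are illustrative, not from the application):

```python
from functools import lru_cache

def critical_path_length(loads, edges):
    """Length (total computing load) of the longest path in the DAG task."""
    succs = {t: [] for t in loads}
    for u, v in edges:
        succs[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(t):
        # longest load-weighted path starting at subtask t
        return loads[t] + max((longest_from(v) for v in succs[t]), default=0)

    return max(longest_from(t) for t in loads)

def node_features(loads, edges):
    """Raw feature x_i per subtask: load, in-degree, out-degree,
    critical path length, and non-critical path length."""
    indeg = {t: 0 for t in loads}
    outdeg = {t: 0 for t in loads}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    cp = critical_path_length(loads, edges)
    w = sum(loads.values())                 # total computing load of the DAG task
    return {t: [loads[t], indeg[t], outdeg[t], cp, w - cp] for t in loads}
```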
In this embodiment, before constructing the network model in the order of directed graph neural network and sequential decoder, the method may further include: constructing a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function; and constructing the directed graph neural network in the order of an input layer, K graph convolution layers, and an output layer. The directed graph neural network is used to learn a vector representation, denoted h_i, for each subtask of the DAG task.

As shown in Figure 3, the designed directed graph neural network consists of an input layer, K graph convolution layers (graph conv layers), and an output layer. The input layer reads the node feature matrix X and the adjacency matrix A of the DAG task. The graph convolution operation of the k-th graph convolution layer is implemented by an aggregation function (aggregate) and a nonlinear activation function, as follows:

h_i^(k) = update(aggregate({h_j^(k-1) | t_j ∈ Pred(t_i)}))

where the aggregate function aggregates the messages sent by the direct predecessors of subtask t_i, the update function performs a nonlinear transformation on the aggregated message, and Pred(t_i) is the set of direct predecessor subtasks of t_i. The aggregate function can be implemented in various ways, such as taking the maximum or the average; this patent implements it with an attention mechanism:

aggregate({h_j^(k-1) | t_j ∈ Pred(t_i)}) = Σ_{t_j ∈ Pred(t_i)} α_ij · h_j^(k-1)

where α_ij denotes the attention coefficient of subtask t_j with respect to t_i, which is learned through training. The update function can be any nonlinear activation function; without loss of generality, this embodiment uses the ReLU function, i.e.,

h_i^(k) = ReLU(Σ_{t_j ∈ Pred(t_i)} α_ij · h_j^(k-1))

The output layer directly outputs the vertex embedded representations learned by the K-th graph convolution layer. It can be understood that a graph convolution layer built from an aggregation function and a nonlinear activation function better fits the directed nature of the DAG task graph: it extracts the dependencies between subtasks and identifies both a subtask's own features and its dependencies on the other subtasks in the DAG task, thereby learning the embedded representations of subtask nodes more effectively, providing richer features for the subsequent subtask priority ordering, and improving the accuracy of that ordering.
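The graph convolution step with attention-based aggregation and a ReLU update can be sketched as follows; the scoring vector `W_att` is an assumption, since the application states only that the attention coefficients are learned, without fixing their parameterization:

```python
import numpy as np

def graph_conv_layer(H, A, W_att):
    """One graph convolution step of the directed GNN:
    h_i^(k) = ReLU( sum over predecessors j of alpha_ij * h_j^(k-1) ).

    A[j, i] = 1 means t_j is a direct predecessor of t_i. W_att is a
    learnable scoring vector standing in for the trained attention.
    """
    n, _ = H.shape
    H_next = H.copy()
    for i in range(n):
        preds = np.nonzero(A[:, i])[0]
        if len(preds) == 0:
            continue                          # entry subtasks keep their embedding
        scores = H[preds] @ W_att             # unnormalized attention scores
        alpha = np.exp(scores - scores.max())
        alpha = alpha / alpha.sum()           # attention coefficients alpha_ij
        H_next[i] = np.maximum(0.0, alpha @ H[preds])   # ReLU(aggregate)
    return H_next
```

Stacking K such layers and reading off the final embeddings corresponds to the K graph convolution layers followed by the output layer.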
In this embodiment, before the network model is constructed in the order of directed graph neural network followed by sequential decoder, the method may further include: taking the priority-assignment state of the subtasks within the DAG task as a variable, defining a vector expression of the context environment for the DAG task; and building a sequential decoder for prioritization based on the attention mechanism and the vector expression of the context environment. It can be understood that this decoder is a sequential decoder used to order all subtasks of the DAG task, specifically according to the embeddings of all n subtask nodes of the DAG task learned by the directed graph neural network.
The sequential decoder selects subtask nodes one at a time based on the attention mechanism to generate a priority ordering of size n, π = [π_1, π_2, …, π_n], which corresponds to a priority-ordered arrangement of the subtasks
Figure PCTCN2022142437-appb-000014
satisfying the priority constraint
Figure PCTCN2022142437-appb-000015
In this embodiment, the sequential decoder can formalize DAG task priority assignment as the probability distribution defined by the following formula:
p(π ∣ θ) = ∏_{τ=1}^{n} p(π_τ ∣ π_1, π_2, …, π_{τ-1}, θ)   (4)
Here, θ is the network parameter to be optimized. The sequential decoder first samples the subtask node with the highest priority from the probability distribution p(π_1 ∣ θ), then samples the subtask node with the second-highest priority from p(π_2 ∣ π_1, θ), and so on, until all subtask nodes have been selected.
At each time step τ (τ ∈ [1, n]), the sequential decoder selects a subtask node
Figure PCTCN2022142437-appb-000017
according to the following rule and assigns it priority (n-τ):
π_τ = argmax p(π_τ ∣ π_1, π_2, …, π_{τ-1}, θ)   (5)
where the argmax function returns the argument at which the probability attains its maximum; the conditional probability distribution p(π_τ ∣ π_1, π_2, …, π_{τ-1}, θ) is computed according to the following formula:
Figure PCTCN2022142437-appb-000018
Here, softmax is the normalized exponential function; W_θ is the feature transformation matrix to be trained; σ is a nonlinear activation function; att is the attention function; U_τ is the set of subtasks that have not yet been assigned a priority at time step τ, and the set of subtasks that have already been assigned a priority is denoted O_τ, with O_τ ∪ U_τ = T. During the prioritization of a DAG task, the sets O_τ and U_τ are updated in real time according to the priority-assignment state. cont is the context environment for the sequential decoder's real-time subtask selection. It can be understood that in formula (6), the vector representation of each subtask is compared with the vector representation of the context, and the attention mechanism then assigns the subtask weights; the vector representation of the context is computed as follows:
cont = W[cont_O; cont_U] + b   (7)
where W denotes a linear transformation and "[;]" denotes the tensor concatenation operator; cont_O and cont_U are the embeddings corresponding to O_τ and U_τ respectively, computed by the following formulas:
Figure PCTCN2022142437-appb-000019
Figure PCTCN2022142437-appb-000020
where σ is a nonlinear activation function. Building the sequential decoder on the attention mechanism in this way reduces the subtask-node selection problem to that of randomly selecting a subtask-node index from a conditional probability distribution, so that the priority ordering of the tasks can be determined more accurately from the vector representations of the subtask nodes.
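A minimal sketch of the greedy selection loop of formulas (5) through (9): at each step the still-unassigned subtasks are scored against the current context, a softmax is taken over the unassigned set U_τ, and the argmax is selected. The mean-pooled context standing in for cont_O and cont_U, and score_fn standing in for the trained attention function att, are simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_priorities(H, score_fn):
    """Greedy sequential decoding: H is the (n, d) matrix of subtask
    embeddings; score_fn(h_i, context) scores an unassigned subtask
    against the current context. Returns the priority ordering, with
    order[0] holding the highest priority (n - 1)."""
    n = H.shape[0]
    unassigned = list(range(n))
    order = []
    while unassigned:
        # context: mean embedding of the assigned set O and unassigned set U
        ctx_o = H[order].mean(axis=0) if order else np.zeros(H.shape[1])
        ctx_u = H[unassigned].mean(axis=0)
        context = np.concatenate([ctx_o, ctx_u])
        scores = np.array([score_fn(H[i], context) for i in unassigned])
        probs = softmax(scores)                 # p(pi_tau | pi_1..pi_{tau-1})
        pick = unassigned[int(np.argmax(probs))]
        order.append(pick)
        unassigned.remove(pick)
    return order
```

In training, sampling from probs instead of taking the argmax gives the stochastic policy the reinforcement learning step requires.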
In this embodiment, defining the objective function of the network model with the minimum task scheduling length as the goal may include: taking as independent variables the task scheduling length corresponding to the DAG task's priority ordering at different time steps and the lower bound on the task scheduling length, generating a scheduling-length slowdown-rate metric for the DAG task, where the lower bound on the task scheduling length is determined from the length of the DAG task's critical path; constructing a reward function based on the policy gradient algorithm and the slowdown-rate metric; and constructing the objective function of the network model from the reward function. It can be understood that the DAG task scheduling problem can be modeled as a Markov Decision Process (MDP); a basic MDP is usually described by a five-tuple: MDP = (A, S, R, Π, Δ).
Here, A is the set of the sequential decoder's actions over the n time steps; the action A_τ at time step τ (τ ∈ [1, n]) denotes the sequential decoder selecting a subtask node
Figure PCTCN2022142437-appb-000021
from the DAG task and assigning it priority (n-τ). S is the set of environment states over the n time steps; the environment state S_τ at time step τ is jointly composed of three quantities: the embeddings of all subtask nodes of the DAG task
Figure PCTCN2022142437-appb-000022
the sequence of subtasks that have already been assigned priorities
Figure PCTCN2022142437-appb-000023
and the estimated scheduling length Makespan(π, τ) of the DAG task under the given partial subtask priority sequence;
Figure PCTCN2022142437-appb-000024
R denotes the immediate return value of the environment, used to evaluate the effect of the sequential decoder's previous selection action. Since the goal of DAG task scheduling is to minimize the task scheduling length, this embodiment designs a reward function r(π, τ) based on the scheduling-length slowdown-rate metric:
Figure PCTCN2022142437-appb-000025
On a parallel computing system composed of m computing nodes,
Figure PCTCN2022142437-appb-000026
Figure PCTCN2022142437-appb-000027
denotes the estimated task scheduling length corresponding to the DAG task priority ordering π at time step τ (τ ∈ [1, n]). Here, Λ(λ_critical(π, τ)) is the critical-path length of the DAG task determined by the priority ordering π, and w is the sum of the computational loads of all subtasks of the DAG. Makespan_low(DAG) denotes the lower bound on the scheduling length of the DAG task on the parallel computing system, computed as follows:
Makespan_low(DAG) = max(λ_critical(DAG), w/m)   (11)
where λ_critical(DAG) denotes the critical-path length of the DAG task.
In the MDP, the state transition matrix Π is assumed to be deterministic, because for a given state and action there is no randomness in determining the next state: a scheduling action does not execute a task, it only affects the scheduling policy and changes the arrangement of the tasks. Finally, with the discount factor Δ set to the constant 1, and according to formulas (4) and (10), the policy gradient algorithm is used, with the goal of maximizing the expected cumulative reward corresponding to the DAG task priority ordering π, to define the objective function J of the network model as:
J(θ) = E_{π∼p(π∣θ)} [ ∑_{τ=1}^{n} r(π, τ) ]   (12)
where π ∼ p(π ∣ θ) indicates that the specified DAG task priority ordering is sampled from the learned policy.
Step S12: obtain a DAG task data set, and generate a corresponding information matrix for each DAG task in the DAG task data set.
In this embodiment, the DAG task data set used for model training is obtained, and then the information matrix of each DAG task in the data set is extracted, comprising a node feature matrix and an adjacency matrix. Specifically, generating a corresponding information matrix for each DAG task in the DAG task data set may include: generating a node feature matrix from the features of each subtask of the DAG task, where the node feature matrix characterizes each subtask's computational load, normalized computational load, and in-degree and out-degree; generating an adjacency matrix from the connection relationships between the different subtasks; and obtaining the information matrix corresponding to the DAG task from the node feature matrix and the adjacency matrix.
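The information matrix described here can be assembled directly from the task graph. A sketch with one feature column per quantity named in the text (load, normalized load, in-degree, out-degree):

```python
import numpy as np

def build_info_matrices(n, edges, loads):
    """Build the node feature matrix X and adjacency matrix A for a DAG
    task with n subtasks, precedence pairs `edges`, and per-subtask
    computational loads `loads`."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0                 # edge t_u -> t_v
    indeg = A.sum(axis=0)             # in-degree of each subtask
    outdeg = A.sum(axis=1)            # out-degree of each subtask
    loads = np.asarray(loads, dtype=float)
    # columns: load, normalized load, in-degree, out-degree
    X = np.stack([loads, loads / loads.sum(), indeg, outdeg], axis=1)
    return X, A
```

The pair (X, A) is exactly what the network model consumes as input in step S13.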
In this embodiment, obtaining the DAG task data set may include: configuring DAG task parameters, where the DAG task parameters include the number of task layers, the number of child nodes of a target node, the probability of generating child nodes for the target node, the probability of adding connecting edges between two adjacent task layers, and the computational load of each subtask; and generating DAG tasks from these parameters to obtain the DAG task data set. Because no large-scale, publicly available DAG task data set exists, this embodiment first generates DAG tasks, which can be obtained from the DAG task parameters using a parallel task generation model, for example synthesizing DAG tasks with the nested fork-join task model. This model is controlled by four parameters: n_depth, n_child, p_fork and p_pert. Here, n_depth denotes the number of layers (depth) of the DAG task; n_child denotes the number of child nodes of a node; p_fork denotes the probability of generating child nodes for a node; and p_pert is the probability of randomly adding connecting edges between nodes in two adjacent layers. For each subtask node t_i in layer k, its child nodes t_j and edges e_ij are generated with probability p_fork; the number of child nodes in layer k+1 is drawn from the uniform distribution n_child, that is, the number of child nodes under the target node is determined by a uniform distribution. This process starts from the entry subtask node and is repeated n_depth times, creating a DAG task with n_depth layers. In addition, connecting edges are randomly added between nodes of layers k and k+1 with probability p_pert; the larger p_pert, the higher the degree of parallelism of the generated DAG task. Finally, edges from the last-layer nodes to the exit node are added, and a computational load is assigned to each subtask. The computational load of a subtask follows a normal distribution with parameters μ (μ > 0) and δ, where μ is the mean computational load and δ is the standard deviation of the subtask loads; other distributions may of course be assumed, as long as every subtask's load is positive, which is not limited here. For each constructed DAG task, the features of every subtask node are extracted to build the node feature matrix X, and the adjacency matrix A is built from the interconnections between nodes.
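A sketch of the nested fork-join generator described above, using the parameter names from the text. The patent does not fully specify the sampling details, so the connectivity fallback and the load floor used here are assumptions:

```python
import random

def generate_dag(n_depth, n_child_max, p_fork, p_pert, mu, delta, seed=0):
    """Generate one synthetic DAG task: n_depth layers grown by forking,
    perturbation edges added between adjacent layers with probability
    p_pert, a single exit node, and normally distributed positive loads."""
    rng = random.Random(seed)
    layers = [[0]]                    # layer 0: the entry subtask
    edges = []
    nxt = 1                           # next free node id
    for _ in range(n_depth - 1):
        new_layer = []
        for u in layers[-1]:
            if rng.random() < p_fork:                 # fork children of u
                for _ in range(rng.randint(1, n_child_max)):
                    edges.append((u, nxt))
                    new_layer.append(nxt)
                    nxt += 1
        if not new_layer:                             # keep the graph connected
            edges.append((layers[-1][0], nxt))
            new_layer, nxt = [nxt], nxt + 1
        for u in layers[-1]:                          # perturbation edges
            for v in new_layer:
                if (u, v) not in edges and rng.random() < p_pert:
                    edges.append((u, v))
        layers.append(new_layer)
    exit_node = nxt
    edges += [(u, exit_node) for u in layers[-1]]     # last layer -> exit
    n = exit_node + 1
    # positive normal loads: mean mu, std delta, floored to stay positive
    loads = [max(1e-3, rng.gauss(mu, delta)) for _ in range(n)]
    return n, edges, loads
```

Repeating this for many parameter settings yields the training and test sets used below.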
Step S13: train the network model with the information matrices, and update the model parameters of the network model with reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model.
In this embodiment, after the network model is constructed, the model parameters are first initialized. Following a specific strategy such as random initialization from a normal distribution, Xavier initialization, or He initialization, the parameters W of each layer of the directed graph neural network are initialized, as is the sequential decoder model parameter p(θ).
The network model is then trained with the information matrices corresponding to the DAG task data set described above. After the data set is obtained, it is divided into a training set and a test set, for example by cross-validation, the hold-out method, or the leave-one-out method; the training set is used to train the network model, and the test set is used to test the trained network model.
In this embodiment, training the network model with the information matrices and updating the model parameters of the network model with reinforcement learning according to the objective function may include:
S130: input the information matrix into the network model, and use the directed graph neural network to output a vector representation of each subtask from the subtasks' features and the dependencies between them;
S131: use the sequential decoder to prioritize the subtasks within the DAG task, based on the attention mechanism and the DAG task's context environment, according to the subtasks' vector representations;
S132: use the DAG task scheduling simulator to compute the task scheduling length of the DAG task according to the priority ordering;
S133: update the model parameters of the network model with reinforcement learning, according to the task scheduling length and the objective function, until the network model converges.
That is, the node feature matrix and adjacency matrix contained in the information matrix serve as the input of the network model for forward propagation: the directed graph neural network produces the vector representations of all subtasks, and the sequential decoder outputs the subtask priority ordering. The DAG task simulation scheduler can then dispatch the subtasks for execution in that order and compute the corresponding scheduling length, after which the model objective value is computed according to formula (12); the network parameters of each layer are corrected by backpropagation following a strategy such as stochastic gradient descent or the Adam algorithm. Thus, using a reinforcement learning algorithm with the goal of minimizing the DAG task scheduling length, and rewarding priority orderings that yield shorter scheduling lengths, the network model is continuously optimized; the resulting scheduling length is shorter and the parallel computing efficiency higher. This effectively avoids the difficulty of collecting enough supervision labels for the optimal priority assignment of DAG tasks.
Specifically, the network model is trained by taking the gradient with respect to θ of the objective function J defined by formula (12):
Figure PCTCN2022142437-appb-000030
where
Figure PCTCN2022142437-appb-000031
is the gradient operator. The gradient of the objective in formula (12) can be estimated by Monte Carlo stochastic gradient descent:
Figure PCTCN2022142437-appb-000032
Here, B denotes the set of subtasks of DAG tasks randomly sampled from the data set. The objective function is optimized with stochastic gradient descent or the Adam algorithm; when the objective value no longer decreases or the maximum number of iterations is reached, model training terminates, and the scheduling scheme obtained at that point is the best scheduling scheme. That is, the gradient of the objective function is estimated by Monte Carlo stochastic gradient descent, and this embodiment thereby realizes deep reinforcement learning for DAG tasks based on the directed graph neural network and the objective function. The DAG task scheduling length can be obtained by having the DAG task scheduling simulator dispatch all subtasks, in order of the priority ranking, for parallel execution on the parallel computing system ARC, and recording the completion time of the exit task.
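The Monte Carlo policy-gradient update can be illustrated with a self-contained toy. The policy here is a plain softmax over per-task scores rather than the full decoder, an assumption made to keep the example runnable; the update averages reward times the gradient of the log-probability over a sampled batch, in the spirit of the estimator above:

```python
import numpy as np

def sample_order(theta, rng):
    """Sample a priority ordering from a softmax policy over task scores
    theta, returning the ordering and the gradient of log p(order | theta)."""
    n = len(theta)
    remaining = list(range(n))
    grad = np.zeros(n)
    order = []
    while remaining:
        logits = theta[remaining]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = rng.choice(len(remaining), p=p)
        grad[remaining] -= p              # -E[one-hot] over the remaining set
        grad[remaining[k]] += 1.0         # +one-hot of the chosen task
        order.append(remaining.pop(k))
    return order, grad

def reinforce_step(theta, reward_fn, lr=0.5, batch=16, rng=None):
    """One Monte Carlo policy-gradient step: average reward * grad(log p)
    over a sampled batch and ascend the estimated gradient."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(batch):
        order, g = sample_order(theta, rng)
        grad += reward_fn(order) * g
    return theta + lr * grad / batch
```

With a reward that favors, say, placing task 0 first, repeated steps push theta[0] upward, mirroring how shorter-makespan orderings are reinforced in the full model.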
Step S14: use the DAG task scheduling model to determine the scheduling order of the subtasks within a DAG task to be executed, and execute the DAG task on the parallel computing system according to that scheduling order.
In this embodiment, after the DAG task scheduling model has been trained, the node feature matrix and adjacency matrix of the DAG task to be executed are input into the model, the best DAG task scheduling order found by the model is output as the result, and the parallel computing system executes the DAG task according to that scheduling order. Thus, for the non-preemptive scheduling problem of DAG tasks, this embodiment prioritizes tasks based on deep reinforcement learning and a directed graph neural network, thereby determining the scheduling order of the tasks, reducing their execution time, and improving their execution efficiency.
This embodiment also proposes a DAG task scheduling system based on deep reinforcement learning and a directed graph neural network. As shown in Figure 2, the system consists of an input module, a directed graph neural network, a sequential decoder, a scheduling length calculation module, and a model parameter update module. The input module reads the node feature matrix X and adjacency matrix A of the DAG task; the directed graph neural network takes X and A as input, identifies the execution times and dependencies of the DAG task, and learns embeddings of the subtasks. These embeddings are decoded by the sequential decoder, which outputs a priority ordering of all subtasks; according to this ordering, the scheduling length calculation module schedules them for execution on the parallel computing system, and the scheduling length serves as a feedback signal for updating the model parameters with a reinforcement learning algorithm. The system thus takes a DAG task as input, generates an embedding for each of its subtasks through the directed graph neural network, produces a priority ordering of all subtasks with the sequential decoder, and computes the task scheduling length, or completion time, corresponding to that ordering. The system aims to minimize the scheduling length of the DAG task, and the computed scheduling length is used as a reward signal to update the model through a reinforcement learning algorithm.
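The role of the scheduling length calculation module (turning a priority ordering into a makespan on m identical computing nodes) can be sketched as a list scheduler: the highest-priority ready subtask is dispatched to the node where it can start earliest. The tie-breaking rule is an assumption; the patent does not fix it:

```python
def simulate_schedule(order, loads, edges, m):
    """List-scheduling simulator: dispatch subtasks in the given priority
    order onto m identical nodes, respecting precedence edges, and return
    the makespan (completion time of the last-finishing subtask)."""
    n = len(loads)
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    finish = [0.0] * n
    free_at = [0.0] * m               # when each computing node becomes free
    done = [False] * n
    pending = list(order)
    while pending:
        # highest-priority pending subtask whose predecessors are all done
        # (in a DAG such a subtask always exists)
        for idx, t in enumerate(pending):
            if all(done[p] for p in preds[t]):
                break
        t = pending.pop(idx)
        ready = max((finish[p] for p in preds[t]), default=0.0)
        # node on which t can start earliest
        k = min(range(m), key=lambda j: max(free_at[j], ready))
        start = max(free_at[k], ready)
        finish[t] = start + loads[t]
        free_at[k] = finish[t]
        done[t] = True
    return max(finish)
```

Feeding back -makespan (or the slowdown-rate reward built from it) as the reward signal closes the training loop described above.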
As can be seen from the above, in this embodiment a network model is constructed in the order of directed graph neural network followed by sequential decoder, and its objective function is defined with the minimum task scheduling length as the goal; a DAG task data set is obtained, and a corresponding information matrix is generated for each DAG task in the data set; the network model is trained with the information matrices, and its model parameters are updated with reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model; and the DAG task scheduling model determines the scheduling order of the subtasks within a DAG task to be executed, with the parallel computing system executing the DAG task according to that order. In this application, the DAG task scheduling model is obtained from a directed graph neural network and reinforcement learning: the directed graph neural network automatically identifies rich features of the subtasks within a DAG task, and the sequential decoder uses these features to prioritize the subtasks. At the same time, optimizing the model with reinforcement learning achieves the scheduling goal of minimizing the DAG task scheduling length, which shortens the scheduling length and improves the efficiency of parallel DAG task execution, and reinforcement learning overcomes the difficulty of collecting enough supervision labels for the optimal priority assignment of DAG tasks.
Correspondingly, an embodiment of the present application further discloses a DAG task scheduling apparatus. As shown in Figure 5, the apparatus includes:
a network construction module 11, configured to construct a network model in the order of directed graph neural network followed by sequential decoder, and to define the objective function of the network model with the minimum task scheduling length as the goal;
a data set acquisition module 12, configured to obtain a DAG task data set and to generate a corresponding information matrix for each DAG task in the data set;
a training module 13, configured to train the network model with the information matrices and to update the model parameters of the network model with reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model;
a scheduling order determination module 14, configured to determine, with the DAG task scheduling model, the scheduling order of the subtasks within a DAG task to be executed, and to execute the DAG task on the parallel computing system according to that scheduling order.
As can be seen from the above, in this embodiment a network model is constructed in the order of directed graph neural network followed by sequential decoder, and its objective function is defined with the minimum task scheduling length as the goal; a DAG task data set is obtained, and a corresponding information matrix is generated for each DAG task in the data set; the network model is trained with the information matrices, and its model parameters are updated with reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model; and the DAG task scheduling model determines the scheduling order of the subtasks within a DAG task to be executed, with the parallel computing system executing the DAG task according to that order. In this application, the DAG task scheduling model is obtained from a directed graph neural network and reinforcement learning: the directed graph neural network automatically identifies rich features of the subtasks within a DAG task, and the sequential decoder uses these features to prioritize the subtasks. At the same time, optimizing the model with reinforcement learning achieves the scheduling goal of minimizing the DAG task scheduling length, which shortens the scheduling length and improves the efficiency of parallel DAG task execution, and reinforcement learning overcomes the difficulty of collecting enough supervision labels for the optimal priority assignment of DAG tasks.
In some specific embodiments, the DAG task scheduling apparatus may specifically include:
a graph convolution layer construction unit, configured to construct a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function;
a directed graph neural network construction unit, configured to construct the directed graph neural network in the order of an input layer, K graph convolution layers, and an output layer.
In some specific embodiments, the DAG task scheduling apparatus may specifically include:
a vector expression definition unit, configured to define, with the priority-assignment state of the subtasks within the DAG task as a variable, a vector expression of the context environment for the DAG task;
a sequential decoder construction unit, configured to construct a sequential decoder for prioritization based on the attention mechanism and the vector expression of the context environment, to obtain the decoder.
In some specific embodiments, the network construction module 11 may specifically include:
a scheduling-length slowdown-rate metric construction unit, configured to generate the scheduling-length slowdown-rate metric of the DAG task, taking as independent variables the task scheduling length corresponding to the DAG task's priority ordering at different time steps and the lower bound on the task scheduling length, where the lower bound is determined from the length of the DAG task's critical path;
a reward function construction unit, configured to construct the reward function based on the policy gradient algorithm and the scheduling-length slowdown-rate metric;
an objective function construction unit, configured to construct the objective function of the network model from the reward function.
In some specific embodiments, the data set acquisition module 12 may specifically include:
a task parameter configuration unit, configured to configure DAG task parameters, including the number of task layers, the number of child nodes of a target node, the probability of generating child nodes for the target node, the probability of adding connecting edges between two adjacent task layers, and the computational load of each subtask;
a task generation unit, configured to generate DAG tasks from the DAG task parameters to obtain the DAG task data set.
In some specific embodiments, the data set acquisition module 12 may specifically include:
a node feature matrix generation unit, configured to generate a node feature matrix according to the features of each subtask of a DAG task in the DAG task data set;
an adjacency matrix generation unit, configured to generate an adjacency matrix according to the connection relationships between different subtasks in the DAG task data set;
an information matrix determination unit, configured to obtain the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
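The three units above can be sketched in a few lines. The concrete feature columns (load, in-degree, out-degree) and the concatenation of the feature matrix with the adjacency matrix are illustrative assumptions; the patent only states that the information matrix is obtained from the two matrices:

```python
import numpy as np

def build_information_matrix(loads, edges):
    """Build the node feature matrix X, the adjacency matrix A from the
    dependency edges, and an information matrix as their concatenation.
    `loads` maps node id -> computation load; `edges` is a list of
    (predecessor, successor) pairs."""
    n = len(loads)
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0                       # u must finish before v
    X = np.stack([
        np.array([loads[i] for i in range(n)], dtype=float),
        A.sum(axis=0),                      # in-degree of each node
        A.sum(axis=1),                      # out-degree of each node
    ], axis=1)
    info = np.concatenate([X, A], axis=1)   # shape (n, 3 + n)
    return X, A, info
```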
In some specific embodiments, the training module 13 may specifically include:
a vector representation determination unit, configured to input the information matrix into the network model and use the directed graph neural network to output a vector representation of each subtask according to the features of the subtasks and the dependencies between them;
a priority ordering determination unit, configured to use the sequential decoder to prioritize the subtasks within the DAG task according to their vector representations, based on an attention mechanism and the context of the DAG task;
a task scheduling length determination unit, configured to calculate the task scheduling length of the DAG task with a DAG task scheduling simulator according to the priority ordering;
a model optimization unit, configured to update the model parameters of the network model through reinforcement learning according to the task scheduling length and the objective function, until the network model converges.
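The task scheduling length determination unit relies on a scheduling simulator. A minimal list-scheduling simulator, assuming the priority ordering is topologically valid and ignoring communication cost, could be sketched as follows; both simplifications are assumptions of this sketch, not of the patent:

```python
def simulate_schedule(loads, edges, priority_order, num_processors=2):
    """Walk the priority ordering, place each task on the earliest-available
    processor no earlier than all of its predecessors have finished, and
    return the makespan (task scheduling length)."""
    preds = {t: [] for t in loads}
    for u, v in edges:
        preds[v].append(u)
    finish = {}                              # task id -> finish time
    proc_free = [0.0] * num_processors       # next free time per processor
    for t in priority_order:
        ready = max((finish[p] for p in preds[t]), default=0.0)
        k = min(range(num_processors), key=lambda i: proc_free[i])
        start = max(ready, proc_free[k])
        finish[t] = start + loads[t]
        proc_free[k] = finish[t]
    return max(finish.values())
```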
Furthermore, an embodiment of the present application also discloses an electronic device, as shown in FIG. 6; the content of the figure should not be construed as any limitation on the scope of the application.
FIG. 6 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 stores a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the DAG task scheduling method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 provides the working voltage for the hardware devices on the electronic device 20; the communication interface 24 creates a data transmission channel between the electronic device 20 and external devices, following any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 acquires data from, or outputs data to, the outside world, and its specific interface type may be selected according to the needs of the application, which likewise is not specifically limited here.
In addition, the memory 22, as a carrier of resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like; the resources stored thereon include an operating system 221, a computer program 222, and data 223 including DAG tasks, and the storage may be transient or persistent.
The operating system 221 manages and controls the hardware devices and the computer program 222 on the electronic device 20, enabling the processor 21 to operate on and process the massive data 223 in the memory 22; it may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program for performing the DAG task scheduling method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Furthermore, an embodiment of the present application also discloses a non-volatile readable storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the steps of the DAG task scheduling method disclosed in any of the foregoing embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.
The DAG task scheduling method, apparatus, device, and medium provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the application, and the above descriptions of the embodiments are only intended to help in understanding the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on the application.

Claims (20)

  1. A DAG task scheduling method, characterized by comprising:
    constructing a network model in the order of a directed graph neural network followed by a sequential decoder, and defining an objective function of the network model with the minimum task scheduling length as the goal;
    obtaining a DAG task data set, and generating a corresponding information matrix for each DAG task in the DAG task data set;
    training the network model with the information matrices, and updating the model parameters of the network model through reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model;
    determining, with the DAG task scheduling model, the scheduling order of the subtasks within a DAG task to be executed, and executing the DAG task to be executed with a parallel computing system according to the scheduling order.
  2. The DAG task scheduling method according to claim 1, characterized in that, before constructing the network model in the order of a directed graph neural network followed by a sequential decoder, the method further comprises:
    constructing a graph convolution layer for DAG task feature learning based on an aggregation function and a nonlinear activation function;
    constructing the directed graph neural network in the order of an input layer, K graph convolution layers, and an output layer.
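The stacking in claim 2 can be sketched as follows: K graph-convolution layers, each aggregating messages along incoming edges (here a mean over direct predecessors) followed by a nonlinear update. The mean aggregation, the residual-style `H + M` update, and ReLU are illustrative choices, not the patented layer definition:

```python
import numpy as np

def directed_gnn(X, A, weights):
    """Minimal directed graph neural network. X is the (n, d) node feature
    matrix, A the (n, n) adjacency matrix (A[u, v] = 1 for edge u -> v),
    and `weights` a list of K square weight matrices, one per layer.
    Returns one embedding vector per subtask."""
    H = X
    deg = np.maximum(A.sum(axis=0, keepdims=True).T, 1.0)  # in-degree, >= 1
    for W in weights:
        M = A.T @ H / deg                    # aggregate from direct predecessors
        H = np.maximum(0.0, (H + M) @ W)     # nonlinear update (ReLU)
    return H
```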
  3. The DAG task scheduling method according to claim 1, characterized in that, before constructing the network model in the order of a directed graph neural network followed by a sequential decoder, the method further comprises:
    defining a vector expression of the context for the DAG task, with the priority assignment states of the subtasks within the DAG task as variables;
    constructing the sequential decoder for priority ordering based on an attention mechanism and the vector expression of the context.
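A bare-bones version of such a decoder is sketched below: at each step a context vector (here simply the mean of already-prioritised task embeddings, zeros at the start) attends over the remaining tasks and the highest-scoring one is appended to the ordering. The dot-product scoring, mean context, and greedy argmax are all illustrative assumptions standing in for the patented decoder:

```python
import numpy as np

def attention_decode(embeddings):
    """Sequentially decode a priority ordering over n tasks from their
    (n, d) embedding matrix using attention against a running context."""
    n, d = embeddings.shape
    chosen, remaining = [], set(range(n))
    context = np.zeros(d)                    # empty context at the start
    while remaining:
        idx = sorted(remaining)
        scores = np.array([embeddings[i] @ context for i in idx])
        p = np.exp(scores - scores.max())    # softmax over unscheduled tasks
        p /= p.sum()
        pick = idx[int(np.argmax(p))]        # greedy stand-in for sampling
        chosen.append(pick)
        remaining.remove(pick)
        context = embeddings[chosen].mean(axis=0)
    return chosen
```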
  4. The DAG task scheduling method according to claim 1, characterized in that defining the objective function of the network model with the minimum task scheduling length as the goal comprises:
    generating a scheduling length deceleration rate evaluation index for the DAG task, taking as independent variables the task scheduling lengths corresponding to the priority orderings of the DAG task at different time steps and the lower bound of the task scheduling length, the lower bound of the task scheduling length being determined from the path length of the critical path of the DAG task;
    constructing a reward function based on a policy gradient algorithm and the scheduling length deceleration rate evaluation index;
    constructing the objective function of the network model based on the reward function.
  5. The DAG task scheduling method according to claim 1, characterized in that obtaining the DAG task data set comprises:
    configuring DAG task parameters, the DAG task parameters including the number of task layers, the number of child nodes of a target node, the generation probability of the child nodes of the target node, the probability of adding connecting edges between two adjacent task layers, and the computation load of each subtask;
    generating DAG tasks according to the DAG task parameters to obtain the DAG task data set.
  6. The DAG task scheduling method according to claim 1, characterized in that generating a corresponding information matrix for each DAG task in the DAG task data set comprises:
    generating a node feature matrix according to the features of each subtask of the DAG task in the DAG task data set;
    generating an adjacency matrix according to the connection relationships between different subtasks in the DAG task data set;
    obtaining the information matrix corresponding to the DAG task based on the node feature matrix and the adjacency matrix.
  7. The DAG task scheduling method according to any one of claims 1 to 6, characterized in that training the network model with the information matrices and updating the model parameters of the network model through reinforcement learning according to the objective function comprises:
    inputting the information matrix into the network model, and using the directed graph neural network to output a vector representation of each subtask according to the features of the subtasks and the dependencies between them;
    prioritizing the subtasks within the DAG task with the sequential decoder, according to their vector representations, based on an attention mechanism and the context of the DAG task;
    calculating the task scheduling length of the DAG task with a DAG task scheduling simulator according to the priority ordering;
    updating the model parameters of the network model through reinforcement learning according to the task scheduling length and the objective function, until the network model converges.
  8. The DAG task scheduling method according to claim 1, characterized in that determining, with the DAG task scheduling model, the scheduling order of the subtasks within the DAG task to be executed comprises:
    inputting the node feature matrix and the adjacency matrix of the DAG task to be executed into the DAG task scheduling model to obtain the scheduling order.
  9. The DAG task scheduling method according to claim 4, characterized in that the critical path is the complete path with the longest path length.
  10. The DAG task scheduling method according to claim 7, characterized in that updating the model parameters of the network model through reinforcement learning comprises:
    continuously optimizing the network model with a reinforcement learning algorithm, with the goal of minimizing the task scheduling length, by rewarding priority orderings that yield shorter scheduling lengths.
  11. The DAG task scheduling method according to claim 1, characterized in that the directed graph neural network is used to identify the task features of the subtasks within the DAG task and output an embedded representation corresponding to each subtask.
  12. The DAG task scheduling method according to claim 11, characterized in that the task features include execution times and dependencies.
  13. The DAG task scheduling method according to claim 1, characterized in that the objective function is used to guide the learning of the network model, so that the network model outputs the minimum task scheduling length of the DAG task according to the input DAG task.
  14. The DAG task scheduling method according to claim 1, characterized in that the parallel computing system is described as a four-tuple ARC = (P, L, V, B), where: P = {p_i | i = 1, 2, ..., m} is the set of processing nodes; L = {l_ij | p_i, p_j ∈ P} is the set of communication links between processing nodes; V = {v_i | i = 1, 2, ..., m} is the set of computing speeds of the processing nodes, v_i denoting the computing speed of p_i and satisfying v_1 ≤ v_2 ≤ ... ≤ v_m; and B = {b_ij | l_ij ∈ L} is the set of communication link bandwidths, b_ij denoting the bandwidth of communication link l_ij.
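The four-tuple of claim 14 maps naturally onto a small data structure. The field names mirror the claim; the validation in `__post_init__` is an illustrative addition rather than part of the claim:

```python
from dataclasses import dataclass

@dataclass
class ARC:
    """Parallel computing system ARC = (P, L, V, B): processing nodes,
    links between them, per-node computing speeds (non-decreasing),
    and per-link bandwidths."""
    P: list   # processing node ids, e.g. [0, 1, 2]
    L: list   # links as (i, j) pairs between nodes of P
    V: list   # V[i] = computing speed of node P[i], ascending
    B: dict   # B[(i, j)] = bandwidth of link (i, j)

    def __post_init__(self):
        assert all(self.V[i] <= self.V[i + 1]
                   for i in range(len(self.V) - 1)), \
            "speeds must satisfy v1 <= v2 <= ... <= vm"
        assert set(self.B) == set(self.L), "one bandwidth per link"
```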
  15. The DAG task scheduling method according to claim 2, characterized in that the graph convolution operation of the K graph convolution layers is implemented by an aggregate function and an update function, as follows:

    m_i^(k) = aggregate({ h_j^(k-1) | t_j ∈ Pred(t_i) })

    h_i^(k) = update(h_i^(k-1), m_i^(k))

    where the aggregate function aggregates the messages sent by the direct predecessors of subtask t_i, the update function performs a nonlinear transformation on the aggregated messages, and Pred(t_i) is the set of direct predecessor subtasks of t_i.
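One concrete instantiation of this aggregate/update pair is sketched below; the sum aggregation, the additive combination, and the ReLU-over-weight-matrix update are illustrative choices, since the claim leaves both operators abstract:

```python
import numpy as np

def aggregate(h_preds):
    """Combine the messages from a subtask's direct predecessors Pred(t_i);
    here a simple element-wise sum. Returns None for entry nodes with no
    predecessors."""
    return np.sum(h_preds, axis=0) if len(h_preds) else None

def update(h_i, m_i, W):
    """Nonlinear transformation of the node state combined with the
    aggregated message: ReLU over a shared weight matrix W."""
    z = h_i if m_i is None else h_i + m_i
    return np.maximum(0.0, z @ W)
```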
  16. The DAG task scheduling method according to claim 5, characterized in that generating DAG tasks according to the DAG task parameters to obtain the DAG task data set comprises:
    obtaining DAG tasks with a parallel task generation model, based on the DAG task parameters, to obtain the DAG task data set.
  17. The DAG task scheduling method according to claim 7, characterized in that prioritizing the subtasks within the DAG task with the sequential decoder, according to their vector representations, based on an attention mechanism and the context of the DAG task, comprises:
    selecting subtask nodes with the sequential decoder based on the attention mechanism, according to the vector representations of the subtasks, to generate a priority ordering π = [π_1, π_2, ..., π_n] of size n.
  18. A DAG task scheduling apparatus, characterized by comprising:
    a network construction module, configured to construct a network model in the order of a directed graph neural network followed by a sequential decoder, and define an objective function of the network model with the minimum task scheduling length as the goal;
    a data set acquisition module, configured to obtain a DAG task data set and generate a corresponding information matrix for each DAG task in the DAG task data set;
    a training module, configured to train the network model with the information matrices and update the model parameters of the network model through reinforcement learning according to the objective function, to obtain a trained DAG task scheduling model;
    a scheduling order determination module, configured to determine, with the DAG task scheduling model, the scheduling order of the subtasks within a DAG task to be executed, and execute the DAG task to be executed with a parallel computing system according to the scheduling order.
  19. An electronic device, characterized by comprising:
    a memory, configured to store a computer program; and
    a processor, configured to execute the computer program to implement the DAG task scheduling method according to any one of claims 1 to 17.
  20. A non-volatile readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the DAG task scheduling method according to any one of claims 1 to 17.
PCT/CN2022/142437 2022-06-15 2022-12-27 Dag task scheduling method and apparatus, device, and storage medium WO2023241000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210671115.8A CN114756358B (en) 2022-06-15 2022-06-15 DAG task scheduling method, device, equipment and storage medium
CN202210671115.8 2022-06-15

Publications (1)

Publication Number Publication Date
WO2023241000A1 true WO2023241000A1 (en) 2023-12-21

Family

ID=82337171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142437 WO2023241000A1 (en) 2022-06-15 2022-12-27 Dag task scheduling method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114756358B (en)
WO (1) WO2023241000A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555306A (en) * 2024-01-11 2024-02-13 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system
CN117648174A (en) * 2024-01-29 2024-03-05 华北电力大学 Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN114756358B (en) * 2022-06-15 2022-11-04 苏州浪潮智能科技有限公司 DAG task scheduling method, device, equipment and storage medium
CN116151315B (en) * 2023-04-04 2023-08-15 之江实验室 Attention network scheduling optimization method and device for on-chip system
CN116739090B (en) * 2023-05-12 2023-11-28 北京大学 Deep neural network reasoning measurement method and device based on Web browser
CN116755397B (en) * 2023-05-26 2024-01-23 北京航空航天大学 Multi-machine collaborative task scheduling method based on graph convolution strategy gradient
CN116880994B (en) * 2023-09-07 2023-12-12 之江实验室 Multiprocessor task scheduling method, device and equipment based on dynamic DAG
CN116974729B (en) * 2023-09-22 2024-02-09 浪潮(北京)电子信息产业有限公司 Task scheduling method and device for big data job, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
US20200042856A1 (en) * 2018-07-31 2020-02-06 International Business Machines Corporation Scheduler for mapping neural networks onto an array of neural cores in an inference processing unit
CN111756653A (en) * 2020-06-04 2020-10-09 北京理工大学 Multi-coflow scheduling method based on deep reinforcement learning of graph neural network
CN114625517A (en) * 2022-04-13 2022-06-14 北京赛博云睿智能科技有限公司 DAG graph computation distributed big data workflow task scheduling platform
CN114756358A (en) * 2022-06-15 2022-07-15 苏州浪潮智能科技有限公司 DAG task scheduling method, device, equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20220100763A1 (en) * 2020-09-30 2022-03-31 Microsoft Technology Licensing, Llc Optimizing job runtimes via prediction-based token allocation
CN113127169B (en) * 2021-04-07 2023-05-02 中山大学 Efficient link scheduling method for dynamic workflow in data center network
CN114327925A (en) * 2021-09-30 2022-04-12 国网山东省电力公司营销服务中心(计量中心) Power data real-time calculation scheduling optimization method and system
CN114239711A (en) * 2021-12-06 2022-03-25 中国人民解放军国防科技大学 Node classification method based on heterogeneous information network small-sample learning
CN114422381B (en) * 2021-12-14 2023-05-26 西安电子科技大学 Communication network traffic prediction method, system, storage medium and computer equipment

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
US20200042856A1 (en) * 2018-07-31 2020-02-06 International Business Machines Corporation Scheduler for mapping neural networks onto an array of neural cores in an inference processing unit
CN111756653A (en) * 2020-06-04 2020-10-09 北京理工大学 Multi-coflow scheduling method based on deep reinforcement learning of graph neural network
CN114625517A (en) * 2022-04-13 2022-06-14 北京赛博云睿智能科技有限公司 DAG graph computation distributed big data workflow task scheduling platform
CN114756358A (en) * 2022-06-15 2022-07-15 苏州浪潮智能科技有限公司 DAG task scheduling method, device, equipment and storage medium

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN117555306A (en) * 2024-01-11 2024-02-13 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system
CN117555306B (en) * 2024-01-11 2024-04-05 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system
CN117648174A (en) * 2024-01-29 2024-03-05 华北电力大学 Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence
CN117648174B (en) * 2024-01-29 2024-04-05 华北电力大学 Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence

Also Published As

Publication number Publication date
CN114756358B (en) 2022-11-04
CN114756358A (en) 2022-07-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22946664

Country of ref document: EP

Kind code of ref document: A1