CN110825527B

CN110825527B - Deadline-budget driven scientific workflow scheduling method in cloud environment

Info

Publication number: CN110825527B
Application number: CN201911089637.1A
Authority: CN
Inventors: 夏元清; 陶思远; 叶玲娟; 戴荔; 张金会; 刘坤; 翟弟华; 邹伟东; 崔冰; 郭泽华; 闫莉萍; 孙中奇
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2022-01-04
Anticipated expiration: 2039-11-08
Also published as: CN110825527A

Abstract

The invention discloses a deadline-budget driven scientific workflow scheduling method in a cloud environment that considers the deadline and the budget simultaneously. On the basis of converting the deadline into sub-deadlines of the task nodes, the method introduces a current task budget and a residual workflow budget. By limiting the cost of computing resources, it reduces the cost a task node incurs to finish its computation within its sub-deadline, thereby reducing the scheduling cost of the scientific workflow and ensuring that the workflow can be scheduled successfully, in both completion time and cost, under the deadline and budget constraints.

Description

Deadline-budget driven scientific workflow scheduling method in cloud environment

Technical Field

The invention belongs to the technical field of workflow scheduling, and particularly relates to a deadline-budget driven scientific workflow scheduling method in a cloud environment.

Background

Cloud computing is an emerging business computing model that has evolved from distributed computing, parallel computing, and grid computing. A cloud environment pools a large number of computing resources that can be used on demand, flexibly and conveniently. As scientific research has deepened, scientific workflows that require large amounts of information processing and data computation have emerged, for example Montage, LIGO, CyberShake, and Epigenomic. Cloud resources provide a powerful guarantee for the computing requirements of such large-scale scientific workflows.

Scientific workflow scheduling is a key issue in cloud computing. The high complexity and large scale of the workflow have high requirements on storage and calculation, and the cloud workflow scheduling is proposed based on the development of the cloud computing technology. The cloud workflow scheduling utilizes the advantages of cloud computing to establish an intelligent scheduling system with strong computing capability.

The cloud computing enables the scheduling system to formulate a scheduling scheme of the complex workflow, can judge the feasibility of workflow scheduling, select different scheduling algorithms according to the characteristics, purposes and condition constraints of the workflow, schedule and match tasks and computing resources in the workflow, and utilize cloud resources to actually schedule the tasks.

The scheduling algorithm plays a vital role when the scheduling system matches tasks to computing resources in a cloud workflow. In cloud workflow scheduling with deadline and budget limits in particular, it strongly affects how efficiently work results are delivered to scientific workers and business scheduling schemes to enterprise managers. Designing the way tasks are matched to computing resources is the key problem in scheduling algorithm research. The prior art mainly comprises ProLiS, the L-ACO algorithm and the like, which distribute the deadline into sub-deadlines and match each task to the computing resource with the earliest completion time within its sub-deadline. The main problem of this approach is that the upgrading operations performed while matching tasks to computing resources cause excessive expense, which raises the scheduling cost.

Disclosure of Invention

In view of this, the invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which realizes successful scheduling of scientific workflows under the limitation conditions of the deadline and the budget and effectively reduces the scheduling cost.

The invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which comprises the following steps:

step 1, setting a deadline of a scientific workflow; setting the number of ants and the maximum iteration number N in the ant colony algorithm, wherein N is a positive integer greater than or equal to 1; heuristic information in the ant colony algorithm is task priority of task nodes in a Directed Acyclic Graph (DAG) of a scientific workflow, and the task priority is obtained by calculation of set calculation data volume of the task nodes, communication transmission data volume among the task nodes, average bandwidth of a calculation resource pool, calculation rate of calculation resources with highest calculation performance rating and set probability parameters; the pheromone trail in the ant colony algorithm is the assignment of edges connecting task nodes in the DAG graph, and the assignments are made to be equal;

step 2, each ant calculates the sub-cutoff time of the task node according to the heuristic information and the cutoff period; generating a scheduling priority sequence of task nodes in the DAG according to the heuristic information and the pheromone trail;

according to the scheduling priority sequence, computing resources are sequentially selected for the task nodes to be scheduled, and computing time and cost of the scheduling priority sequence are computed, and the method comprises the following steps:

step 2.1, calculating the residual budget of the workflow corresponding to the current task node and the budget of the current task;

step 2.2, selecting the computing resources with the cost less than or equal to the budget of the current task in the computing resource pool to form an available computing resource set of the current task node;

step 2.3, when the set is not empty, selecting the computing resource with the computing time meeting the sub-deadline requirement of the current task node and the highest computing rate in the set; if the set does not have the computing resources with the computing time meeting the sub deadline requirement of the current task node, selecting the computing resources with the highest computing rate in the set, and if the computing performance rating of the computing resources with the highest computing rate in the set is not the highest level, upgrading the computing resources;

when the set is empty and the residual budget of the workflow is greater than or equal to 0, selecting the computing resource with the highest computing rate from the computing resource pool, and upgrading the computing resource if the computing performance rating of the computing resource is not the highest level;

when the set is empty and the residual budget of the workflow is less than 0, selecting the computing resource with the minimum cost and the computing time meeting the sub-deadline requirement of the current task node from the computing resource pool; if the computing resource with the computing time meeting the sub-deadline requirement of the current task node does not exist in the computing resource pool, selecting the computing resource with the minimum cost in the computing resource pool, and if the computing performance rating of the computing resource is not the highest level, upgrading the computing resource;

step 2.4, summing the calculation time and cost of all task nodes obtained in the step 2.2 to obtain the calculation time and cost of the scheduling priority sequence;

step 3, determining a local optimal scheduling priority sequence in the scheduling priority sequences calculated by all ants according to a comparison principle; then according to the comparison principle, comparing the local optimal scheduling priority sequence with the current global optimal scheduling priority sequence, and determining the current global optimal scheduling priority sequence of the iteration;

judging whether the current iteration times are greater than or equal to N, if so, finishing the execution, and the current global optimal scheduling priority sequence is the final global optimal scheduling priority sequence;

otherwise, performing pheromone deposition and equal-proportion evaporation on the pheromone trails of the edges in the DAG graph corresponding to the local optimal scheduling priority sequence to obtain updated pheromone trails, and taking the updated pheromone trails as the next-generation pheromone trails; if the current iteration number is less than the set maximum relaxation generation, relaxing the deadline to form a relaxed deadline, and taking the relaxed deadline as the deadline of the next generation; adding 1 to the current iteration number; and returning to step 2.

Further, the comparison principle is as follows: when the calculation time of the two scheduling priority sequences is less than the deadline, selecting the scheduling priority sequence with smaller cost; when the calculation time of the two scheduling priority sequences is equal, selecting the scheduling priority sequence with smaller cost; when the calculation time of two scheduling priority sequences is not equal, and the calculation time of at least one scheduling priority sequence is greater than or equal to the deadline, selecting the scheduling priority sequence with less calculation time.

Further, the comparison principle is realized by an epsilon-comparison method.

Further, the relaxation process employs the following equation:

where D_ε(k) is the relaxed deadline for the k-th iteration; D is the set deadline; M_base is the computing time of the scientific workflow when all task nodes in the scientific workflow select computing resources with the lowest computing performance rating; k_T is the maximum relaxation generation, set according to the maximum iteration number; k is the current iteration number, with 0 ≤ k < k_T; and cp is a parameter that controls the curve of the deadline relaxation.

Further, the scheduling priority sequence is generated by adopting the Kahn algorithm.

Advantageous effects:

On top of existing scientific workflow scheduling algorithms, the invention considers the two constraints of deadline and budget together. On the basis of converting the deadline into sub-deadlines of the task nodes, the current task budget and the residual workflow budget are introduced, and by limiting the cost of computing resources, the cost required for a task node to finish its computation within its sub-deadline is reduced. The scheduling cost of the scientific workflow is thereby reduced, and the workflow can be scheduled successfully, in both completion time and cost, under the deadline and budget constraints.

Drawings

Fig. 1 is a flowchart of a deadline-budget driven scientific workflow scheduling method in a cloud environment provided by the present invention.

Fig. 2(a) is a graph comparing the scheduling success rate of different scheduling algorithms on the CyberShake workflow.

FIG. 2(b) is a graph comparing the scheduling success rate of different scheduling algorithms on the Epigenomic workflow.

Fig. 2(c) is a graph comparing the scheduling success rate on the LIGO workflow for different scheduling algorithms.

FIG. 2(d) is a comparison graph of scheduling success rates on a Montage workflow according to different scheduling algorithms.

FIG. 3(a) is a graph comparing the scheduling cost of different scheduling algorithms on the CyberShake workflow.

FIG. 3(b) is a graph comparing the scheduling cost of different scheduling algorithms on an Epigenomic workflow.

FIG. 3(c) is a graph comparing the scheduling cost on LIGO workflow for different scheduling algorithms.

FIG. 3(d) is a graph comparing the scheduling cost of different scheduling algorithms on a Montage workflow.

Fig. 4 is a cost optimization diagram of the deadline-budget driven scientific workflow scheduling method in the cloud environment based on different scheduling algorithms.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, whose main idea is as follows: an ant colony algorithm is used to topologically sort the directed acyclic graph (DAG) corresponding to the scientific workflow, yielding different scheduling priority sequences; optimal computing resources are selected for the task nodes under the dual constraints of the task node's sub-deadline and the residual workflow budget; and an optimal workflow scheduling scheme is obtained by iteration.

The invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which specifically comprises the following steps as shown in fig. 1:

Step 1, initializing the ant colony and the iteration information.

Setting the number of ants in the ant colony algorithm, such as the ant colony of colSize ants, wherein colSize is a positive integer greater than or equal to 1; the maximum iteration number N of the ant colony algorithm is a positive integer which is greater than or equal to 1. And representing the scientific workflow by using a Directed Acyclic Graph (DAG), and setting the deadline of the scientific workflow.

The pheromone trail in the ant colony algorithm is the value assigned to each edge connecting task nodes in the DAG graph. The value of every edge in the DAG graph is initialized to the same value, for example 1.

The probability parameter γ is calculated from the probability factor θ set by the user, as shown in formula (1):

where ccr_j is the computation-to-communication ratio of task node t_j, namely (ω_i/p(s*))/(data_{i,j}/bw); ω_i is the computation data amount of task node t_i; p(s*) is the computing rate of the computing resource with the highest rating; data_{i,j} is the amount of data transmitted between task nodes t_i and t_j; bw is the average bandwidth set in the computing resource pool; rand() is a function that generates a random number in the interval [0, 1); and i and j are the serial numbers of the task nodes, both positive integers less than or equal to colSize.

Formula (2) is used to calculate the heuristic information of the ant colony algorithm, namely the task priority rank_i of each task node, from the computation data amount ω_i of each task node in the DAG graph, the data amount data_{i,j} transmitted between task nodes, the average bandwidth bw set in the computing resource pool, the computing rate p(s*) of the highest-rated computing resource, and the probability parameter γ:

where task node t_j is a child node of task node t_i.
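As an illustration only, the task priority can be sketched in Python under the assumption that formula (2) has the usual upward-rank form, rank_i = ω_i/p(s*) + max over children t_j of (rank_j + γ × data_{i,j}/bw). Formulas (1) and (2) are not reproduced in this text, so that form, the function name task_priorities and the callable gamma (standing in for the probability parameter of formula (1)) are assumptions rather than the method as claimed.

```python
from functools import lru_cache


def task_priorities(work, data, children, fastest_rate, bandwidth, gamma):
    """Return rank[i] for every task, assuming an upward-rank style formula.

    work[i]      -- computation data amount omega_i of task i
    data[(i, j)] -- data amount transmitted on edge i -> j
    children[i]  -- list of child tasks of task i
    fastest_rate -- computing rate p(s*) of the highest-rated resource
    bandwidth    -- average bandwidth bw of the resource pool
    gamma(i, j)  -- hypothetical stand-in for the probability parameter
    """

    @lru_cache(maxsize=None)
    def rank(i):
        exec_time = work[i] / fastest_rate
        # Longest remaining path through any child, including transfer time.
        tail = max(
            (rank(j) + gamma(i, j) * data[(i, j)] / bandwidth
             for j in children[i]),
            default=0.0,
        )
        return exec_time + tail

    return {i: rank(i) for i in work}
```

With gamma always returning 1 the sketch counts every transfer; with gamma returning 0 it ignores communication entirely, which are the two extremes a probability parameter would interpolate between.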

Step 2, each ant in the ant colony generates a scheduling priority sequence, namely a scheduling scheme.

The assignable sub-deadline subD of each task node is calculated with formula (3) from the heuristic information obtained in step 1 and the set deadline of the scientific workflow:

where rank_entry is the task priority of the entry task node t_entry and is set to the length of the workflow critical path. The task node t_entry is a virtual head node with a computation data amount of 0 and a communication data amount of 0, generated so that the tasks in the workflow DAG graph start uniformly.
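A minimal sketch of the sub-deadline distribution follows. Formula (3) is not reproduced in this text, so the sketch assumes a proportional split along the critical path, subD_i = D × (rank_entry - rank_i + exec_time_i) / rank_entry; this form, the function name sub_deadlines and the exec_time argument are assumptions for illustration.

```python
def sub_deadlines(rank, exec_time, entry, deadline):
    """Distribute the workflow deadline over tasks (assumed proportional form).

    rank[i]      -- task priority of task i (heuristic information)
    exec_time[i] -- execution time of task i on the fastest resource
    entry        -- identifier of the virtual entry task
    deadline     -- workflow deadline D
    """
    rank_entry = rank[entry]  # length of the critical path
    return {
        i: deadline * (rank_entry - rank[i] + exec_time[i]) / rank_entry
        for i in rank
    }
```

Under this assumption a task at the end of the critical path receives the full deadline, and the virtual entry task, which does no work, receives a sub-deadline of 0.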

The workflow DAG graph is topologically sorted with the Kahn algorithm to generate different scheduling priority sequences. During the topological sorting, the probability of selecting the next task node is calculated with formula (4) from the pheromone trails and the heuristic information corresponding to the DAG graph in the current iteration:

where prob_{i,j} is the probability of selecting task node t_j in the next step after task node t_i has been selected; τ_{i,j} is the pheromone trail value on the edge connecting task nodes t_i and t_j; and α and β are positive constant parameters that control the influence of the pheromone trail and of the heuristic information, respectively.
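The sequence construction can be sketched as Kahn's topological sort with a randomized choice among the ready tasks. Formula (4) is not reproduced in this text; the sketch assumes the standard ant colony selection rule, in which the weight of candidate t_j after t_i is τ_{i,j}^α × rank_j^β, normalized over all ready tasks. The function name, the default value 1.0 for pairs without a stored trail, and the rng argument are assumptions.

```python
import random


def ant_topological_order(children, rank, tau, entry,
                          alpha=1.0, beta=1.0, rng=random):
    """Build one scheduling priority sequence with Kahn's algorithm.

    children[t] -- list of child tasks of t (every task is a key)
    rank[t]     -- heuristic information (task priority)
    tau[(i, j)] -- pheromone trail value; missing pairs default to 1.0
    """
    indegree = {t: 0 for t in children}
    for t in children:
        for c in children[t]:
            indegree[c] += 1

    ready = []

    def release(t):
        # A child becomes ready once all of its parents are in the sequence.
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    order, current = [entry], entry
    release(entry)
    while ready:
        weights = [tau.get((current, t), 1.0) ** alpha * rank[t] ** beta
                   for t in ready]
        if sum(weights) <= 0:
            weights = None  # fall back to a uniform choice
        nxt = rng.choices(ready, weights=weights, k=1)[0]
        ready.remove(nxt)
        order.append(nxt)
        release(nxt)
        current = nxt
    return order
```

Because only tasks whose parents are already placed can be chosen, every sequence an ant produces respects the precedence constraints of the DAG, whatever the pheromone values are.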

After obtaining the sub-deadline of each task, most existing algorithms select for the task node the computing resource that completes earliest (has the fastest computing rate) within the sub-deadline. If that computing resource still cannot complete the current task node within the sub-deadline and its computing performance rating is not the highest level, the computing resource is upgraded until the sub-deadline requirement is met. Such a selection mode compresses the completion time of the task node excessively and inflates the computing resource cost, which raises the cost of the overall scheduling scheme. The invention therefore selects the computing resources of a task node in combination with the current task budget and the residual workflow budget, so as to limit the excessive cost generated by the task node.

First, the workflow residual budget is calculated with a method from the prior art: the workflow residual budget SAB is calculated from the user-defined spending budget B, the actual cost of the scheduled tasks, and the minimum possible cost of the unscheduled tasks, as shown in formula (5):

where cost(t_i) is the actual cost of task node t_i, that is, the cost obtained after the task node has been matched with a computing resource; and cost_min(t_j) is the minimum cost of task node t_j, which can be obtained by pre-executing the computation data of the task node on the various types of computing resources and comparing the results.
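The residual budget can be sketched as follows, assuming formula (5) subtracts from B both the actual cost of the scheduled tasks and the minimum cost reserved for the unscheduled ones. The formula is not reproduced in this text, so the exact combination and the function name remaining_workflow_budget are assumptions.

```python
def remaining_workflow_budget(budget, actual_cost, min_cost, scheduled):
    """Workflow residual budget SAB (assumed form of formula (5)).

    budget         -- user-defined spending budget B
    actual_cost[t] -- cost(t) of each already scheduled task
    min_cost[t]    -- cost_min(t) of every task in the workflow
    scheduled      -- set of tasks already matched to a resource
    """
    spent = sum(actual_cost[t] for t in scheduled)
    # Reserve the cheapest possible cost for every task still to be scheduled.
    reserved = sum(c for t, c in min_cost.items() if t not in scheduled)
    return budget - spent - reserved
```

A negative result means that even the cheapest resources for the remaining tasks would exceed the budget, which is the situation handled by the third scheduling case.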

The current task budget CTB of the current task node is then calculated from the residual budget of the workflow, as shown in formula (6):

where cost_average(t_j) is the average cost of task node t_j, which can be pre-calculated from the cost of executing the data amount of task node t_j on all types of computing resources.

According to the current task budget, the computing resources in the computing resource pool whose cost is less than or equal to the current task budget are selected to form the available computing resource set of the current task node, as shown in formula (7):

where s_q is a computing resource of type q, and c_{k,q} is the cost of task node t_k on computing resource s_q.

According to the residual budget of the workflow and the available computing resource set of the current task node, the task is scheduled according to one of three cases:

when the set is not empty, the computing resource whose computing time meets the sub-deadline requirement of the current task node and whose computing rate is the highest is selected from the set; if no computing resource in the set has a computing time that meets the sub-deadline requirement of the current task node, the computing resource with the highest computing rate in the set is selected, and if the computing performance rating of that computing resource is not the highest level, the computing resource is upgraded;

when the set is empty and the residual budget of the workflow is greater than or equal to 0, the computing resource with the highest computing rate is selected from the computing resource pool, and if the computing performance rating of that computing resource is not the highest level, the computing resource is upgraded;

when the set is empty and the residual budget of the workflow is less than 0, the computing resource with the minimum cost whose computing time meets the sub-deadline requirement of the current task node is selected from the computing resource pool; if no computing resource in the computing resource pool has a computing time that meets the sub-deadline requirement of the current task node, the computing resource with the minimum cost in the computing resource pool is selected, and if the computing performance rating of that computing resource is not the highest level, the computing resource is upgraded.
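The three cases above can be sketched as a single selection function. The Option record, the function name select_resource and the returned needs_upgrade flag are hypothetical; the text does not specify how an upgrade is carried out, so the sketch only reports that one is called for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    resource: str    # name of the computing resource type
    rate: float      # computing rate of the resource
    time: float      # computing time of the current task on it
    cost: float      # cost of the current task on it
    top_level: bool  # True if it has the highest performance rating


def select_resource(pool, ctb, sab, sub_deadline):
    """Return (option, needs_upgrade) for the current task node."""
    def fastest(opts):
        return max(opts, key=lambda o: o.rate)

    def cheapest(opts):
        return min(opts, key=lambda o: o.cost)

    affordable = [o for o in pool if o.cost <= ctb]
    if affordable:  # case 1: the available set is not empty
        in_time = [o for o in affordable if o.time <= sub_deadline]
        if in_time:
            return fastest(in_time), False
        pick = fastest(affordable)
        return pick, not pick.top_level
    if sab >= 0:  # case 2: empty set, residual budget not exhausted
        pick = fastest(pool)
        return pick, not pick.top_level
    # case 3: empty set and negative residual budget
    in_time = [o for o in pool if o.time <= sub_deadline]
    if in_time:
        return cheapest(in_time), False
    pick = cheapest(pool)
    return pick, not pick.top_level
```

For example, with a pool of three hypothetical resource types, a current task budget that covers only the two cheaper ones, and a sub-deadline that only the middle one meets, the function returns the middle type without requesting an upgrade.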

And finally, summing the calculated time and cost of all the task nodes to obtain the calculated time and cost of the scheduling priority sequence.

Step 3, iterating and updating the ant colony to obtain the optimal scheduling scheme.

Determining a local optimal scheduling priority sequence in the scheduling priority sequences calculated by all ants according to a comparison principle; and then according to the same comparison principle, comparing the local optimal scheduling priority sequence with the current global optimal scheduling priority sequence, and determining the current global optimal scheduling priority sequence of the iteration.

In the invention, a relaxed deadline is used during a certain number of generations at the initial stage of iteration. Relaxing the constraint makes it easier to obtain a local optimal scheme early on, and the pheromone trail updates then steer the scheduling priority sequences of later iterations toward the optimal scheme.

A maximum relaxation generation k_T is set according to the set maximum iteration number, where k_T is a positive integer greater than 0 and less than the maximum iteration number. Within the maximum relaxation generation, the deadline of the workflow is relaxed, and the relaxed deadline at the k-th iteration is calculated with formula (8):

where D_ε(k) is the relaxed deadline for the k-th iteration; M_base is the completion time of the workflow when all tasks in the workflow use the computing resource with the lowest computing performance rating; k is the current iteration generation, a positive integer greater than 0 and less than the maximum iteration number; and cp is a parameter that controls the curve of the deadline relaxation.

After the maximum relaxation generation k_T is exceeded, the method screens schemes more strictly, so the deadline is held at the set deadline D.
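The relaxation schedule can be illustrated with a short sketch. Formula (8) is not reproduced in this text; the sketch assumes a power-law decay of the slack between M_base and D, D_ε(k) = D + (M_base - D) × (1 - k/k_T)^cp for k < k_T, which matches the listed symbols but may differ from the formula actually used. The function name and the clamping of negative slack to 0 are also assumptions.

```python
def relaxed_deadline(k, deadline, m_base, k_t, cp):
    """Relaxed deadline D_eps(k) under an assumed power-law decay.

    k        -- current iteration number
    deadline -- set deadline D
    m_base   -- workflow completion time on the lowest-rated resources
    k_t      -- maximum relaxation generation k_T
    cp       -- parameter controlling the shape of the relaxation curve
    """
    if k >= k_t:
        return deadline  # after k_T the set deadline applies strictly
    slack = max(m_base - deadline, 0.0)
    return deadline + slack * (1.0 - k / k_t) ** cp
```

With D = 100, M_base = 300, k_T = 10 and cp = 2, this sketch starts at 300, falls to 150 at the fifth iteration, and equals 100 from the tenth iteration onward.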

After each ant obtains a scheduling scheme of the workflow according to step 2, the scheduling schemes are compared based on the following comparison principle: when the calculation times of both scheduling priority sequences are less than the deadline, the scheduling priority sequence with the smaller cost is selected; when the calculation times of the two scheduling priority sequences are equal, the scheduling priority sequence with the smaller cost is selected; when the calculation times of the two scheduling priority sequences are not equal and the calculation time of at least one of them is greater than or equal to the deadline, the scheduling priority sequence with the shorter calculation time is selected.

The comparison principle can be realized by an epsilon-comparison method, and as shown in formula (9), scheduling schemes calculated by all ants in the current iteration algebra of the ant colony are compared to obtain a local optimal scheme. Wherein the epsilon-comparison method comprises the following steps:

(f, φ) is a scheduling scheme, where f is the total cost of the scheduling scheme and φ is the final completion time of the scheduling scheme. The comparison method compares the cost and the scheduling time of the two scheduling schemes by 3 conditions to obtain the scheduling scheme with excellent performances in both the aspects of spending and finishing time, so that the scheduling scheme meeting the deadline and the budget constraint can be finally obtained.
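The three conditions of the comparison principle can be written directly as a pairwise comparison. The sketch below follows the wording of the principle as stated in the text; the function name better and the tuple layout (cost f, completion time φ) are assumptions, and ties in cost are resolved in favor of the first argument.

```python
def better(a, b, deadline):
    """Return the preferred of two schemes, each given as (cost, time)."""
    (cost_a, time_a), (cost_b, time_b) = a, b
    both_in_time = time_a < deadline and time_b < deadline
    if both_in_time or time_a == time_b:
        # Conditions 1 and 2: compare by cost.
        return a if cost_a <= cost_b else b
    # Condition 3: times differ and at least one misses the deadline.
    return a if time_a < time_b else b
```

Applied to all schemes of one generation, for instance with functools.reduce, the same function yields the local optimal scheme, and one further call against the stored global optimum updates the global optimal scheme. During the relaxation generations the relaxed deadline would be passed as the deadline argument.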

And after the local optimal scheme is obtained, comparing the local optimal scheme with the global optimal scheme by adopting the same method, and taking the better scheme as a new global optimal scheme.

And then judging whether the current iteration times are more than or equal to the maximum iteration times, if so, finishing the execution, wherein the current global optimal scheduling priority sequence is the final global optimal scheduling priority sequence, and outputting the global optimal scheduling priority sequence.

If the current iteration number is less than the maximum iteration number, then:

For the local optimal scheduling scheme obtained, the ant colony algorithm's pheromone trail update is applied to the pheromone trails on the edges of the DAG graph that belong to its scheduling priority sequence, that is, pheromone deposition and equal-proportion pheromone evaporation are performed on the selected pheromone trails. The pheromone trail update is calculated as shown in formula (10):

τ_{i,j}(k+1) = τ_{i,j}(k) × (1 - ρ) + Δτ_{i,j}(k)    (10)

where τ_{i,j}(k) is the pheromone trail value on the edge connecting tasks t_i and t_j in the workflow DAG graph at the k-th iteration; ρ is the evaporation coefficient of the pheromone; and Δτ_{i,j}(k) is the amount of pheromone deposited at the k-th iteration on the edge connecting tasks t_i and t_j, defined as shown in formula (11):

where s_best(k) is the local optimal solution in the k-th iteration and f_best(k) is the cost of that solution.
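A sketch of the update in formula (10) follows. Formula (11) is not reproduced in this text, so the deposit is assumed to be 1/f_best(k) on edges used by s_best(k) and 0 elsewhere, which is a common choice in ant colony optimization. Treating consecutive tasks of the sequence as its edges, evaporating every stored trail, and the function name update_pheromone are likewise assumptions.

```python
def update_pheromone(tau, best_sequence, best_cost, rho):
    """Apply tau(k+1) = tau(k) * (1 - rho) + delta_tau(k) to every trail.

    tau[(i, j)]   -- current pheromone trail values
    best_sequence -- local optimal scheduling priority sequence s_best(k)
    best_cost     -- cost f_best(k) of that sequence
    rho           -- pheromone evaporation coefficient
    """
    # Assumed: consecutive tasks in the best sequence mark the rewarded edges.
    on_best = set(zip(best_sequence, best_sequence[1:]))
    deposit = 1.0 / best_cost  # assumed form of formula (11)
    return {
        edge: value * (1.0 - rho) + (deposit if edge in on_best else 0.0)
        for edge, value in tau.items()
    }
```

Cheaper local optima deposit more pheromone under this assumption, so later generations are biased toward the orderings that produced low-cost schemes.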

Then, the updated pheromone trace is used as a next generation pheromone trace, the iteration number is added by 1, and the step 2 is executed.

The deadline-budget driven scientific workflow scheduling method in a cloud environment provided by the invention was used to schedule four typical scientific workflows, CyberShake, Epigenomic, LIGO and Montage, under deadline and budget constraints of any strictness, and was compared with the ICPCP, PSO, ProLiS and L-ACO algorithms of the prior art. As shown in fig. 2(a), fig. 2(b), fig. 2(c), fig. 2(d), fig. 3(a), fig. 3(b), fig. 3(c), fig. 3(d) and fig. 4, the method of the invention achieves 100% successful scheduling and reduces the scheduling cost to a certain extent, with a particularly pronounced reduction for the CyberShake workflow used in seismology. This shows that the method is effective in scheduling scientific workflows.

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A deadline-budget driven scientific workflow scheduling method in a cloud environment is characterized by comprising the following steps:

step 2.4, summing the calculation time and cost of all task nodes obtained in the step 2.2 to obtain the calculation time and cost of the scheduling priority sequence;

2. The method of claim 1, wherein the comparison criteria is: when the calculation time of the two scheduling priority sequences is less than the deadline, selecting the scheduling priority sequence with smaller cost; when the calculation time of the two scheduling priority sequences is equal, selecting the scheduling priority sequence with smaller cost; when the calculation time of two scheduling priority sequences is not equal, and the calculation time of at least one scheduling priority sequence is greater than or equal to the deadline, selecting the scheduling priority sequence with less calculation time.

3. A method according to claim 2, characterized in that said comparison principle is implemented using an epsilon-comparison method.

4. The method of claim 1, wherein the relaxation process uses the following equation:

wherein D_ε(k) is the relaxed deadline for the k-th iteration; D is the set deadline; M_base is the computing time of the scientific workflow when all task nodes in the scientific workflow select computing resources with the lowest computing performance rating; k_T is the maximum relaxation generation, set according to the maximum iteration number; k is the current iteration number, with 0 ≤ k < k_T; and cp is a parameter that controls the curve of the deadline relaxation.

5. The method of claim 1, wherein the scheduling priority sequence is generated using a Kahn algorithm.