CN110825527B - Deadline-budget driven scientific workflow scheduling method in cloud environment - Google Patents


Info

Publication number
CN110825527B
Authority
CN
China
Prior art keywords
computing
deadline
scheduling priority
priority sequence
calculation
Legal status
Active
Application number
CN201911089637.1A
Other languages
Chinese (zh)
Other versions
CN110825527A (en)
Inventor
夏元清
陶思远
叶玲娟
戴荔
张金会
刘坤
翟弟华
邹伟东
崔冰
郭泽华
闫莉萍
孙中奇
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Application filed by Beijing Institute of Technology (BIT)
Priority to CN201911089637.1A
Publication of CN110825527A
Application granted
Publication of CN110825527B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/484 Precedence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deadline-budget driven scientific workflow scheduling method in a cloud environment. The method considers the deadline and the budget simultaneously: on the basis of converting the workflow deadline into sub-deadlines for the individual task nodes, it introduces a current task budget and a remaining workflow budget. By limiting the cost of the computing resources that may be selected, it reduces the cost a task node incurs to finish its computation within its sub-deadline, thereby lowering the scheduling cost of the scientific workflow while ensuring that the workflow can be scheduled successfully under both the deadline and the budget constraints.

Description

Deadline-budget driven scientific workflow scheduling method in cloud environment
Technical Field
The invention belongs to the technical field of workflow scheduling, and particularly relates to a deadline-budget driven scientific workflow scheduling method in a cloud environment.
Background
Cloud computing is an emerging commercial computing model that evolved from distributed computing, parallel computing, and grid computing. A cloud environment can pool a large number of computing resources, which can be used on demand, flexibly and conveniently. As scientific research deepens, scientific workflows that require extensive information processing and data computation have emerged, for example Montage, LIGO, CyberShake, and Epigenomic. Cloud resources provide strong support for the computing requirements of such large-scale scientific workflows.
Scientific workflow scheduling is a key issue in cloud computing. The high complexity and large scale of workflows place heavy demands on storage and computation, and cloud workflow scheduling has been proposed on the basis of advances in cloud computing technology. It exploits the advantages of cloud computing to build an intelligent scheduling system with strong computing capability.
Cloud computing enables the scheduling system to formulate a scheduling scheme for a complex workflow: the system can judge whether the workflow can be scheduled, select a scheduling algorithm according to the workflow's characteristics, objectives, and constraints, match the workflow's tasks to computing resources, and use cloud resources to actually execute the tasks.
The scheduling algorithm plays a vital role when the scheduling system matches tasks to computing resources in a cloud workflow. In cloud workflow scheduling under deadline and budget limits in particular, it strongly affects how efficiently results are delivered to scientists and how well scheduling schemes serve enterprise managers. How tasks are matched to computing resources is therefore the key problem in scheduling-algorithm research. The prior art, mainly ProLiS and the L-ACO algorithm, adopts a matching mode based on deadline distribution, in which each task is assigned the computing resource that gives the earliest completion time within its sub-deadline. The main problem with this mode is that the upgrade operations performed while matching tasks to computing resources incur excessive expense, which raises the scheduling cost.
Disclosure of Invention
In view of this, the invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which realizes successful scheduling of scientific workflows under the limitation conditions of the deadline and the budget and effectively reduces the scheduling cost.
The invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which comprises the following steps:
step 1, setting a deadline of the scientific workflow; setting the number of ants and the maximum iteration number N of the ant colony algorithm, wherein N is a positive integer greater than or equal to 1; the heuristic information in the ant colony algorithm is the task priority of the task nodes in the directed acyclic graph (DAG) of the scientific workflow, and the task priority is calculated from the set computation data volume of the task nodes, the communication data volume between task nodes, the average bandwidth of the computing resource pool, the computing rate of the computing resource with the highest computing performance rating, and a set probability parameter; the pheromone trail in the ant colony algorithm is the value assigned to each edge connecting task nodes in the DAG, and all edges are assigned equal values;
step 2, each ant calculating the sub-deadline of each task node according to the heuristic information and the deadline, and generating a scheduling priority sequence of the task nodes in the DAG according to the heuristic information and the pheromone trail;
according to the scheduling priority sequence, sequentially selecting computing resources for the task nodes to be scheduled and calculating the computing time and cost of the scheduling priority sequence, which comprises the following steps:
step 2.1, calculating the residual budget of the workflow corresponding to the current task node and the budget of the current task;
step 2.2, selecting the computing resources with the cost less than or equal to the budget of the current task in the computing resource pool to form an available computing resource set of the current task node;
step 2.3, when the set is not empty, selecting the computing resource with the computing time meeting the sub-deadline requirement of the current task node and the highest computing rate in the set; if the set does not have the computing resources with the computing time meeting the sub deadline requirement of the current task node, selecting the computing resources with the highest computing rate in the set, and if the computing performance rating of the computing resources with the highest computing rate in the set is not the highest level, upgrading the computing resources;
when the set is empty and the residual budget of the workflow is greater than or equal to 0, selecting the computing resource with the highest computing rate from the computing resource pool, and upgrading the computing resource if the computing performance rating of the computing resource is not the highest level;
when the set is empty and the residual budget of the workflow is less than 0, selecting the computing resource with the minimum cost and the computing time meeting the sub-deadline requirement of the current task node from the computing resource pool; if the computing resource with the computing time meeting the sub-deadline requirement of the current task node does not exist in the computing resource pool, selecting the computing resource with the minimum cost in the computing resource pool, and if the computing performance rating of the computing resource is not the highest level, upgrading the computing resource;
step 2.4, summing the computing time and cost of all task nodes obtained in step 2.3 to obtain the computing time and cost of the scheduling priority sequence;
step 3, determining a local optimal scheduling priority sequence in the scheduling priority sequences calculated by all ants according to a comparison principle; then according to the comparison principle, comparing the local optimal scheduling priority sequence with the current global optimal scheduling priority sequence, and determining the current global optimal scheduling priority sequence of the iteration;
judging whether the current iteration times are greater than or equal to N, if so, finishing the execution, and the current global optimal scheduling priority sequence is the final global optimal scheduling priority sequence;
otherwise, performing pheromone deposition and proportional evaporation on the pheromone trail of each edge in the DAG corresponding to the local optimal scheduling priority sequence to obtain updated pheromone trails, and using the updated pheromone trails as the next-generation pheromone trails; if the current iteration number is less than the set maximum relaxation generation, relaxing the deadline to form a relaxed deadline and using it as the deadline of the next generation; adding 1 to the current iteration number; and performing step 2.
Further, the comparison principle is as follows: when the computing times of both scheduling priority sequences are less than the deadline, the sequence with the smaller cost is selected; when the computing times of the two sequences are equal, the sequence with the smaller cost is selected; when the computing times of the two sequences are unequal and at least one of them is greater than or equal to the deadline, the sequence with the shorter computing time is selected.
Further, the comparison principle is realized by an epsilon-comparison method.
Further, the relaxation process employs the following equation:
[Relaxation equation: image BDA0002266473740000041 not reproduced]
where $D_\varepsilon(k)$ is the relaxed deadline in the $k$-th iteration; $D$ is the set deadline; $M_{base}$ is the computing time of the scientific workflow when every task node selects a computing resource with the lowest computing performance rating; $k_T$ is the maximum relaxation generation, set according to the maximum number of iterations; $k$ is the current iteration number, with $0 \le k < k_T$; and $cp$ is a parameter that controls the shape of the deadline relaxation curve.
Further, the scheduling priority sequence is generated by the Kahn algorithm.
Advantageous effects:
Building on existing scientific workflow scheduling algorithms, the invention considers both the deadline and the budget constraints. On the basis of converting the deadline into sub-deadlines for the task nodes, it introduces the current task budget and the remaining workflow budget; by limiting the cost of the computing resources, it reduces the cost a task node incurs to finish within its sub-deadline, thereby lowering the scheduling cost of the scientific workflow while ensuring that the workflow can be scheduled successfully under the deadline and budget constraints.
Drawings
Fig. 1 is a flowchart of a deadline-budget driven scientific workflow scheduling method in a cloud environment provided by the present invention.
Fig. 2(a) is a graph comparing the scheduling success rates of different scheduling algorithms on the CyberShake workflow.
FIG. 2(b) is a graph comparing the scheduling success rate of different scheduling algorithms on the Epigenomic workflow.
Fig. 2(c) is a graph comparing the scheduling success rate on the LIGO workflow for different scheduling algorithms.
FIG. 2(d) is a comparison graph of scheduling success rates on a Montage workflow according to different scheduling algorithms.
FIG. 3(a) is a graph comparing the scheduling costs of different scheduling algorithms on the CyberShake workflow.
FIG. 3(b) is a graph comparing the scheduling cost of different scheduling algorithms on an Epigenomic workflow.
FIG. 3(c) is a graph comparing the scheduling cost on LIGO workflow for different scheduling algorithms.
FIG. 3(d) is a graph comparing the scheduling cost of different scheduling algorithms on a Montage workflow.
Fig. 4 is a cost optimization diagram of the deadline-budget driven scientific workflow scheduling method in the cloud environment based on different scheduling algorithms.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The deadline-budget driven scientific workflow scheduling method in a cloud environment provided by the invention mainly comprises: using an ant colony algorithm to topologically sort the DAG (directed acyclic graph) corresponding to the scientific workflow to obtain different scheduling priority sequences, selecting the optimal computing resource for each task node under the dual constraints of the task node's sub-deadline and the remaining workflow budget, and iterating to obtain the optimal workflow scheduling scheme.
The invention provides a deadline-budget driven scientific workflow scheduling method in a cloud environment, which specifically comprises the following steps as shown in fig. 1:
Step 1, initializing the ant colony and the iteration information.
Set the number of ants in the ant colony algorithm, e.g., a colony of colSize ants, where colSize is a positive integer greater than or equal to 1, and set the maximum number of iterations N of the ant colony algorithm, a positive integer greater than or equal to 1. Represent the scientific workflow as a directed acyclic graph (DAG), and set the deadline of the scientific workflow.
The pheromone trail in the ant colony algorithm is the value assigned to each edge connecting task nodes in the DAG. Every edge is initialized to the same value, for example 1.
The probability parameter $\gamma$ is calculated from a user-set probability factor $\theta$, as shown in formula (1):
[Formula (1): image BDA0002266473740000061 not reproduced]
where $ccr_j$ is the ratio of computation time to communication time for task node $t_j$, namely $(\omega_i/p(s^*))/(data_{i,j}/bw)$; $\omega_i$ is the computation data volume of task node $t_i$; $p(s^*)$ is the computing rate of the computing resource with the highest rating; $data_{i,j}$ is the communication data volume between task nodes $t_i$ and $t_j$; $bw$ is the average bandwidth set for the computing resource pool; $rand()$ is a function that generates a random number in $[0, 1)$; and $i$ and $j$ are task node indices, positive integers less than or equal to colSize.
From the computation data volume $\omega_i$ of each task node in the DAG, the communication data volume $data_{i,j}$ between task nodes, the average bandwidth $bw$ of the computing resource pool, the computing rate $p(s^*)$ of the highest-rated computing resource, and the probability parameter $\gamma$, the heuristic information of the ant colony algorithm, i.e., the task priority $rank_i$ of each task node, is calculated by formula (2):
[Formula (2): image BDA0002266473740000062 not reproduced]
where task node $t_j$ is a child node of task node $t_i$.
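Formula (2) survives only as an image, so its exact expression is not recoverable here. The following Python sketch assumes a HEFT-style upward rank in which $\gamma$ scales the communication term, $rank_i = \omega_i/p(s^*) + \max_{t_j}(\gamma \cdot data_{i,j}/bw + rank_j)$ over the children $t_j$ of $t_i$. The function name `task_priorities` is hypothetical, and because formula (1) is also missing, $\gamma$ is passed in as a callable instead of being derived from $\theta$.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Task = Hashable


def task_priorities(
    work: Dict[Task, float],               # omega_i: computation data volume of each task
    data: Dict[Tuple[Task, Task], float],  # data_{i,j}: communication volume per DAG edge
    p_best: float,                         # p(s*): rate of the highest-rated resource
    bw: float,                             # average bandwidth of the resource pool
    gamma: Callable[[Task, Task], float],  # gamma per edge; formula (1) is not reproduced
) -> Dict[Task, float]:
    """ASSUMED form: rank_i = omega_i/p(s*) + max_j (gamma * data_{i,j}/bw + rank_j)."""
    children: Dict[Task, List[Task]] = {t: [] for t in work}
    for (i, j) in data:
        children[i].append(j)

    rank: Dict[Task, float] = {}

    def visit(i: Task) -> float:
        # Memoized recursion over the DAG; exit tasks (no children) get omega_i/p(s*).
        if i not in rank:
            tail = max(
                (gamma(i, j) * data[(i, j)] / bw + visit(j) for j in children[i]),
                default=0.0,
            )
            rank[i] = work[i] / p_best + tail
        return rank[i]

    for t in work:
        visit(t)
    return rank
```

With `gamma` fixed at 1 the sketch reduces to the classic HEFT upward rank; fixing it at 0 ignores communication entirely.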
Step 2, each ant in the ant colony generating a scheduling priority sequence, i.e., a scheduling scheme.
From the heuristic information calculated in step 1 and the set deadline of the scientific workflow, the assignable sub-deadline $subD$ of each task node is calculated by formula (3):
[Formula (3): image BDA0002266473740000063 not reproduced]
where $rank_{entry}$ is the task priority of the entry task node $t_{entry}$ and is set to the length of the workflow's critical path. The entry task node $t_{entry}$ is a virtual head node with zero computation data volume and zero communication data volume, added so that all tasks in the workflow DAG start from a single node.
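Formula (3) is likewise only an image. The sketch below assumes a proportional distribution of the deadline: each task's sub-deadline is the fraction of the critical path covered up to the end of that task, $subD_i = D \cdot (rank_{entry} - rank_i + \omega_i/p(s^*)) / rank_{entry}$, which gives an exit task the full deadline $D$. Both this form and the function name are assumptions.

```python
from typing import Dict, Hashable


def sub_deadlines(
    rank: Dict[Hashable, float],  # rank_i from formula (2), including the virtual entry node
    work: Dict[Hashable, float],  # omega_i (0 for the virtual entry node)
    p_best: float,                # p(s*)
    deadline: float,              # D, or the relaxed deadline of the current iteration
    entry: Hashable,              # key of the virtual entry node t_entry
) -> Dict[Hashable, float]:
    # ASSUMED form of formula (3): share of the critical path completed at the
    # end of t_i, scaled to the deadline.
    rank_entry = rank[entry]  # length of the critical path
    return {
        t: deadline * (rank_entry - rank[t] + work[t] / p_best) / rank_entry
        for t in rank
    }
```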
The Kahn algorithm is used to topologically sort the workflow DAG and generate different scheduling priority sequences: during the topological sort, the probability of selecting the next task node is calculated by formula (4) from the pheromone trails of the DAG in the current iteration and the heuristic information:
[Formula (4): image BDA0002266473740000071 not reproduced]
where $prob_{i,j}$ is the probability of selecting task node $t_j$ as the next node after task node $t_i$; $\tau_{i,j}$ is the pheromone trail value on the edge connecting task nodes $t_i$ and $t_j$; and $\alpha$ and $\beta$ are positive constants that weight the pheromone trail and the heuristic information, respectively.
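A sketch of ACO-guided Kahn topological sorting. Among the tasks whose predecessors have all been placed, the next task $t_j$ is drawn with probability proportional to $\tau_{i,j}^{\alpha} \cdot \eta_j^{\beta}$, the standard ant-colony selection rule. Since formula (4) is not reproduced, taking $\eta_j = rank_j$ as the heuristic and defaulting $\tau$ to 1 for pairs without a DAG edge (and for the first pick) are assumptions; the function name and the `alpha`/`beta` defaults are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def aco_topological_order(
    nodes: Sequence[Hashable],
    edges: Sequence[Tuple[Hashable, Hashable]],
    tau: Dict[Tuple[Hashable, Hashable], float],  # pheromone trail per DAG edge
    eta: Dict[Hashable, float],                   # heuristic, assumed to be rank_j
    alpha: float = 1.0,
    beta: float = 2.0,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    rng = rng or random.Random()
    indegree = {n: 0 for n in nodes}
    children: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for i, j in edges:
        children[i].append(j)
        indegree[j] += 1

    ready = [n for n in nodes if indegree[n] == 0]  # Kahn's initial frontier
    order: List[Hashable] = []
    prev = None
    while ready:
        # Unnormalized formula (4); random.choices normalizes the weights itself.
        weights = [tau.get((prev, j), 1.0) ** alpha * eta[j] ** beta for j in ready]
        nxt = rng.choices(ready, weights=weights, k=1)[0]
        ready.remove(nxt)
        order.append(nxt)
        prev = nxt
        # Release children whose predecessors are now all scheduled.
        for j in children[nxt]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    return order
```

Every order produced this way is a valid topological order; the pheromone and heuristic only bias which valid order an ant builds.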
Most existing algorithms, after obtaining the sub-deadline of each task, select for the task node the computing resource that finishes earliest (the one with the fastest computing rate) within the sub-deadline; if that resource still cannot complete the current task node within its sub-deadline and its computing performance rating is not the highest level, the resource is upgraded until the sub-deadline requirement is met. This selection mode, however, compresses the completion time of task nodes excessively and drives up the cost of computing resources, which increases the cost of the overall scheduling scheme. The invention therefore selects computing resources for each task node by combining the current task budget with the remaining workflow budget, so as to limit the excess cost generated by individual task nodes.
First, the remaining workflow budget SAB is calculated with a prior-art method from the user-defined spending budget $B$, the actual cost of the scheduled tasks, and the minimum possible cost of the unscheduled tasks, as shown in formula (5):
[Formula (5): image BDA0002266473740000072 not reproduced]
where $cost(t_i)$ is the actual cost of task node $t_i$, i.e., the cost obtained after the task node is matched to a computing resource; and $cost_{min}(t_j)$ is the minimum cost of task node $t_j$, which can be obtained by pre-executing the task node's computation data on each type of computing resource and comparing the results.
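The remaining-budget rule described above (SAB equals $B$ minus the actual cost of the scheduled tasks minus the minimum cost of the unscheduled tasks) can be sketched as follows. The dictionary-based interface and the function name are hypothetical, and the task about to be scheduled is treated as unscheduled.

```python
from typing import Dict, Hashable


def remaining_workflow_budget(
    budget: float,                       # B: user-defined spending budget
    actual_cost: Dict[Hashable, float],  # cost(t_i) of tasks already matched to resources
    min_cost: Dict[Hashable, float],     # cost_min(t_j) for every task in the workflow
) -> float:
    spent = sum(actual_cost.values())
    # Unscheduled tasks (including the current one) are charged at their minimum cost.
    reserved = sum(c for t, c in min_cost.items() if t not in actual_cost)
    return budget - spent - reserved
```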
The current task budget CTB of the current task node is then calculated from the remaining workflow budget, as shown in formula (6):
[Formula (6): image BDA0002266473740000081 not reproduced]
where $cost_{average}(t_j)$ is the average cost of task node $t_j$, which can be pre-calculated from the cost of executing the data volume of $t_j$ on every type of computing resource.
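Formula (6) is not reproduced, so the following is only one plausible reading: the current task receives its own minimum cost plus a share of the remaining budget SAB proportional to its average cost among the unscheduled tasks. Both this form and the function name are assumptions.

```python
from typing import Dict, Hashable, Iterable


def current_task_budget(
    task: Hashable,
    sab: float,                       # remaining workflow budget from formula (5)
    min_cost: Dict[Hashable, float],  # cost_min(t_j)
    avg_cost: Dict[Hashable, float],  # cost_average(t_j)
    unscheduled: Iterable[Hashable],  # unscheduled tasks, including `task`
) -> float:
    # ASSUMED form: CTB = cost_min(t) + SAB * cost_avg(t) / sum of cost_avg over unscheduled tasks.
    total_avg = sum(avg_cost[t] for t in unscheduled)
    return min_cost[task] + sab * avg_cost[task] / total_avg
```

Under this reading a negative SAB pushes CTB below the task's minimum cost.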
According to the current task budget, the computing resources in the computing resource pool whose cost is less than the current task budget are selected to form the available computing resource set of the current task node, as shown in formula (7):
[Formula (7): image BDA0002266473740000083 not reproduced]
where $s_q$ is a computing resource of type $q$, and $c_{k,q}$ is the cost of task node $t_k$ on computing resource $s_q$.
According to the remaining workflow budget and the available computing resource set of the current task, the task is scheduled under three cases:
when the available computing resource set is not empty, the computing resource in the set whose computing time meets the sub-deadline requirement of the current task node and whose computing rate is the highest is selected; if no computing resource in the set meets the sub-deadline requirement of the current task node, the computing resource with the fastest computing rate in the set is selected, and if its computing performance rating is not the highest level, it is upgraded;
when the available computing resource set is empty and the remaining workflow budget is greater than or equal to 0, the computing resource with the highest computing rate is selected from the computing resource pool, and if its computing performance rating is not the highest level, it is upgraded;
when the available computing resource set is empty and the remaining workflow budget is less than 0, the computing resource with the minimum cost whose computing time meets the sub-deadline requirement of the current task node is selected from the computing resource pool; if no computing resource in the pool meets the sub-deadline requirement of the current task node, the computing resource with the minimum cost in the pool is selected, and if its computing performance rating is not the highest level, it is upgraded.
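The available-set filter of formula (7) and the three selection cases can be sketched together. Everything below beyond the case logic is a hypothetical modeling choice: each computing resource is a VM instance whose type is an index into a table of (computing rate, price per unit time) ordered by performance rating; running time is work divided by rate and cost is running time times price; "upgrading" raises the type one level at a time until the sub-deadline is met or the top level is reached; data-transfer time is ignored; and the affordability test uses "less than or equal to", as in step 2.2.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical type table: (computing rate, price per unit time), ordered from the
# lowest to the highest computing performance rating.
TYPES: Sequence[Tuple[float, float]] = ((1.0, 1.0), (2.0, 3.0), (4.0, 10.0))


@dataclass
class VM:
    level: int  # index into the type table; a higher index is a higher rating


def run_time(work: float, vm: VM, types=TYPES) -> float:
    return work / types[vm.level][0]


def run_cost(work: float, vm: VM, types=TYPES) -> float:
    return run_time(work, vm, types) * types[vm.level][1]


def upgrade(work: float, vm: VM, sub_deadline: float, types=TYPES) -> VM:
    # Assumed upgrade rule: step up one rating level until the task meets its
    # sub-deadline or the highest level is reached (no-op at the top level).
    while vm.level < len(types) - 1 and run_time(work, vm, types) > sub_deadline:
        vm.level += 1
    return vm


def select_resource(work: float, sub_deadline: float, ctb: float, sab: float,
                    pool: List[VM], types=TYPES) -> VM:
    def rate(vm: VM) -> float:
        return types[vm.level][0]

    def cost(vm: VM) -> float:
        return run_cost(work, vm, types)

    def fits(vm: VM) -> bool:
        return run_time(work, vm, types) <= sub_deadline

    available = [vm for vm in pool if cost(vm) <= ctb]  # formula (7)
    if available:  # case 1: fastest affordable VM that meets the sub-deadline
        meeting = [vm for vm in available if fits(vm)]
        if meeting:
            return max(meeting, key=rate)
        return upgrade(work, max(available, key=rate), sub_deadline, types)
    if sab >= 0:  # case 2: nothing affordable, but the workflow still has spare budget
        return upgrade(work, max(pool, key=rate), sub_deadline, types)
    # case 3: budget exhausted, so take the cheapest VM that meets the sub-deadline
    meeting = [vm for vm in pool if fits(vm)]
    if meeting:
        return min(meeting, key=cost)
    return upgrade(work, min(pool, key=cost), sub_deadline, types)
```

Note that `upgrade` mutates the chosen instance in place, mirroring the idea that an existing resource is upgraded rather than replaced.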
And finally, summing the calculated time and cost of all the task nodes to obtain the calculated time and cost of the scheduling priority sequence.
Step 3, iterating and updating the ant colony to obtain the optimal scheduling scheme.
Determining a local optimal scheduling priority sequence in the scheduling priority sequences calculated by all ants according to a comparison principle; and then according to the same comparison principle, comparing the local optimal scheduling priority sequence with the current global optimal scheduling priority sequence, and determining the current global optimal scheduling priority sequence of the iteration.
In the invention, to obtain locally optimal schemes in the early stage of iteration and to help the scheduling priority sequences of later iterations converge toward the optimal scheme through pheromone trail updates, a relaxed deadline is used for a certain number of generations at the start of the iteration; loosening the constraint in this way makes screening easier.
A maximum relaxation generation $k_T$ is set according to the set maximum number of iterations, where $k_T$ is a positive integer greater than 0 and less than the maximum number of iterations. Within the maximum relaxation generation, the deadline of the workflow is relaxed, and the relaxed deadline in the $k$-th iteration is calculated by formula (8):
[Formula (8): image BDA0002266473740000091 not reproduced]
where $D_\varepsilon(k)$ is the relaxed deadline in the $k$-th iteration; $M_{base}$ is the completion time of the workflow when every task uses the computing resource with the lowest computing performance rating; $k$ is the current iteration number, a positive integer greater than 0 and less than the maximum number of iterations; and $cp$ is a parameter that controls the shape of the deadline relaxation curve.
Once the maximum relaxation generation $k_T$ is exceeded, the method screens schemes more strictly, and the deadline is held at the set deadline $D$.
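Formula (8) is only an image. The sketch below assumes the ε-level schedule commonly paired with ε-constrained comparison, applied to the deadline: $D_\varepsilon(k) = D + (M_{base} - D)(1 - k/k_T)^{cp}$ for $k < k_T$ and $D$ afterwards, so the deadline starts at the slowest-case completion time $M_{base}$ and tightens to $D$. This exact form and the function name are assumptions.

```python
def relaxed_deadline(k: int, deadline: float, m_base: float, k_t: int, cp: float) -> float:
    """ASSUMED form of formula (8): decays from M_base at k = 0 to D at k = k_T."""
    if k >= k_t:
        return deadline  # beyond the maximum relaxation generation, use D strictly
    return deadline + (m_base - deadline) * (1.0 - k / k_t) ** cp
```

A larger `cp` makes the deadline tighten faster in early iterations.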
After each ant obtains a workflow scheduling scheme according to step 2, the schemes are compared according to the following comparison principle: when the computing times of both scheduling priority sequences are less than the deadline, the sequence with the smaller cost is selected; when the computing times of the two sequences are equal, the sequence with the smaller cost is selected; when the computing times of the two sequences are unequal and at least one of them is greater than or equal to the deadline, the sequence with the shorter computing time is selected.
The comparison principle can be implemented by an epsilon-comparison method: the scheduling schemes calculated by all ants in the current iteration of the ant colony are compared as shown in formula (9) to obtain the locally optimal scheme. The epsilon-comparison method is:
[Formula (9): image BDA0002266473740000101 not reproduced]
where $(f, \varphi)$ denotes a scheduling scheme, $f$ is its total cost, and $\varphi$ is its final completion time. The method compares the cost and the scheduling time of two schemes under the three conditions above to obtain a scheme that performs well in both cost and completion time, so that a scheduling scheme satisfying the deadline and budget constraints is finally obtained.
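The three-case comparison principle can be sketched directly: a scheme is a (cost $f$, completion time $\varphi$) pair and `eps` is the current, possibly relaxed, deadline. The function names are hypothetical, and the strict "less than" in the first case follows the wording of the comparison principle.

```python
from typing import Sequence, Tuple

Scheme = Tuple[float, float]  # (f: total cost, phi: final completion time)


def better(a: Scheme, b: Scheme, eps: float) -> bool:
    """True if scheme a is preferred to scheme b under the comparison principle."""
    (fa, pa), (fb, pb) = a, b
    if (pa < eps and pb < eps) or pa == pb:
        return fa < fb  # both meet the deadline, or equal times: the cheaper one wins
    return pa < pb      # otherwise the shorter completion time wins


def pick_best(schemes: Sequence[Scheme], eps: float) -> Scheme:
    # Local optimum among the ants' schemes; the same test updates the global optimum.
    best = schemes[0]
    for s in schemes[1:]:
        if better(s, best, eps):
            best = s
    return best
```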
And after the local optimal scheme is obtained, comparing the local optimal scheme with the global optimal scheme by adopting the same method, and taking the better scheme as a new global optimal scheme.
And then judging whether the current iteration times are more than or equal to the maximum iteration times, if so, finishing the execution, wherein the current global optimal scheduling priority sequence is the final global optimal scheduling priority sequence, and outputting the global optimal scheduling priority sequence.
If the current iteration number is less than the maximum iteration number, then:
for the obtained local optimal scheduling scheme, the pheromone trails of the ant colony algorithm on the edges of the DAG corresponding to its scheduling priority sequence are updated; that is, pheromone deposition and equal-proportion pheromone evaporation are applied to the selected pheromone trails. The pheromone trail update is calculated as shown in formula (10):
$$\tau_{i,j}(k+1) = \tau_{i,j}(k)\times(1-\rho) + \Delta\tau_{i,j}(k) \qquad (10)$$
where $\tau_{i,j}(k)$ is the value of the pheromone trail on the edge connecting tasks $t_i$ and $t_j$ of the workflow DAG in the k-th iteration, $\rho$ is the pheromone evaporation coefficient, and $\Delta\tau_{i,j}(k)$ is the amount of pheromone deposited on the edge connecting $t_i$ and $t_j$ in the k-th iteration, defined as shown in formula (11):
[Formula (11): equation image not reproduced in the source text]
where $s_{best}(k)$ is the local optimal solution in the k-th iteration and $f_{best}(k)$ is the cost of that solution.
The updated pheromone trails are then used as the next-generation pheromone trails, the iteration count is increased by 1, and step 2 is executed again.
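Formula (10) can be illustrated with the Python sketch below. Since formula (11) survives only as an image, the deposit $\Delta\tau_{i,j}(k) = 1/f_{best}(k)$ on the edges of $s_{best}(k)$ is an assumption borrowed from common ant colony practice, as is the reading that the edges of a scheduling priority sequence are its consecutive task pairs and that only those edges are updated. The names `sequence_edges` and `update_pheromones` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def sequence_edges(sequence: List[int]) -> List[Edge]:
    """Consecutive task pairs (t_i, t_j) of a scheduling priority sequence (assumed edge model)."""
    return list(zip(sequence, sequence[1:]))


def update_pheromones(tau: Dict[Edge, float], best_sequence: List[int],
                      best_cost: float, rho: float, tau0: float = 1.0) -> Dict[Edge, float]:
    """Apply formula (10) to the edges of the local optimal sequence s_best(k).

    tau maps an edge (i, j) to tau_{i,j}(k); edges missing from tau start at tau0,
    mirroring the equal initial assignment of step 1.
    The deposit 1 / best_cost stands in for formula (11) and is an ASSUMPTION.
    """
    new_tau = dict(tau)  # edges outside s_best(k) keep their current value
    deposit = 1.0 / best_cost
    for edge in sequence_edges(best_sequence):
        old = tau.get(edge, tau0)
        new_tau[edge] = old * (1.0 - rho) + deposit  # formula (10)
    return new_tau


tau = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
tau = update_pheromones(tau, best_sequence=[0, 1, 2], best_cost=4.0, rho=0.5)
print(tau)
```

Scaling the deposit by the inverse cost means cheaper local optima reinforce their edges more strongly, which is consistent with the method's cost-minimizing goal; the exact scaling in the patent may differ.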
Using the deadline-budget driven scientific workflow scheduling method in the cloud environment, four typical scientific workflows, CyberShake, Epigenomic, LIGO and Montage, were each scheduled under strict deadline and budget constraints and compared with the prior-art ICPCP, PSO, ProLiS and L-ACO algorithms. As shown in fig. 2(a), fig. 2(b), fig. 2(c), fig. 2(d), fig. 3(a), fig. 3(b), fig. 3(c), fig. 3(d) and fig. 4, the method of the present invention achieves a 100% scheduling success rate and reduces the scheduling cost to a certain extent; the reduction is particularly large for the CyberShake workflow used in seismology. This shows that the deadline-budget driven scientific workflow scheduling method in the cloud environment is effective for scheduling scientific workflows.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A deadline-budget driven scientific workflow scheduling method in a cloud environment is characterized by comprising the following steps:
step 1, setting a deadline for the scientific workflow; setting the number of ants and the maximum number of iterations N of the ant colony algorithm, where N is a positive integer greater than or equal to 1; the heuristic information of the ant colony algorithm is the task priority of each task node in the directed acyclic graph (DAG) of the scientific workflow, and the task priority is calculated from the set computation data volume of the task node, the communication data volume between task nodes, the average bandwidth of the computing resource pool, the computing rate of the computing resource with the highest computing performance rating, and a set probability parameter; the pheromone trail of the ant colony algorithm is the value assigned to each edge connecting task nodes in the DAG, and all such values are initialized to be equal;
step 2, each ant calculates the sub-deadline of each task node according to the heuristic information and the deadline, and generates a scheduling priority sequence of the task nodes in the DAG according to the heuristic information and the pheromone trails;
according to the scheduling priority sequence, computing resources are selected in turn for the task nodes to be scheduled, and the computing time and cost of the scheduling priority sequence are calculated, comprising the following steps:
step 2.1, calculating the residual budget of the workflow corresponding to the current task node and the budget of the current task;
step 2.2, selecting the computing resources with the cost less than or equal to the budget of the current task in the computing resource pool to form an available computing resource set of the current task node;
step 2.3, when the set is not empty, selecting from the set, among the computing resources whose computing time meets the sub-deadline of the current task node, the one with the highest computing rate; if no computing resource in the set meets the sub-deadline of the current task node, selecting the computing resource with the highest computing rate in the set, and upgrading it if its computing performance rating is not the highest level;
when the set is empty and the residual budget of the workflow is greater than or equal to 0, selecting the computing resource with the highest computing rate from the computing resource pool, and upgrading the computing resource if the computing performance rating of the computing resource is not the highest level;
when the set is empty and the residual budget of the workflow is less than 0, selecting from the computing resource pool the lowest-cost computing resource whose computing time meets the sub-deadline of the current task node; if no computing resource in the pool meets the sub-deadline of the current task node, selecting the lowest-cost computing resource in the pool, and upgrading it if its computing performance rating is not the highest level;
step 2.4, summing the calculation time and cost of all task nodes obtained in the step 2.2 to obtain the calculation time and cost of the scheduling priority sequence;
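The branching of steps 2.2 and 2.3 can be sketched in Python as below. This is a hypothetical illustration, not the patented implementation: it assumes execution time is workload divided by computing rate, cost is execution time times a per-time price, "meeting the sub-deadline" means execution time does not exceed a per-task time allowance, and an upgrade is only flagged rather than carried out. The task budget and residual budget of step 2.1 are taken as inputs because their formulas are not given here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Resource:
    name: str
    rate: float   # computing rate (workload units per time unit)
    price: float  # cost per time unit (hypothetical pricing model)
    rating: int   # computing performance rating; larger is better


def exec_time(workload: float, r: Resource) -> float:
    return workload / r.rate


def exec_cost(workload: float, r: Resource) -> float:
    return exec_time(workload, r) * r.price


def select_resource(workload: float, pool: List[Resource], task_budget: float,
                    residual_budget: float, sub_deadline: float) -> Tuple[Resource, bool]:
    """Steps 2.2 and 2.3; returns (chosen resource, whether an upgrade is flagged)."""
    top_rating = max(r.rating for r in pool)
    # Step 2.2: resources whose cost fits the current task's budget.
    available = [r for r in pool if exec_cost(workload, r) <= task_budget]
    if available:
        on_time = [r for r in available if exec_time(workload, r) <= sub_deadline]
        if on_time:
            return max(on_time, key=lambda r: r.rate), False
        fastest = max(available, key=lambda r: r.rate)
        return fastest, fastest.rating < top_rating
    if residual_budget >= 0:
        # Empty set, budget not yet overspent: fastest resource in the whole pool.
        fastest = max(pool, key=lambda r: r.rate)
        return fastest, fastest.rating < top_rating
    # Empty set, budget overspent: cheapest resource that still meets the sub-deadline.
    on_time = [r for r in pool if exec_time(workload, r) <= sub_deadline]
    if on_time:
        return min(on_time, key=lambda r: exec_cost(workload, r)), False
    cheapest = min(pool, key=lambda r: exec_cost(workload, r))
    return cheapest, cheapest.rating < top_rating


pool = [Resource("small", 1.0, 1.0, 1),
        Resource("medium", 2.0, 3.0, 2),
        Resource("large", 4.0, 8.0, 3)]
choice, upgrade = select_resource(8.0, pool, task_budget=12.0,
                                  residual_budget=5.0, sub_deadline=5.0)
print(choice.name, upgrade)
```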
step 3, determining a local optimal scheduling priority sequence in the scheduling priority sequences calculated by all ants according to a comparison principle; then according to the comparison principle, comparing the local optimal scheduling priority sequence with the current global optimal scheduling priority sequence, and determining the current global optimal scheduling priority sequence of the iteration;
judging whether the current iteration count is greater than or equal to N; if so, ending execution, the current global optimal scheduling priority sequence being the final global optimal scheduling priority sequence;
otherwise, performing pheromone deposition and equal-proportion evaporation on the pheromone trail of each edge in the DAG corresponding to the local optimal scheduling priority sequence to obtain updated pheromone trails, and using the updated pheromone trails as the next-generation pheromone trails; if the current iteration count is less than the set maximum relaxation generation, relaxing the deadline and using the relaxed deadline as the deadline of the next generation; adding 1 to the current iteration count; and executing step 2.
2. The method of claim 1, wherein the comparison principle is: when the calculation times of both scheduling priority sequences are less than the deadline, selecting the scheduling priority sequence with the smaller cost; when the calculation times of the two scheduling priority sequences are equal, selecting the scheduling priority sequence with the smaller cost; when the calculation times of the two scheduling priority sequences are not equal and the calculation time of at least one of them is greater than or equal to the deadline, selecting the scheduling priority sequence with the shorter calculation time.
3. A method according to claim 2, characterized in that said comparison principle is implemented using an epsilon-comparison method.
4. The method of claim 1, wherein the relaxation process uses the following equation:
[Equation image not reproduced in the source text]
wherein $D_\varepsilon(k)$ is the relaxed deadline for the k-th iteration; D is the set deadline; $M_{base}$ is the computing time of the scientific workflow when every task node in the scientific workflow selects a computing resource with the lowest computing performance rating; $k_T$ is the maximum relaxation generation set with respect to the maximum iteration number; k is the current iteration number, with $0 \le k < k_T$; and cp is a parameter controlling the shape of the deadline relaxation curve.
5. The method of claim 1, wherein the scheduling priority sequence is generated using a Kahn algorithm.
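Claim 5 names Kahn's algorithm for generating the scheduling priority sequence. The Python sketch below shows one way an ant could use it: tasks become "ready" once all predecessors are placed, and one ready task is drawn at random. The selection weight $\tau^\alpha \eta^\beta$ (pheromone on the edge from the previously placed task, times task priority as heuristic) is the standard ant colony transition rule and is an assumption here, as are the parameter names `alpha`, `beta` and `tau0`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def kahn_priority_sequence(n: int, edges: List[Tuple[int, int]], priority: List[float],
                           tau: Dict[Tuple[Optional[int], int], float],
                           alpha: float = 1.0, beta: float = 2.0, tau0: float = 1.0,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Randomized Kahn topological sort guided by pheromone and heuristic (assumed rule)."""
    rng = rng or random.Random()
    succ = defaultdict(list)
    indeg = [0] * n
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    ready = [v for v in range(n) if indeg[v] == 0]  # tasks with no unplaced predecessor
    seq: List[int] = []
    prev: Optional[int] = None
    while ready:
        # Weight each ready task by tau^alpha * eta^beta (priority must be positive).
        weights = [tau.get((prev, v), tau0) ** alpha * priority[v] ** beta for v in ready]
        v = rng.choices(ready, weights=weights, k=1)[0]
        ready.remove(v)
        seq.append(v)
        prev = v
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(seq) != n:
        raise ValueError("workflow graph contains a cycle")
    return seq


edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
seq = kahn_priority_sequence(4, edges, priority=[1.0, 2.0, 1.0, 1.0], tau={},
                             rng=random.Random(7))
print(seq)
```

Whatever the random draws, Kahn's algorithm guarantees every task appears after all of its predecessors, so each ant's sequence is a valid scheduling order for the DAG.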
CN201911089637.1A 2019-11-08 2019-11-08 Deadline-budget driven scientific workflow scheduling method in cloud environment Active CN110825527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089637.1A CN110825527B (en) 2019-11-08 2019-11-08 Deadline-budget driven scientific workflow scheduling method in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911089637.1A CN110825527B (en) 2019-11-08 2019-11-08 Deadline-budget driven scientific workflow scheduling method in cloud environment

Publications (2)

Publication Number Publication Date
CN110825527A (en) 2020-02-21
CN110825527B (en) 2022-01-04

Family

ID=69553860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089637.1A Active CN110825527B (en) 2019-11-08 2019-11-08 Deadline-budget driven scientific workflow scheduling method in cloud environment

Country Status (1)

Country Link
CN (1) CN110825527B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459628B (en) * 2020-03-12 2023-11-28 大庆市凯德信信息技术有限公司 Spark platform task scheduling method based on improved quantum ant colony algorithm
CN111913800B (en) * 2020-07-15 2022-07-19 东北大学秦皇岛分校 Resource allocation method for optimizing cost of micro-service in cloud based on L-ACO
CN112308304B (en) * 2020-10-22 2023-06-23 西北工业大学 Workflow execution time optimization method and device
CN112783123B (en) * 2020-12-30 2021-11-19 北京理工大学 Workflow scheduling execution unit control method and controller
CN113176933B (en) * 2021-04-08 2023-05-02 中山大学 Dynamic cloud network interconnection method for massive workflow tasks
CN113064710B (en) * 2021-04-15 2022-09-09 北京理工大学 Cloud workflow scheduling method and system
CN113064711B (en) * 2021-04-15 2022-09-20 北京理工大学 Online multi-workflow dynamic scheduling method and system

Citations (6)

Publication number Priority date Publication date Assignee Title
US8583467B1 (en) * 2012-08-23 2013-11-12 Fmr Llc Method and system for optimized scheduling of workflows
CN105897864A (en) * 2016-03-28 2016-08-24 东南大学 Scheduling method for cloud workflow
CN106055395A (en) * 2016-05-18 2016-10-26 中南大学 Deadline-constrained workflow scheduling method in cloud environment based on ant colony optimization algorithm
CN109934416A (en) * 2019-03-25 2019-06-25 中国石油大学(华东) Time-optimization scheduling method for scientific workflows under cost budget constraints in the cloud
CN110111006A (en) * 2019-05-08 2019-08-09 中国石油大学(华东) Cost-optimization scheduling method for scientific workflows in the cloud based on a chaotic ant colony system
CN110119316A (en) * 2019-05-17 2019-08-13 中国石油大学(华东) Dependent-task scheduling strategy based on slack and ant colony system

Non-Patent Citations (2)

Title
Deadline-Constrained Cost Optimization; Quanwang Wu; IEEE Transactions on Parallel and Distributed Systems; 20171231; full text *
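Research on workflow scheduling methods in cloud environments; Liu Haitao (刘海涛); China Master's Theses Full-text Database, Information Science and Technology; 20150715; full text *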

Also Published As

Publication number Publication date
CN110825527A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110825527B (en) Deadline-budget driven scientific workflow scheduling method in cloud environment
EP3770774B1 (en) Control method for household appliance, and household appliance
Rahman et al. Adaptive workflow scheduling for dynamic grid and cloud computing environment
Fard et al. A multi-objective approach for workflow scheduling in heterogeneous environments
Rahman et al. A dynamic critical path algorithm for scheduling scientific workflow applications on global grids
CN108874525A (en) A kind of service request distribution method towards edge calculations environment
Xu et al. A multiple priority queueing genetic algorithm for task scheduling on heterogeneous computing systems
Udomkasemsub et al. A multiple-objective workflow scheduling framework for cloud data analytics
Niu et al. An efficient distributed algorithm for resource allocation in large-scale coupled systems
CN114996001A (en) Distributed machine learning task GPU resource scheduling and distributing method and system
CN106934539B (en) Workflow scheduling method with deadline and expense constraints
CN111209104A (en) Energy perception scheduling method for Spark application under heterogeneous cluster
CN112052092A (en) Risk-aware edge computing task allocation method
CN108737462A (en) A kind of cloud computation data center method for scheduling task based on graph theory
JP3940399B2 (en) An object-oriented framework for general purpose adaptive control
Natesan et al. Opposition learning-based grey wolf optimizer algorithm for parallel machine scheduling in cloud environment
Garg et al. Multi-objective workflow grid scheduling based on discrete particle swarm optimization
CN110098964A (en) A kind of disposition optimization method based on ant group algorithm
CN114090239B (en) Method and device for dispatching edge resources based on model reinforcement learning
Luo et al. Learning to optimize DAG scheduling in heterogeneous environment
Aliyu et al. Management of cloud resources and social change in a multi-tier environment: a novel finite automata using ant colony optimization with spanning tree
CN106802822A (en) A kind of cloud data center cognitive resources dispatching method based on moth algorithm
CN107784391B (en) Operation time random basic combat unit use guarantee resource optimal allocation method
CN111026534B (en) Workflow execution optimization method based on multiple group genetic algorithms in cloud computing environment
CN111475297B (en) Flexible operation configuration method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant