CN111274009B - Data intensive workflow scheduling method based on stage division in cloud environment - Google Patents
Data intensive workflow scheduling method based on stage division in cloud environment
- Publication number
- CN111274009B CN202010033432.8A CN202010033432A
- Authority
- CN
- China
- Prior art keywords
- workflow
- task
- tasks
- stage
- scheduling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a data intensive workflow scheduling method based on stage division in a cloud environment, which comprises the steps of: abstracting the workflow structure; defining the candidate service providers of each task; determining the workflow scheduling framework; dividing the workflow into stages according to data dependence; and calculating the completion time of each task of the current stage when executed by its candidate service providers, arranging these times into a matrix, and allocating the tasks of the current stage, repeating until the tasks of all stages are allocated. The method takes into account the influence of data transmission time in data-intensive workflows and improves the execution efficiency of the workflow.
Description
Technical Field
The invention belongs to the field of cloud computing, and particularly relates to a data intensive workflow scheduling method based on stage division in a cloud environment.
Background
Cloud computing is a novel business computing model that provides convenient, low-cost and readily available computing resources as services, with advantages such as low service and maintenance costs and flexible control. A workflow is a business process that is wholly or partly integrated or automated by computer. The Workflow Management Coalition defines a workflow as the automation of all or part of a business process, during which documents, information, or tasks are executed in each link according to a series of procedural rules. Workflows under the cloud computing model can support various complex information applications, such as climate modeling, seismic modeling, and weather forecasting. Particularly in interdisciplinary fields such as bioinformatics and climate simulation, workflows are often data intensive and require large-scale computing resources to process gigabytes or terabytes of input data. The purpose of cloud workflow scheduling is to solve the task scheduling problem of a workflow management system in a cloud computing environment and, by formulating a suitable scheduling method, to deploy tasks to different service providers in the cloud environment. Current research optimizes the execution time and cost of workflows through various scheduling algorithms, providing a strong theoretical guarantee for the scheduling process of practical applications, so as to improve workflow efficiency and save computing resources.
Most current workflow scheduling methods do not consider the influence of data transmission time when optimizing execution cost and completion time. However, in a large number of data-intensive workflow application instances, the data transmission time of a task is not negligible compared with its execution time. For this kind of workflow scheduling problem, this patent proposes a data-intensive workflow scheduling method based on stage division.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the current lack of consideration of the influence of data transmission time in data-intensive workflows, the invention provides a data-intensive workflow scheduling method based on stage division in a cloud environment.
The technical scheme: in order to achieve the above purpose, the invention adopts the following technical scheme:
a data intensive workflow scheduling method based on stage division in a cloud environment comprises the following steps:
In the cloud environment there are several service providers in different regions, and each service provider rents one server and uses its hardware resources to execute various computing tasks. Each computing task description in the workflow contains no specific processing details; it can be completed by several candidate service providers executing different algorithms, so the same task handed to different service providers corresponds to different execution times.
Each task may have several candidate service providers; each service provider corresponds to one server and can solve several tasks, but not necessarily every task. The process of cloud workflow scheduling is to decide which service provider each task should be handed to for completion, that is, to which server it is scheduled for execution.
When d_ij > 0, direct data transmission is required between the two tasks; if the two tasks select the same service provider the data transmission time is 0, otherwise the data transmission time cannot be ignored. If a service provider can only process a single task, both the input and the output data of that task must be transmitted between this service provider and other service providers.
There are m service providers s_p in the service provider set S participating in the scheduling of the workflow, p = 1...m. ST = {ST_p = {<t_i, et_pi> | et_pi represents the execution time of service provider s_p executing t_i, t_i ∈ T, i ∈ {1...n}}, p = 1...m}, where ST denotes the set of all ST_p, ST_p represents the set of all t_i that service provider s_p can execute, n denotes that there are n tasks, and <t_i, et_pi> represents the correspondence between task t_i and the execution time of service provider s_p executing t_i.
Step 3: the workflow W is divided into several stages for scheduling, so that the completion time of all tasks of the current stage is as early as possible. Staged scheduling computes a near-optimal scheduling result for each stage in turn, so that the final scheduling result is relatively good; because the scheduling is carried out stage by stage, this strategy is better suited to workflows whose transmission times between tasks are relatively uniform. According to the specific situation of ST_p, candidate service providers are determined for the tasks and the best service provider is selected for each task until all tasks are allocated. The scheduling result is R = {r_i = <s_p, rft_i>}, where r_i is the allocation result of t_i and satisfies the condition that <t_i, et_pi> belongs to ST_p, s_p represents the service provider finally selected for t_i, rft_i represents the actual completion time of t_i, and R represents the set of allocation results of all tasks.
Step 4: the workflow is divided into allocation stages mainly according to data dependence, since each task can execute only after its predecessor tasks have finished and transmitted their data to it.
Step 5, task scheduling:
Step 51, candidate completion time calculation: for a task t_j to be allocated, its candidate service providers and the corresponding execution times form the set CS_j = {<s_p, et_pj> | s_p ∈ S, t_j ∈ T}, where s_p represents a service provider and S the set of service providers. Given several allocated tasks t_i pointing to the task t_j to be allocated, for each service provider s_p in CS_j the completion time ft_pj of t_j executed on that service provider is calculated.
Step 52: the completion times of all tasks of the current stage executed by their candidate service providers are arranged in the matrix A_u(n*m) = [<s_p, ft_pi>]_{n*m}, where each row corresponds to the completion times of one task on its different candidate service providers, arranged from small to large.
Step 53: based on A_u(n*m), determine the candidate service providers FS_i participating in the allocation and the minimum column x.
First, the set FS_i of candidate service providers newly added in the i-th column of A_u(n*m) is determined, FS_i = {s_p | s_p ∈ CS}, where |FS_i| represents the size of the set; then the minimum x satisfying the condition |FS_1| + |FS_2| + ... + |FS_x| ≥ |TS_u| must be found, so as to ensure that the number of service providers participating in the allocation is not smaller than the number of tasks of this stage, thereby realizing physical parallelism.
Step 54: based on the candidate service providers FS_i participating in the allocation and the minimum column x obtained in step 53, all tasks are assigned to different service providers. Consider, in the x-th column of A_u(n*m), all service providers corresponding to FS_x and sort them by ft_pi from small to large; the sorted result is {<s_p, ft_pi>, ...}, where s_p ∈ FS_x. Select the smallest ft_pi in the sorted result and confirm that the task executed by that service provider is t_i; the candidate service provider sets of the remaining tasks of the current stage discard this s_p. Then check whether the current situation satisfies the screening condition: the total number of candidate service providers of the remaining tasks is greater than or equal to the number of unallocated tasks; if the screening condition is satisfied, a feasible solution may still exist among the remaining candidate service providers.
If the condition is satisfied, i.e. a feasible solution may currently exist, the matrix is updated: the row of t_i is discarded from the matrix and the s_p selected by t_i is discarded from the other tasks; FS_i and x of the new matrix are then computed again from step 53, after which the screening-condition judgment of step 53 is executed. If all service providers in the sorted result have been screened or no feasible solution exists, the s_p with the smallest completion time in the sorted result is selected and it is confirmed that s_p executes the current task; the remaining tasks of the current stage discard this s_p and the matrix is updated.
Preferably: the distribution stage division method for the workflow in the step 4 comprises the following steps:
Step 41: the in-degree of the starting point is 0 and no other point reaches it, so the starting point is arranged first, i.e. the starting point is divided into the starting stage.
Step 42: remove the points of the already-divided stages, and from the remaining nodes of the reduced graph screen out the nodes of the next stage, which must have in-degree 0 in the current graph.
Step 43: after the division of the previous stage is finished, continue to execute step 42 until all nodes have been divided.
Preferably: for each service provider s_p in CS_j in step 51, the completion time ft_pj of t_j executed on that service provider is calculated as:
ft_pj = max over all t_i with d_ij > 0 of ( rft_i + Δdt_ij ) + et_pj, where Δdt_ij = Δd_ij / bw if r_i.s_q ≠ s_p, and Δdt_ij = 0 if r_i.s_q = s_p;
wherein Δd_ij represents the amount of data transmitted from task t_i to task t_j, r_i is the allocation result of t_i, r_i.s_q represents the service provider finally selected for t_i, and rft_i denotes the actual completion time of t_i.
Preferably: the transmission bandwidth bw is a constant.
Compared with the prior art, the invention has the following beneficial effects:
the invention considers the influence of the transmission time of the data intensive workflow, provides support for the scheduling method of practical application and is beneficial to improving the execution efficiency of the workflow.
Drawings
FIG. 1 is a workflow definition example diagram.
Fig. 2 is a diagram showing the correspondence between tasks, service providers, and servers.
FIG. 3 is a workflow scheduling framework diagram.
FIG. 4 is a table of the execution times of each task's candidate service providers in the specific embodiment.
FIG. 5 is a diagram illustrating task allocation at various stages in an exemplary embodiment.
Fig. 6 is a schematic flow chart of the invention.
Detailed Description
The present invention is further illustrated by the accompanying drawings and the following detailed description. It is to be understood that these examples are included solely for purposes of illustration and are not intended to limit the scope of the invention; various equivalent modifications of the invention will become apparent to those skilled in the art after reading this specification, and all such modifications fall within the scope defined by the appended claims.
A data intensive workflow scheduling method based on stage division in a cloud environment is shown in FIG. 6, and includes the following steps:
Step 1, abstracting the workflow structure: acquire the workflow information, establish a DAG graph according to it, and represent the workflow by the DAG graph, as shown in FIG. 1. W = <T, D>, where W is the workflow that needs to be scheduled; T = {t_i | i = 1...n}, T denotes the task set of workflow W and t_i denotes the i-th task of workflow W; D = {d_ij | i, j = 1...n}, D denotes the set of transmitted data amounts in workflow W, where d_ij denotes the amount of data that t_i needs to transmit to t_j. The transmission time is calculated as dt_ij = d_ij / bw, where bw is the transmission bandwidth between t_i and t_j (a constant by default).
We abstract the structure of the workflow and represent it by a DAG graph: the workflow is regarded as a directed weighted graph in which a preceding task is connected to a following task, the two have a data dependency, and the edges carry weights, i.e. the transmission times between tasks. This is a typical data-flow structure and provides the theoretical basis for the subsequent workflow scheduling.
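To make the abstraction concrete, the following minimal sketch (not part of the patent; the class and field names are assumptions) shows one way W = <T, D> and the transmission time dt_ij = d_ij / bw could be represented:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Workflow:
    """W = <T, D>: n tasks t_1..t_n and the transmitted data amounts d_ij."""
    n: int                                                          # number of tasks
    d: Dict[Tuple[int, int], float] = field(default_factory=dict)   # d[(i, j)] = data amount t_i -> t_j
    bw: float = 1.0                                                  # transmission bandwidth (constant by default)

    def dt(self, i: int, j: int) -> float:
        """Transmission time dt_ij = d_ij / bw; 0 if there is no edge t_i -> t_j."""
        return self.d.get((i, j), 0.0) / self.bw

    def predecessors(self, j: int) -> List[int]:
        """Tasks t_i with d_ij > 0, i.e. the direct data producers of t_j."""
        return [i for (i, k), amount in self.d.items() if k == j and amount > 0]
```

A DAG held this way is all the later steps need: the stage division only reads the edges, and the completion-time calculation only reads dt_ij.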
Step 2, defining the task candidate service providers.
In the cloud environment there are several service providers in different regions; it is assumed that each service provider rents one server to perform various computing tasks using its hardware resources. Each computing task description in the workflow contains no specific processing details and can be completed by several candidate service providers executing different algorithms, so the same task handed to different service providers has different execution times.
The correspondence among tasks, service providers, and servers in a workflow is shown in FIG. 2. Each task may have multiple candidate service providers. Each service provider corresponds to one server and can solve several tasks, but not necessarily every task. The process of cloud workflow scheduling is to decide which service provider each task should be handed to for completion, that is, to which server it is scheduled for execution. When d_ij > 0, direct data transmission is needed between the two tasks; if the two tasks select the same service provider the data transmission time is 0, otherwise the data transmission time cannot be ignored. If a service provider can only process a single task, both the input and the output data of that task must be transmitted between this service provider and other service providers. In order to reduce the transmission time to a greater extent, service providers that can solve more tasks are preferred as candidates when screening the service providers.
Assume that there are m service providers s_p (p = 1...m) in the service provider set S participating in the scheduling of the workflow. In order to express more accurately the association between each candidate service provider and the tasks to be scheduled, the following definition is given: ST = {ST_p = {<t_i, et_pi> | et_pi represents the execution time of service provider s_p executing t_i, t_i ∈ T, i ∈ {1...n}}, p = 1...m}.
In this step, the concept of task candidate service providers is fully defined: each task is given its corresponding candidate service providers, which execute the task with different execution times, providing the data source for the subsequent calculation of the workflow completion time.
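As an illustration of the ST_p sets defined above, a small sketch (provider names and execution times are made up, not taken from FIG. 4) of the provider-to-task mapping and of deriving the candidate set of a task:

```python
from typing import Dict

# ST[p][i] = et_pi: execution time of service provider s_p executing task t_i;
# s_p appears in the mapping of t_i only if it is able to execute t_i.
ST: Dict[str, Dict[int, float]] = {
    "s1": {1: 12.0, 3: 20.0},   # illustrative values only
    "s2": {1: 15.0, 2: 9.0},
}

def candidate_providers(ST: Dict[str, Dict[int, float]], i: int) -> Dict[str, float]:
    """CS_i in execution-time form: {s_p: et_pi} for every provider that can execute t_i."""
    return {p: tasks[i] for p, tasks in ST.items() if i in tasks}

print(candidate_providers(ST, 1))   # {'s1': 12.0, 's2': 15.0}
```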
Step 3, the workflow scheduling framework.
Different task scheduling results determine different time costs in terms of both execution time and transmission time, and a scheduling method is provided herein to reduce the completion time of the entire workflow.
The basic flow of workflow scheduling is to divide the workflow W into several stages for scheduling, and the basic allocation strategy is to make the completion time of all tasks of the current stage as early as possible. Dividing the scheduling into multiple stages is mainly motivated by the fact that the complexity of global scheduling is too large and belongs to the NP-hard problem; staged scheduling computes a near-optimal scheduling result for each stage in turn, so that the final scheduling result obtained is relatively good. Because the scheduling is carried out stage by stage, this strategy is better suited to workflows whose transmission times between tasks are relatively uniform. According to the above scheduling idea and the specific situation of ST_p, candidate service providers are determined for the tasks and the best service provider is selected for each task until all tasks are allocated. The scheduling result is R = {r_i = <s_p, rft_i>}, where r_i is the allocation result of t_i, s_p represents the service provider finally selected for t_i, and rft_i represents the actual completion time of t_i. The workflow scheduling framework is shown in FIG. 3.
In this step, a workflow scheduling framework based on the stage-division concept is provided, ensuring that a suitable workflow can be scheduled according to this framework.
Step 4, dividing the workflow into stages.
The execution condition of each task in the workflow is that its predecessor tasks have all finished executing and transmitted their data to the current task; for a workflow given in the form of definition 1, the potential temporal precedence between tasks can be derived from the data dependences, so the allocation stage division of the workflow is carried out mainly on the basis of data dependence. TS_u = {t_i | t_i is a task of the u-th stage}, u = 1, 2, ..., l, where TS_u denotes the set of tasks of the u-th stage and |TS_u| denotes the number of tasks in the u-th stage. The specific division idea is as follows:
Step 41: the in-degree of the starting point is 0 and no other point can reach it, so the starting point is arranged first, i.e. the starting point is divided into the starting stage.
Step 42: remove the points of the already-divided stages, and from the remaining nodes of the reduced graph screen out the nodes of the next stage, which must have in-degree 0 in the current graph.
Step 43: after the division of the previous stage is finished, continue to execute step 42 until all nodes have been divided.
In this step the workflow is divided into stages. The execution condition of each task in the workflow is that its predecessor tasks have finished executing and transmitted their data to the current task; based on the abstract workflow structure of step 1, the potential temporal precedence between tasks can be derived from the data dependences, so the allocation stage division of the workflow is carried out mainly on the basis of data dependence.
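A minimal sketch of the stage division of steps 41–43 (repeatedly peel off the nodes whose in-degree is 0 in the remaining graph); the function name and data layout are assumptions consistent with the earlier sketches:

```python
from typing import Dict, List, Set, Tuple

def divide_stages(n: int, d: Dict[Tuple[int, int], float]) -> List[Set[int]]:
    """Return the stages TS_1..TS_l: each stage is the set of tasks whose in-degree
    is 0 once all previously divided stages have been removed from the DAG."""
    remaining = set(range(1, n + 1))
    stages: List[Set[int]] = []
    while remaining:
        # in-degree of each remaining node, counting only edges inside the remaining graph
        indeg = {j: 0 for j in remaining}
        for (i, j), amount in d.items():
            if amount > 0 and i in remaining and j in remaining:
                indeg[j] += 1
        stage = {j for j, deg in indeg.items() if deg == 0}
        if not stage:            # a DAG always has such nodes; guard against cyclic input
            raise ValueError("workflow graph is not acyclic")
        stages.append(stage)
        remaining -= stage
    return stages
```

On the example of Table 1 this yields TS_1 = {t_1} as the starting stage, since t_1 is the only node with in-degree 0.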
Step 5, task scheduling.
Step 51, candidate completion time calculation.
For a task t_j to be allocated, its candidate service providers and the corresponding execution times form the set CS_j = {<s_p, et_pj> | s_p ∈ S, t_j ∈ T}. Given several allocated tasks t_i pointing to the task t_j to be allocated, for each service provider s_p in CS_j the following formula is executed to calculate the completion time ft_pj of t_j executed on that service provider:
ft_pj = max over all t_i with d_ij > 0 of ( rft_i + Δdt_ij ) + et_pj, where Δdt_ij = Δd_ij / bw if r_i.s_q ≠ s_p, and Δdt_ij = 0 if r_i.s_q = s_p;
here Δd_ij represents the amount of data transmitted from task t_i to task t_j, r_i is the allocation result of t_i, r_i.s_q represents the service provider finally selected for t_i, and rft_i denotes the actual completion time of t_i.
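A sketch of the completion-time formula of step 51 as reconstructed above (t_j on s_p can start only after every allocated predecessor has finished and, if it ran on a different provider, transmitted its d_ij / bw of data); the function signature and data layout are assumptions:

```python
from typing import Dict, Tuple

def completion_time(
    p: str,                                  # candidate provider s_p for t_j
    j: int,                                  # task t_j to be allocated
    et: Dict[str, Dict[int, float]],         # et[p][j] = et_pj
    d: Dict[Tuple[int, int], float],         # d[(i, j)] = data amount t_i -> t_j
    bw: float,
    result: Dict[int, Tuple[str, float]],    # r_i: task -> (chosen provider r_i.s_q, rft_i)
) -> float:
    """ft_pj = max over allocated predecessors t_i of (rft_i + transfer time) + et_pj,
    where the transfer time d_ij / bw is 0 when t_i ran on the same provider s_p."""
    ready = 0.0
    for (i, k), amount in d.items():
        if k != j or amount <= 0 or i not in result:
            continue
        s_q, rft_i = result[i]
        transfer = 0.0 if s_q == p else amount / bw
        ready = max(ready, rft_i + transfer)
    return ready + et[p][j]
```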
Step 52: the completion times of all tasks of the current stage executed by their candidate service providers are arranged in the matrix A_u(n*m) = [<s_p, ft_pi>]_{n*m}; each row corresponds to the completion times of one task on its different candidate service providers, arranged from small to large, and ft_pi denotes the completion time of s_p executing t_i.
Step 53: based on A_u(n*m), determine the candidate service providers FS_i participating in the allocation and the minimum column x.
First, the set FS_i of candidate service providers newly added in the i-th column of A_u(n*m) is determined, FS_i = {s_p | s_p ∈ CS}, where |FS_i| represents the size of the set; the specific process is shown in Algorithm 1. Then the minimum x satisfying the condition |FS_1| + |FS_2| + ... + |FS_x| ≥ |TS_u| must be found, so as to ensure that the number of service providers participating in the allocation is not smaller than the number of tasks of this stage, thereby realizing physical parallelism. The specific process is shown in Algorithm 2.
Step 54: based on FS_i and x, carry out the allocation of the current stage. The main purpose of this step is to assign all tasks of this stage to different service providers based on the FS_i and x given in step 53. Consider, in the x-th column of A_u(n*m), all service providers corresponding to FS_x and sort them by ft_pi from small to large; the sorted result is {<s_p, ft_pi>, ...}, where s_p ∈ FS_x. Select the smallest ft_pi in the sorted result and confirm that the task executed by that service provider is t_i. The candidate service provider sets of the remaining tasks of the current stage discard this s_p. Then check whether the current situation satisfies the screening condition: the total number of candidate service providers of the remaining tasks is greater than or equal to the number of unallocated tasks; if the screening condition is satisfied, a feasible solution may still exist among the remaining candidate service providers. The detailed process is shown in Algorithm 3.
If the condition is satisfied, i.e. a feasible solution may currently exist, the matrix is updated: the row of t_i is discarded from the matrix and the s_p selected by t_i is discarded from the other tasks; FS_i and x of the new matrix are then computed again from step 53, after which the screening-condition judgment of step 53 is performed. If all service providers in the sorted result have been screened or no feasible solution exists, the s_p with the smallest completion time in the sorted result is selected and it is confirmed that s_p executes the current task; the remaining tasks of the current stage discard this s_p, the matrix is updated, and the process proceeds to step 53 and step 54. A detailed description of the entire allocation method is shown in Algorithm 4.
In this step, for each task to be allocated the completion time of the task executed by each candidate service provider is calculated; the completion times of every task of the current stage on its candidate service providers are arranged in a matrix, and the tasks are allocated one after another to different service providers according to the scheduling algorithm, stage by stage, until the tasks of all stages have been allocated.
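The column-wise bookkeeping of Algorithms 1–4 is not reproduced in the text, so the following sketch is only one plausible simplification of steps 52–54 for a single stage: build the completion-time candidates, repeatedly confirm the pair with the smallest completion time, and discard that provider from the remaining tasks only while the screening condition (the remaining tasks together still have at least as many candidates as there are unallocated tasks, and none is left without a candidate) holds. Names and structure are assumptions, not the patent's exact algorithm.

```python
from typing import Dict, Set, Tuple

def allocate_stage(
    stage: Set[int],                          # TS_u: tasks of the current stage
    cs: Dict[int, Dict[str, float]],          # cs[j] = {s_p: ft_pj}; every task is assumed to have >= 1 candidate
) -> Dict[int, Tuple[str, float]]:
    """Greedy stage allocation: earliest completion time first, one provider per task
    as long as the screening condition allows discarding the chosen provider."""
    remaining = {j: dict(cs[j]) for j in stage}
    allocation: Dict[int, Tuple[str, float]] = {}
    while remaining:
        # pick the globally smallest candidate completion time among unallocated tasks
        j, p, ft = min(
            ((j, p, ft) for j, cands in remaining.items() for p, ft in cands.items()),
            key=lambda item: item[2],
        )
        allocation[j] = (p, ft)
        del remaining[j]
        # screening condition: discard p from the other tasks only if they are then
        # still left with at least as many candidates (in total) as unallocated tasks
        trimmed = {j2: {q: t for q, t in cands.items() if q != p}
                   for j2, cands in remaining.items()}
        if all(trimmed.values()) and sum(len(c) for c in trimmed.values()) >= len(trimmed):
            remaining = trimmed
    return allocation
```

The patent's steps 53 and 54 organise the same idea column by column through FS_i and the minimum x; the sketch collapses that into a single global minimum per iteration while keeping the "earliest completion first, then discard the chosen provider from the other tasks" behaviour.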
Examples of the invention
In order to better understand the technical content of the present invention, a specific scheduling example is given and described with reference to the attached drawings.
The specific workflow information and the service provider information are as follows. The workflow W contains 20 tasks in total, where t_1 is the start task and t_20 is the end task; the values of d_ij are listed in Table 1. For convenience of the subsequent calculations, bw is taken to be 1 in this example, i.e. the task transmission time d_ij / bw = d_ij.
TABLE 1. Data transmission amounts between tasks

d_ij | amount | d_ij | amount | d_ij | amount
---|---|---|---|---|---
d_1,2 | 43 | d_1,3 | 37 | d_1,4 | 25
d_1,5 | 70 | d_2,5 | 46 | d_3,6 | 53
d_3,7 | 40 | d_3,11 | 76 | d_4,7 | 29
d_4,8 | 38 | d_4,13 | 88 | d_5,9 | 57
d_6,9 | 34 | d_6,10 | 29 | d_6,11 | 40
d_6,16 | 104 | d_7,12 | 42 | d_7,13 | 55
d_8,14 | 58 | d_8,15 | 57 | d_9,16 | 74
d_10,17 | 69 | d_11,17 | 49 | d_12,18 | 50
d_13,18 | 46 | d_14,18 | 31 | d_15,19 | 38
d_15,20 | 104 | d_16,20 | 43 | d_17,20 | 60
d_18,20 | 32 | d_19,20 | 35 | |
The candidate service provider set of the entire workflow is S = {s_1, s_2, s_3, s_4, s_5, s_6, s_7}. The candidate service providers of each task and their execution times are shown in FIG. 4.
According to the allocation scheme of the initial stage, t_1 is selected to be executed by s_4. Thereafter, the scheduling scheme of each stage is based on the result of the previous stage: first the completion time of each task of the current stage executed by its different candidate service providers is calculated, the completion times are sorted from small to large, and each task is allocated to a different service provider for execution according to Algorithm 4.
Taking the execution process of the fourth stage as an example: first, allocation step 1 is executed, x is calculated to be 3 and FS_3 = {s_5, s_6}, so the possible current feasible solutions lie in the first three columns. Sorting the completion times from small to large gives <s_5, 342>, <s_6, 350>, <s_6, 381>, <s_6, 383>; t_9 is executed by s_5. The screening condition is not met, so t_10 is selected to be executed by s_6 according to the order, the remaining tasks discard s_6, allocation step 1 is then executed again, x is again calculated to be 3, and the subsequent allocation results are obtained according to Algorithm 4.
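Using the numbers quoted above for the fourth stage, a tiny check of the "sort by completion time, take the smallest" step (only the four provider/time pairs come from the text; the rest is illustrative):

```python
# Sorted candidate completion times of the fourth stage, as quoted in the text.
ranked = [("s5", 342), ("s6", 350), ("s6", 381), ("s6", 383)]
ranked.sort(key=lambda pair: pair[1])    # already in ascending order; kept for clarity
provider, ft = ranked[0]
print(provider, ft)                      # s5 342 -> t_9 is confirmed to run on s_5
```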
In the task allocation diagram of each stage, the data of each row represent the completion times of the task executed by different service providers, arranged in order from left to right, and the scheduling result of each task is shown in bold. As can be seen from FIG. 5, the final completion time of workflow W is 637. Compared with other workflow scheduling methods, the workflow completion time is optimized to a certain extent.
In summary, the invention provides a data-intensive workflow scheduling method based on stage division in a cloud environment, which optimizes the overall completion time of data-intensive workflows in the cloud environment, provides support for scheduling methods in practical applications, and helps to improve workflow execution efficiency. The invention migrates and adapts the traditional workflow scheduling idea to data-intensive workflows in the cloud environment.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (4)
1. A data intensive workflow scheduling method based on stage division in a cloud environment is characterized by comprising the following steps:
step 1, abstracting the workflow structure: obtaining workflow information, establishing a DAG graph according to the workflow information, and representing the workflow by the DAG graph, W = <T, D>, where W is the workflow to be scheduled, comprising n tasks; T = {t_i | i = 1...n}, T denotes the task set of workflow W and t_i denotes the i-th task of workflow W; D = {d_ij | i, j = 1...n}, D denotes the set of transmitted data amounts in workflow W, d_ij represents the amount of data that t_i needs to transmit to t_j; the transmission time is d_ij / bw, where bw is the transmission bandwidth between t_i and t_j;
step 2, defining a task candidate service provider:
in the cloud environment there are several service providers in different regions, and each service provider rents one server to execute various computing tasks using its hardware resources; each computing task description in the workflow contains no specific processing details and is completed by several candidate service providers executing different algorithms, and the same task submitted to different service providers for execution corresponds to different execution times;
each task has several candidate service providers; the cloud workflow scheduling process is to determine which service provider each task should be handed to for completion, that is, to which server it is scheduled for execution;
when d_ij > 0, direct data transmission is required between the two tasks; the data transmission time is 0 if the two tasks select the same service provider, otherwise the data transmission time cannot be ignored; if a certain service provider can only process a single task, the input and output data of the task must be transmitted between this service provider and other service providers;
there are m service providers s_p in the service provider set S participating in the scheduling of the workflow, where p = 1...m, and the association between each candidate service provider and the tasks to be scheduled is: ST = {ST_p = {<t_i, et_pi> | et_pi represents the execution time of service provider s_p executing t_i, t_i ∈ T, i ∈ {1...n}}, p = 1...m}, where ST denotes the set of all ST_p, ST_p represents the set of all t_i that service provider s_p can execute, n represents that there are n tasks to be executed, and <t_i, et_pi> represents the correspondence between task t_i and the execution time of s_p executing t_i;
step 3, dividing the workflow W into several stages for scheduling, so that the completion time of all tasks of the current stage is as early as possible; staged scheduling computes a near-optimal scheduling result for each stage in turn, so that the final scheduling result is relatively good, and because the scheduling is carried out stage by stage, this strategy is better suited to workflows whose transmission times between tasks are relatively uniform; according to the specific situation of ST_p, candidate service providers are determined for the tasks and the best service provider is selected for each task until all tasks are allocated; the scheduling result is R = {r_i = <s_p, rft_i>}, where r_i is the allocation result of t_i and satisfies the condition that <t_i, et_pi> belongs to ST_p, s_p represents the service provider finally selected for t_i, rft_i represents the actual completion time of t_i, and R represents the set of allocation results of all tasks;
step 4, the execution condition of each task in the workflow is that its predecessor tasks have all been executed and their data transmitted to the current task; the workflow derives the potential temporal precedence between tasks from the data dependences, so the allocation stage division of the workflow is carried out mainly on the basis of data dependence; TS_u = {t_u | t_u is a task of the u-th stage}, u = 1, 2, ..., l, TS_u denotes the set of tasks of the u-th stage, and |TS_u| denotes the number of tasks in the u-th stage;
step 5, task scheduling:
step 51, candidate completion time calculation
for a task t_j to be allocated, its candidate service providers and the corresponding execution times are denoted as the set CS_j = {<s_p, et_pj> | s_p ∈ S, t_j ∈ T}, where s_p represents a service provider and S the set of service providers; given several allocated tasks t_i pointing to the task t_j to be allocated, for each service provider s_p in CS_j the completion time ft_pj of t_j executed on that service provider is calculated;
step 52, calculating the completion time of each task of the current stage executed by its candidate service providers, and completing the allocation by arranging these times in the matrix A_u(n*m) = [<s_p, ft_pi>]_{n*m}; A_u(n*m) is the matrix formed by the completion times of all tasks to be allocated in the u-th stage when executed by their different candidate service providers, where each row corresponds to the completion times of one task when executed on its different candidate service providers, with the values arranged from small to large, and ft_pi denotes the completion time of s_p executing t_i;
step 53, based on A_u(n*m), determining the candidate service providers FS_i participating in the allocation and the minimum column x:
first, the set FS_i of candidate service providers newly added in the i-th column of A_u(n*m) is determined, FS_i = {s_p | s_p ∈ S}, where |FS_i| represents the size of the set; then the minimum x satisfying the condition |FS_1| + |FS_2| + ... + |FS_x| ≥ |TS_u| is found, so as to ensure that the number of service providers participating in the allocation is not smaller than the number of tasks of this stage, thereby realizing physical parallelism;
step 54, based on FS_i and x, carrying out the allocation of the current stage:
based on the candidate service providers FS_i participating in the allocation and the minimum column x obtained in step 53, all tasks are assigned to different service providers; in the x-th column of A_u(n*m), all service providers corresponding to FS_x are considered and sorted by ft_pi from small to large, the sorted result being {<s_p, ft_pi>, ...}, where s_p ∈ FS_x; the smallest ft_pi in the sorted result is selected and it is confirmed that the task executed by that service provider is t_i; the candidate service provider sets of the remaining tasks of the current stage discard this s_p, and whether the current situation satisfies the screening condition is checked: the total number of candidate service providers of the remaining tasks is greater than or equal to the number of unallocated tasks, and if the screening condition is satisfied it is judged that a feasible solution may still exist among the remaining candidate service providers;
if the condition is satisfied, i.e. a feasible solution may currently exist, the matrix is updated: the row of t_i is discarded from the matrix and the s_p selected by t_i is discarded from the other tasks; FS_i and x of the new matrix are then computed again from step 53, after which the screening-condition judgment of step 53 is executed; if all service providers in the sorted result have been screened or no feasible solution exists, the s_p with the smallest completion time in the sorted result is selected and it is confirmed that s_p executes the current task; the remaining tasks of the current stage discard this s_p and the matrix is updated.
2. The phase-division-based data-intensive workflow scheduling method in the cloud environment according to claim 1, wherein: the distribution stage division method for the workflow in the step 4 comprises the following steps:
step 41, the in-degree of the starting point is 0 and no other point reaches it, so the starting point is arranged first, i.e. the starting point is divided into the starting stage;
step 42, removing the points of the already-divided stages, and screening out the nodes of the next stage from the remaining nodes of the reduced graph, where these nodes must have in-degree 0 in the current graph;
step 43, after the division of the previous stage is finished, continuing to execute step 42 until all nodes have been divided.
3. The phase-division-based data-intensive workflow scheduling method in the cloud environment according to claim 2, wherein: for each service provider s_p in CS_j in step 51, the completion time ft_pj of t_j executed on that service provider is calculated as:
ft_pj = max over all t_i with d_ij > 0 of ( rft_i + Δdt_ij ) + et_pj, where Δdt_ij = Δd_ij / bw if r_i.s_q ≠ s_p, and Δdt_ij = 0 if r_i.s_q = s_p;
wherein Δd_ij represents the amount of data transmitted from task t_i to task t_j, r_i is the allocation result of t_i, r_i.s_q represents the service provider finally selected for t_i, and rft_i denotes the actual completion time of t_i.
4. The method for data-intensive workflow scheduling based on staging in a cloud environment according to claim 3, wherein: the transmission bandwidth bw is a constant.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010033432.8A CN111274009B (en) | 2020-01-13 | 2020-01-13 | Data intensive workflow scheduling method based on stage division in cloud environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010033432.8A CN111274009B (en) | 2020-01-13 | 2020-01-13 | Data intensive workflow scheduling method based on stage division in cloud environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274009A CN111274009A (en) | 2020-06-12 |
CN111274009B true CN111274009B (en) | 2022-08-30 |
Family
ID=71001892
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010033432.8A Active CN111274009B (en) | 2020-01-13 | 2020-01-13 | Data intensive workflow scheduling method based on stage division in cloud environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274009B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114629959B (en) * | 2022-03-22 | 2023-11-17 | 北方工业大学 | Cloud environment context-aware internet traffic (IoT) service scheduling policy method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628665A (en) * | 2018-05-16 | 2018-10-09 | 天津科技大学 | Task scheduling based on data-intensive scientific workflow and virtual machine integration method |
CN110489214B (en) * | 2019-06-19 | 2022-09-20 | 南京邮电大学 | Dynamic task allocation for data intensive workflows in a cloud environment |
- 2020-01-13: application CN202010033432.8A filed in China; granted as CN111274009B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111274009A (en) | 2020-06-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: No. 66, New Model Road, Gulou District, Nanjing City, Jiangsu Province, 210000; Applicant after: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS; Address before: No.9 Wenyuan Road, Yadong Xincheng District, Qixia District, Nanjing, Jiangsu Province, 210000; Applicant before: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS |
| GR01 | Patent grant | |