CN115185651A - Workflow optimization scheduling algorithm based on cloud computing

Workflow optimization scheduling algorithm based on cloud computing

Info

Publication number
CN115185651A
Authority
CN
China
Prior art keywords
workflow
algorithm
scheduling
cloud computing
pollination
Prior art date
Legal status
Pending
Application number
CN202210478776.9A
Other languages
Chinese (zh)
Inventor
许海峰 (Xu Haifeng)
谭善鑫 (Tan Shanxin)
刘心彤 (Liu Xintong)
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202210478776.9A
Publication of CN115185651A
Legal status: Pending

Classifications

    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5072 Grid computing
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a workflow scheduling optimization method based on cloud computing: an improved differential pollination workflow scheduling algorithm aimed at the structure and characteristics of the cloud computing environment and at the optimized scheduling of workflows within it. The algorithm first analyses the structure and characteristics of the workflow scheduling model and the cloud computing resource model, identifying the two main functions a workflow system in cloud computing must provide: offering resources for tasks to select, and allocating suitable virtual machines to execute the corresponding tasks. On this basis, a three-layer workflow scheduling model based on cloud computing is proposed, the global pollination and local pollination steps of the flower pollination algorithm are improved accordingly, and a differential pollination workflow scheduling algorithm for the cloud computing environment is finally obtained. Compared with traditional single-objective optimization algorithms, the method achieves a clear optimization effect under the same virtual machine constraints, demonstrating a certain feasibility and effectiveness.

Description

Workflow optimization scheduling algorithm based on cloud computing
Technical Field
The invention relates to a workflow-based scheduling optimization method for manufacturing processes and belongs to the field of intelligent computing and scheduling optimization.
Background
The data storage and computation requirements of large enterprises are growing daily; faced with so many data-processing demands, optimizing the workflows inside the enterprise becomes very important. Cloud computing offers low cost, large storage scale, and high computing power, and is therefore being widely applied in many fields.
Cloud computing services, cloud services for short, uniformly manage and schedule computing resources and then provide services to users on demand. A workflow is a collection of tasks in which a task may depend on the execution of one or more other tasks. The set of directed edges between tasks represents these dependencies: whether a task may execute depends on whether both its sequential dependency and its data dependency are satisfied. The sequential dependency requires that all predecessor tasks of the current task have finished executing, and the data dependency requires that the data the task needs has been received after its predecessors complete.
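To make these dependency notions concrete, the following is a minimal sketch, not taken from the patent, of a task DAG with sequential and data dependencies; all class and function names are illustrative assumptions.

```python
# Minimal illustrative sketch: a workflow task DAG with sequential and data
# dependencies, and a readiness check for a task.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    instructions: int                              # computation weight, in instructions
    parents: list = field(default_factory=list)    # direct predecessors
    children: list = field(default_factory=list)   # direct successors

def add_edge(src: Task, dst: Task, bits: int, edge_weights: dict) -> None:
    """Directed edge src -> dst with a transmission weight measured in bits."""
    src.children.append(dst)
    dst.parents.append(src)
    edge_weights[(src.name, dst.name)] = bits

def is_ready(task: Task, finished: set, received: set) -> bool:
    """Sequential dependency: all predecessors have finished.
    Data dependency: all input data from predecessors has been received."""
    return all(p.name in finished for p in task.parents) and \
           all((p.name, task.name) in received for p in task.parents)
```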
Compared with a traditional distributed system, cloud computing is complex and changeable, so more factors influence the evaluation criteria; if these criteria are not well optimized and balanced, costs rise and the user's use and experience suffer. The invention aims to study an execution-time-optimizing scheduling algorithm for cloud workflows that fully exploits gaps in the task schedule, saves execution time, meets user requirements, and reduces both user cost and provider cost, in pursuit of sustainable development.
Disclosure of Invention
In order to achieve the purpose, the invention provides a differential pollination workflow optimization scheduling algorithm based on cloud computing, which comprises the following steps:
(1) Establish a cloud computing workflow scheduling model according to the workflow execution relations.
(2) Calculate the consumption of workflow execution and write an algorithm that preprocesses the workflow tasks according to priority.
(3) Realize the global and local improvements of the algorithm through the crossover operation, swap mutation, and inverse mutation of the task sequence.
(4) Propose the differential pollination workflow optimization algorithm based on these improvements.
The step (1) is specifically as follows:
(1.1) Create a set of directed edges between tasks according to the task dependencies; the workflow can be represented as a quadruple P = (K, E, Z, X).
(1.2) K denotes the set of tasks, where r_i represents the i-th task, 0 <= i <= N; N is the total number of tasks and E is the set of edges.
(1.3) A directed edge e_ij connects task r_i to task r_j, with 0 <= i, j <= N and i ≠ j, and indicates the dependency between the tasks. The set of all direct predecessor nodes of r_i is parent(r_i), and the set of all its direct successor nodes is child(r_i).
(1.4) The computation weight Z is measured by the number of instructions, and the transmission weight X is measured by the number of bits.
(1.5) Add a virtual start task node r_start and a virtual end task node r_end so that the workflow can be executed while satisfying both the sequential dependency and the data dependency.
(1.6) Create V_m to represent the set of virtual machines; vm_r denotes the r-th virtual machine in V_m.
(1.7) The computing power of a single virtual machine vm_r is denoted PC(vm_r).
(1.8) Establish a three-layer workflow scheduling model based on cloud computing that meets these requirements, consisting of a resource layer, a scheduling layer, and a user layer. A minimal illustrative sketch of this model is given below.
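For illustration, the model of steps (1.1)-(1.8) can be written down as data structures. The sketch below is only an assumption of how such a model might look; the class and field names (Workflow, VirtualMachine, pc, unit_cost) are not taken from the patent.

```python
# Illustrative sketch of the quadruple P = (K, E, Z, X) and the virtual machine set V_m.
# All names and the execution-time definition are assumptions made for illustration.
from dataclasses import dataclass

@dataclass
class Workflow:
    K: list   # task identifiers, including the virtual r_start and r_end
    E: set    # directed edges (r_i, r_j), i != j
    Z: dict   # computation weight per task, in instructions
    X: dict   # transmission weight per edge, in bits

@dataclass
class VirtualMachine:
    name: str
    pc: float          # computing power PC(vm_r), e.g. instructions per second
    unit_cost: float   # execution cost per unit time UC(vm_r)

def execution_time(task: str, wf: Workflow, vm: VirtualMachine) -> float:
    """Illustrative ET(t_i) on vm_r: computation weight divided by computing power."""
    return wf.Z[task] / vm.pc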
The step (2) is specifically as follows:
(2.1) The optimization objective, in keeping with the purpose of the workflow, can be expressed as formula (2-1):
Makespan = minimize(t_end)   (2-1)
(2.2) Calculate the workflow scheduling cost, which consists mainly of the computation overhead of the cloud computing virtual machines; the calculation formula is given by formula (2-2):
[Formula (2-2): total scheduling cost summed over the rented virtual machines; rendered as an image in the original]
(2.3) where UC(vm_r) represents the per-unit-time execution cost of virtual machine vm_r. The energy consumption of a modern processor's integrated circuits comes mainly from the dynamic energy consumption E_dynamic; the model adopts DVFS (Dynamic Voltage and Frequency Scaling) to reduce the processor's dynamic energy consumption, and the dynamic power can be obtained from formula (2-3):
P_dynamic = AC·v²·f   (2-3)
(2.4) where AC is a constant that depends on the device, v denotes the supply voltage, and f denotes the clock frequency. The energy consumption in the cloud computing environment is therefore the sum of the energy consumption of all rented virtual machines and can be calculated by formula (2-4):
[Formula (2-4): total energy consumption summed over all rented virtual machines; rendered as an image in the original]
(2.5) To maximize profit, the cloud service provider also needs to consider resource utilization as a performance parameter, which can be calculated by formula (2-5):
[Formula (2-5): resource utilization, the ratio of task execution time to the total leased time of the virtual machines; rendered as an image in the original]
(2.6) where et_k represents the time period during which a task executes on a virtual machine, and tt_k represents the whole period for which the virtual machine is leased. The higher the utilization, the smaller the idle time of the resource; conversely, more idle time means more wasted resources.
(2.7) To apply the flower pollination algorithm to workflow scheduling on cloud computing, the algorithm must be improved: before scheduling, the priorities of the workflow tasks are computed and the tasks are layered according to priority.
(2.8) Define the upward priority of a task as the longest distance between task t_i and the end task t_end. It is computed from back to front, i.e., recursively from t_end toward the start of the workflow DAG; the upward priority of a task is given by formula (2-6):
β_up(t_i) = max_{t_j ∈ child(t_i)} ( TT(t_i, t_j) + ET(t_i) + β_up(t_j) )   (2-6)
(2.9) Algorithm (1) gives the pseudocode for the priority layering operation of the tasks; an illustrative sketch follows the placeholder below.
[Algorithm (1): priority layering pseudocode; rendered as an image in the original]
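Algorithm (1) appears only as an image in the original. The following hedged Python sketch shows one way the upward-priority computation of formula (2-6) and a priority ordering could be implemented; the base case for the end task, the function names, and the simple sort-based layering are assumptions.

```python
# Hedged sketch of the upward-priority computation (formula (2-6)) and a simple
# priority ordering. ET(t) gives the execution time of t, TT(t, c) the transfer
# time on edge (t, c), and children(t) the direct successors of t.
def upward_priorities(tasks, children, ET, TT):
    beta = {}

    def beta_up(t):
        if t in beta:
            return beta[t]
        succ = children(t)
        if not succ:
            beta[t] = ET(t)   # assumed base case for the end task t_end
        else:
            beta[t] = max(TT(t, c) + ET(t) + beta_up(c) for c in succ)
        return beta[t]

    for t in tasks:
        beta_up(t)
    return beta

def priority_layering(tasks, children, ET, TT):
    """Order tasks by decreasing upward priority; ties keep input order.
    With positive execution times, a parent's upward priority is always larger
    than any of its children's, so this ordering respects the dependencies."""
    beta = upward_priorities(tasks, children, ET, TT)
    return sorted(tasks, key=lambda t: -beta[t])
```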
The step (3) is specifically as follows:
(3.1) Global pollination in the flower pollination algorithm can be represented by the following mathematical formula:
[Global pollination update formula (a Lévy-flight step toward the current best solution); rendered as an image in the original]
(3.2) Because global pollination in nature takes place over long distances, a crossover operation on the task sequence is adopted as the crossover global pollination operation on the flower individuals.
(3.3) Based on the task sequence, the crossover operation is performed on flowers A and B, and the pollen positions 2, 3 and 6 are selected at random.
(3.4) The pollen t_2, t_3 and t_6 selected from flower A corresponds to the pollen t_2, t_3 and t_6 in flower B, which occupy positions 1, 3 and 7 of flower B respectively. The crossover operation rearranges these pollen in flower B according to their order in flower A to obtain a new flower B; the same crossover operation is applied to flower A to obtain a new flower A, giving the crossed versions of flowers A and B.
(3.5) This yields algorithm (2), the crossover global pollination operation; its pseudocode placeholder and an illustrative sketch are given below.
[Algorithm (2): crossover global pollination pseudocode; rendered as images in the original]
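Algorithm (2) likewise appears only as an image. The sketch below implements a crossover on task sequences consistent with steps (3.2)-(3.5); the number of selected positions k and the symmetric treatment of both flowers are assumptions rather than details taken from the patent.

```python
import random

def crossover_global_pollination(flower_a, flower_b, k=3, rng=random):
    """Hedged sketch of crossover global pollination on two task sequences
    (permutations of the same task set). k positions are selected at random in
    flower A; the tasks found there are re-ordered inside flower B so that they
    follow A's order, and symmetrically for A."""
    def reorder(target, donor, selected_tasks):
        # Take the selected tasks in the donor's order and write them back into
        # the target at the positions where the target currently holds them.
        donor_order = [t for t in donor if t in selected_tasks]
        child = list(target)
        slots = [i for i, t in enumerate(target) if t in selected_tasks]
        for i, t in zip(slots, donor_order):
            child[i] = t
        return child

    positions = rng.sample(range(len(flower_a)), k)
    selected = {flower_a[i] for i in positions}
    new_b = reorder(flower_b, flower_a, selected)
    new_a = reorder(flower_a, flower_b, selected)
    return new_a, new_b
```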
(3.6) After the priority layering operation, the scheduling order of the tasks can be changed in small steps or in large steps. For a small step, swap mutation is used: two randomly selected task positions are exchanged (e.g., swap mutation on A and B). For a larger step, inverse mutation is used: two tasks are selected at random and all tasks between them are placed in reverse order (e.g., inverse mutation on A and B).
(3.7) From these two mutation operations, the mutation local pollination operation algorithm is written; an illustrative sketch follows the placeholder below.
[Algorithm (3): mutation local pollination pseudocode; rendered as an image in the original]
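As a hedged illustration of the two mutation operations (the pseudocode is an image in the original), swap mutation and inverse mutation on a task sequence can be sketched as follows. A full implementation would additionally restrict the chosen positions so that the priority-layering constraints of steps (2.7)-(2.9) remain satisfied; that restriction is assumed rather than shown here.

```python
import random

def swap_mutation(seq, rng=random):
    """Small step: exchange two randomly chosen task positions."""
    child = list(seq)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def inverse_mutation(seq, rng=random):
    """Larger step: reverse the whole sub-sequence between two random positions."""
    child = list(seq)
    i, j = sorted(rng.sample(range(len(child)), 2))
    child[i:j + 1] = reversed(child[i:j + 1])
    return child
```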
The step (4) is specifically as follows:
Using the above improvement strategies, the DMFPA algorithm is proposed as follows; an illustrative end-to-end sketch follows the placeholder below.
[DMFPA algorithm pseudocode; rendered as images in the original]
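Because the DMFPA pseudocode is provided only as images, the following is a hedged sketch of the main loop described in the specification (prioritize and layer the workflow, then choose global or local pollination by comparing a random number with the transition probability). It reuses the crossover and mutation sketches above; the parameter defaults and the greedy acceptance rule are assumptions, not taken from the patent.

```python
import random

def dmfpa(population, fitness, transition_prob=0.8, max_iters=200, rng=random):
    """Sketch of the DMFPA main loop. `population` is a list of task sequences
    (assumed already priority-layered); `fitness` is minimized, e.g. makespan,
    scheduling cost, or a weighted combination of formulas (2-1) to (2-5)."""
    pop = [list(ind) for ind in population]
    best = min(pop, key=fitness)
    for _ in range(max_iters):
        for idx, flower in enumerate(pop):
            if rng.random() < transition_prob:
                # Global pollination: crossover with the current best solution.
                child, _ = crossover_global_pollination(flower, best, rng=rng)
            else:
                # Local pollination: swap or inverse mutation.
                mutate = swap_mutation if rng.random() < 0.5 else inverse_mutation
                child = mutate(flower, rng=rng)
            if fitness(child) < fitness(flower):   # greedy acceptance (assumed)
                pop[idx] = child
        best = min(pop, key=fitness)
    return best
```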
drawings
FIG. 1 cloud computing workflow scheduling model
FIG. 2 Crossover operation, swap mutation and inverse mutation
FIG. 3 is a flow chart of the DMFPA algorithm
FIG. 4 Execution time difference ratio relative to GA-Budget
FIG. 5 Scheduling cost difference ratio relative to GA-Deadline
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Step 1: As shown in FIG. 1, the actual workflow relationships are built according to the three-layer workflow model, which performs the following operations. Resource layer: provides resources for the upper-layer applications to select. Scheduling layer: compares and selects different resources for the scheduling of the tasks. User layer: the user sets services according to his or her own requirements and submits the workflow tasks to the scheduling layer.
Step 2: As shown in FIG. 2, the crossover operation, swap mutation, and inverse mutation are performed on the actual input parameters A and B; they constitute the improvement of the global algorithm and of the local algorithm respectively, and aim to improve the fitness value and the quality of the solution so as to obtain the optimal solution.
Step 3: As shown in FIG. 3, the flow chart of the DMFPA algorithm: the workflow is first prioritized and layered; then the global algorithm or the local algorithm is selected for execution by comparing a decision random number with the transition probability; finally, when the number of iterations is reached, the relevant operations are executed and the final scheduling scheme is output.
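Purely for illustration, the sketches above could be exercised as follows; the toy task set, the fitness function, and all numerical values are invented and are not taken from the patent or its experiments.

```python
# Toy usage of the dmfpa() sketch above; everything here is made up for illustration.
import random

tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]
rng = random.Random(42)

def toy_fitness(order):
    # Pretend later positions are more expensive; a real fitness would be the
    # makespan, scheduling cost, or energy from formulas (2-1) to (2-5).
    weights = {"t1": 3, "t2": 5, "t3": 2, "t4": 4, "t5": 1, "t6": 6}
    return sum(weights[t] * (i + 1) for i, t in enumerate(order))

population = [rng.sample(tasks, len(tasks)) for _ in range(10)]
best = dmfpa(population, toy_fitness, transition_prob=0.8, max_iters=50, rng=rng)
print(best, toy_fitness(best))
```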
Different workflow combinations and modeling methods are used in workflow scheduling studies; the workflows used here contain between 10 and 100 tasks, and each workflow has the same selection probability. NodeWeight represents the computational weight of a workflow task and EdgeWeight represents the communication weight between two tasks of the workflow. NodeToEdgeWeight represents the computation-to-communication ratio. The comparison algorithms adopted in the experiment are GA-Deadline, GA-Budget, GA-EC and MOPSO.
The execution time difference ratio of each algorithm under different types of workflows is shown in FIG. 4. As can be seen from the figure, the execution time difference of GA-Budget against itself is 0, which indirectly indicates that GA-Budget has the longest execution time: because GA-Budget tries to select the cheapest computing devices to minimize scheduling cost, its execution time increases significantly. The execution time of DMFPA improves by about 10%-20% compared with GA-EC and GA-Budget respectively. Compared with GA-Budget, the MOPSO algorithm reduces execution time by 3%-18% depending on the workflow structure. In all of these cases, however, the proposed DMFPA algorithm achieves a smaller execution time than MOPSO.
The scheduling cost difference ratio of each algorithm under different types of workflows is shown in FIG. 5. As can be seen from the figure, the scheduling cost difference ratio of GA-Deadline is 0, which indirectly indicates that GA-Deadline has the shortest execution time: GA-Deadline tries to select the highest-priced, best-performing computing devices to minimize execution time, which makes its scheduling cost rise significantly. Compared with GA-EC and GA-Deadline, the DMFPA algorithm saves 20%-30% of the scheduling cost respectively. In the LNodeHEdge case the scheduling cost difference ratios of the proposed DMFPA and of MOPSO are the same; in the LNodeHEdge and HNodeHEdge cases the proposed DMFPA is superior to MOPSO; and in the RNodeHEdge and HNodeHEdge cases MOPSO is superior to the proposed DMFPA.
After the algorithms are executed, Table 1 shows that, under the deadline and budget limits proposed by the user, the proposed DMFPA algorithm obtains relatively smaller execution time and scheduling cost than the GA-based algorithms in most cases. DMFPA can schedule 90% of the workflows to completion within the specified budget and 84% within the specified deadline; these data show that DMFPA has a certain advantage over MOPSO under deadline and budget constraints.
[Table 1: execution time and scheduling cost of each algorithm under the user's deadline and budget constraints; rendered as an image in the original]
As can be seen from Table 2, the resource utilization rate after each algorithm is executed lies between 40.2% and 46%, which is a good level of utilization when execution time, scheduling cost, and energy consumption are considered together. The complexity of the proposed DMFPA algorithm is made up of the following parts: population initialization, fitness evaluation, transition probability, and global and local pollination. Among these, the fitness evaluation is the most important part of the algorithm. To generate the initial population at random, let N be the number of tasks, M the number of virtual machines, and P the population size; the complexity of mapping tasks to virtual machines is O(N + P + 1), the complexity of the execution-time fitness value is O(NM + NM² + M²), the complexity of the scheduling-cost fitness value is O(NM), the complexity of the energy-consumption fitness value is O(NMh), and the complexity of global and local pollination is O(P). For the overall DMFPA algorithm, the time complexity is therefore O(N + P + NM² + M² + NMh).
[Table 2: resource utilization of each algorithm; rendered as an image in the original]
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single technical solution; this manner of description is adopted only for clarity, and those skilled in the art should treat the specification as a whole, with the embodiments combinable as appropriate to form other embodiments that those skilled in the art can understand.

Claims (4)

1. A workflow optimization scheduling algorithm based on cloud computing, comprising the following specific steps:
(1) Establishing a cloud computing workflow scheduling model according to the workflow execution relations.
(2) Preprocessing the tasks according to their priorities prior to workflow execution.
(3) Improving the global algorithm and the local algorithm of the flower pollination algorithm.
2. The workflow optimization scheduling algorithm based on cloud computing of claim 1, wherein step (1) is specifically: the workflow system in the cloud environment is decomposed into two stages, a resource providing stage and a resource scheduling stage, which correspond respectively to the resource layer and the scheduling layer of the three-layer model and whose functions are, respectively, to provide the various computing resources and to compare and select different resources for task scheduling. The uppermost user layer is added to the three-layer model, and the workflow tasks are submitted to the scheduling layer according to the QoS required by the user.
3. The workflow optimization scheduling algorithm based on cloud computing of claim 2, wherein step (2) is specifically: first, the execution time of the tasks, the cost and energy consumption of the computing tasks, and the resource utilization rate are considered, and the workflow to be executed is preprocessed according to these parameters without destroying the interdependence among the tasks.
4. The workflow optimization scheduling algorithm based on cloud computing of claim 3, wherein step (3) is specifically: a crossover operation is performed on the task sequence of global pollination in the flower pollination algorithm to obtain a better new solution and fitness function value. Swap mutation or inverse mutation is proposed for local pollination in the flower pollination algorithm; because the task priorities have been layered, the constraint conditions of the tasks are still satisfied when swap mutation and inverse mutation are executed, so a better solution can be obtained.
CN202210478776.9A 2022-04-29 2022-04-29 Workflow optimization scheduling algorithm based on cloud computing Pending CN115185651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210478776.9A CN115185651A (en) 2022-04-29 2022-04-29 Workflow optimization scheduling algorithm based on cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210478776.9A CN115185651A (en) 2022-04-29 2022-04-29 Workflow optimization scheduling algorithm based on cloud computing

Publications (1)

Publication Number Publication Date
CN115185651A true CN115185651A (en) 2022-10-14

Family

ID=83512404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210478776.9A Pending CN115185651A (en) 2022-04-29 2022-04-29 Workflow optimization scheduling algorithm based on cloud computing

Country Status (1)

Country Link
CN (1) CN115185651A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116050760A (en) * 2022-12-31 2023-05-02 上海交通大学 Multi-energy-source junction collaborative planning method and equipment based on internal structure layering


Similar Documents

Publication Publication Date Title
Chen et al. Efficient task scheduling for budget constrained parallel applications on heterogeneous cloud computing systems
Tan et al. A trust service-oriented scheduling model for workflow applications in cloud computing
Abrishami et al. Cost-driven scheduling of grid workflows using partial critical paths
Fard et al. A multi-objective approach for workflow scheduling in heterogeneous environments
CN101237469B (en) Method for optimizing multi-QoS grid workflow based on ant group algorithm
Zuo et al. A multi-objective hybrid cloud resource scheduling method based on deadline and cost constraints
CN109800071A (en) A kind of cloud computing method for scheduling task based on improved adaptive GA-IAGA
Wang et al. Makespan-driven workflow scheduling in clouds using immune-based PSO algorithm
CN106055395A (en) Method for constraining workflow scheduling in cloud environment based on ant colony optimization algorithm through deadline
CN110347504B (en) Many-core computing resource scheduling method and device
Wu et al. A multi-model estimation of distribution algorithm for energy efficient scheduling under cloud computing system
Chakravarthi et al. TOPSIS inspired budget and deadline aware multi-workflow scheduling for cloud computing
Arabnejad et al. Multi-QoS constrained and profit-aware scheduling approach for concurrent workflows on heterogeneous systems
CN103279818A (en) Method for cloud workflow scheduling based on heuristic genetic algorithm
Li et al. Fast and energy-aware resource provisioning and task scheduling for cloud systems
Zhou et al. Concurrent workflow budget-and deadline-constrained scheduling in heterogeneous distributed environments
Subramoney et al. Multi-swarm PSO algorithm for static workflow scheduling in cloud-fog environments
CN109710372A (en) A kind of computation-intensive cloud workflow schedule method based on cat owl searching algorithm
Wang et al. Dynamic multiworkflow deadline and budget constrained scheduling in heterogeneous distributed systems
CN115185651A (en) Workflow optimization scheduling algorithm based on cloud computing
Fard et al. Budget-constrained resource provisioning for scientific applications in clouds
CN110519386A (en) Elastic resource supply method and device based on data clustering in cloud environment
CN112231081B (en) PSO-AHP-based monotonic rate resource scheduling method and system in cloud environment
CN106802822A (en) A kind of cloud data center cognitive resources dispatching method based on moth algorithm
Ranjan et al. SLA-based coordinated superscheduling scheme for computational Grids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination