CN115080236B

CN115080236B - Workflow deployment method based on graph segmentation

Info

Publication number: CN115080236B
Application number: CN202210730454.9A
Authority: CN
Inventors: 马英红; 吝李婉; 焦毅; 李红艳; 刘伟; 刘勤; 张琰
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2024-04-16
Anticipated expiration: 2042-06-24
Also published as: CN115080236A

Abstract

The invention discloses a workflow deployment method based on graph segmentation. It mainly addresses the low parallel execution efficiency of existing clustering-based workflow deployment algorithms, which minimize communication overhead at the expense of the parallel execution efficiency of the tasks in a workflow. The implementation scheme is as follows: 1) establishing a workflow directed acyclic graph (DAG) model G; 2) calculating the execution time of each task and the data transmission time between tasks in the workflow; 3) merging the serial structures in the workflow model G to obtain a new workflow model graph G'; 4) segmenting the new workflow model graph G' to obtain an optimal task partition; 5) mapping the optimal task partition onto virtual machines according to the minimum execution time to complete the deployment of the workflow. The invention reduces the completion time of the workflow, improves its execution efficiency, and can be used for the joint optimization of data overhead and parallel task execution efficiency during workflow execution.

Description

Workflow deployment method based on graph segmentation

Technical Field

The invention belongs to the technical field of cloud computing, and particularly relates to a workflow deployment method which can be used for joint optimization of data overhead and task parallel execution efficiency in a workflow execution process.

Background

In a cloud computing environment, a workflow is a composite job consisting of a set of interdependent tasks, typically described as a directed acyclic graph (DAG). Compared with independent tasks, a workflow is larger in scale and more complex in structure. Deploying a workflow requires considering not only the resource allocation of each task but also the data transmission and execution order among tasks, which greatly increases the complexity of deployment. How to deploy workflows rationally in a distributed, heterogeneous environment remains an active research topic.

Currently, cluster computing frameworks such as MapReduce and Spark are widely used in data center networks to analyze and process ever-growing computing and networking workloads: complex large-scale jobs are decomposed into simpler tasks, modeled as workflows, and submitted to cloud data centers with powerful parallel processing capabilities. A workflow comprises multiple interdependent tasks connected according to a precedence order, so its deployment must take the data transfer between tasks into account. Studies have shown that in MapReduce applications, intermediate data transmission accounts for more than 30% of the overall workflow completion time. In some large-scale commercial data centers, such as the Yahoo data center cluster, the intermediate data transmission of workflows is the most significant component of network traffic and takes up approximately 60% of the total workflow completion time. Intermediate data transmission is also a key cause of network congestion. Optimizing workflow deployment to reduce this communication overhead is therefore important for relieving traffic pressure in the data center and shortening task completion time.

Representative heuristic workflow deployment algorithms mainly include: list-based deployment algorithms, cluster-based deployment algorithms, replication-based deployment algorithms. The workflow deployment algorithm based on clustering is mainly aimed at reducing communication overhead among workflow tasks, wherein tasks in the workflow are mapped into different clusters firstly, and then each cluster is mapped onto the same computing node as a whole. The core idea is to divide a plurality of tasks with edges connected (with data dependency relationship) into the same cluster, so as to save the communication overhead among the tasks in the cluster.

For example, Ahmad S. G. et al., in the paper "Data-intensive workflow optimization based on application task graph partitioning in heterogeneous computing systems" (IEEE Fourth International Conference on Big Data and Cloud Computing. IEEE, 2014: 129-136), propose PDWA, a partition-based data-intensive workflow optimization algorithm for heterogeneous computing systems. In this algorithm, the workflow is divided into task partitions of a specified size and number so as to minimize the data transfer overhead between partitions. PDWA defines the maximum number of tasks allowed in each task partition as the total number of tasks in the workflow multiplied by a coefficient less than 1, and maps each task partition to the computing node that minimizes the partition's execution time. The drawback of this method is that the workflow is clustered only according to the data dependencies among tasks, so tasks that have strong data dependencies but could be executed in parallel may be placed in the same cluster; this degrades parallel execution and ultimately lengthens the workflow completion time.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, and provides a workflow deployment method based on graph segmentation, so as to realize the balance between communication overhead minimization and parallelism maximization in the workflow clustering process and improve the execution efficiency of the workflow.

The technical idea of the invention is as follows: from the point of view of graph theory, the dependency and parallelism among the tasks in a workflow are fully exploited, and a classical graph segmentation algorithm, the community discovery algorithm, is improved so as to jointly optimize data overhead and task parallelism during workflow task partitioning.

According to the above thought, the technical scheme of the invention comprises the following steps:

(1) According to the task set T, the data dependency and timing relation E between tasks, the task complexity set L, and the data transmission amount set D of a workflow, a workflow directed acyclic graph (DAG) model is established: $G = \{T, E, L, D\}$;

(2) Specifying a set of virtual machines $S = \{s_k \mid k = 1, 2, 3, \ldots, q\}$, where $q$ is the number of virtual machines and each virtual machine corresponds to a different physical machine; calculating the execution time $w_{i,k}$ of each task on the different virtual machines and the data transmission time $c_{i,j}$ between tasks with a data dependency in the workflow, as well as the average execution time of each task over all virtual machines and the average data transmission time between tasks;

(3) Identifying and merging pairs of tasks that form a serial structure in the workflow model G:

in the workflow model graph, if a task has exactly one child task and that child task has exactly one parent task, the two tasks form a serial structure;

The data transmission between a task $t_i$ and a task $t_{i+1}$ that form a serial structure is cancelled, and the complexities of the two tasks are added to merge them into a new task $t'_i$;

(4) Partitioning a workflow model diagram:

(4a) Dividing the workflow model graph G into n sub-graphs, each containing one vertex;

(4b) Sequentially searching for and tentatively merging pairs of sub-graphs connected by an edge, and calculating the modularity increment $\Delta Q$ of each merge from the tasks contained in each sub-graph and the connections between them:

If the two sub-graphs contain tasks of the same layer, calculating the sum, denoted sum, of the average execution times of the same-layer tasks in the new sub-graph obtained by merging the two sub-graphs, and comparing it with the maximum value maxW of the average execution times of all tasks of that layer:

If $\mathit{sum} > \alpha \cdot \mathit{maxW}$, then $\Delta Q = -(e_{i,j} + e_{j,i} - 2a_i a_j) = -2(e_{i,j} - a_i a_j)$;

Otherwise, $\Delta Q = e_{i,j} + e_{j,i} - 2a_i a_j = 2(e_{i,j} - a_i a_j)$; where $\alpha$ is a comparison coefficient with a value smaller than 1, $e_{i,j}$ is the ratio of the edge weight between the i-th sub-graph and the j-th sub-graph to the sum of all edge weights in the graph G, and $a_i$ is the ratio of the sum of the edge weights of all tasks in the i-th sub-graph to the sum of all edge weights in the graph G.

If the two sub-graphs contain no tasks of the same layer, $\Delta Q = e_{i,j} + e_{j,i} - 2a_i a_j = 2(e_{i,j} - a_i a_j)$;

(4c) Merging the two sub-graphs with the maximum $\Delta Q$ value, and updating the modularity $Q = Q + \max \Delta Q$;

(4d) Repeating steps (4a) to (4c) until the whole graph G is merged into one sub-graph, and finding the graph division corresponding to the maximum modularity value, which is the optimal task partition $P = \{p_1, p_2, \ldots, p_x, \ldots, p_h\}$, where $p_x$ denotes the x-th task partition and h denotes the number of partitions;

(5) Deploying the best task partition onto the virtual machine:

(5a) Calculating the priority $\mathrm{rank}(t_i)$ of each task from the average task execution time and the average data transmission time between tasks,

where $\mathrm{succ}(t_i)$ denotes the set of child tasks of task $t_i$;

(5b) Calculating the priority of each task partition from the priority $\mathrm{rank}(t_i)$ of each task:

$\mathrm{rank}(p_x) = \max_{t_i \in p_x} \mathrm{rank}(t_i)$

(5c) Sorting all partitions in descending order of $\mathrm{rank}(p_x)$; each time, selecting the undeployed task partition with the largest $\mathrm{rank}(p_x)$, traversing all virtual machines, calculating for each virtual machine the sum of the total execution time of the task partition on that virtual machine and the execution time of the tasks already deployed on it, and finding the virtual machine $s_k$ for which this sum is smallest;

(5d) Deploying all tasks of the task partition together as a whole onto the virtual machine $s_k$, with the tasks within the partition arranged in descending order of $\mathrm{rank}(t_i)$; the virtual machine executes the tasks sequentially in this order.

Compared with the prior art, the invention has the following advantages:

1. When segmenting the workflow model graph, the invention compares the sum of the average execution times of the same-layer tasks in a partition with the maximum value maxW of the average execution times of all tasks of that layer, and uses the result to constrain the partitioning process. This effectively prevents a single task partition from containing too many same-layer tasks that could run in parallel, keeps the execution time of same-layer tasks balanced across partitions, and thereby solves the problem that existing clustering-based workflow deployment algorithms minimize communication overhead at the cost of the parallel execution efficiency of the tasks in the workflow.

2. When deploying the optimal task partition onto virtual machines, the invention calculates a priority for each task partition, sorts the partitions by priority, and then selects for each partition the virtual machine that gives the minimum execution time, which further reduces the completion time of the workflow and improves its execution efficiency.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is an exemplary diagram of a workflow DAG model in accordance with the present invention;

FIG. 3 is a schematic diagram showing the merging of DAG serial structures according to the present invention;

FIG. 4 is a schematic diagram of the segmentation of a workflow model diagram in accordance with the present invention;

FIG. 5 compares, by simulation, the performance of the invention and of existing workflow deployment algorithms as the workflow scale increases;

FIG. 6 compares, by simulation, the performance of the invention and of existing workflow deployment algorithms as the communication-to-computation ratio increases;

FIG. 7 compares, by simulation, the performance of the invention and of existing workflow deployment algorithms as the number of available virtual machines increases.

Detailed Description

Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to FIG. 1, the implementation steps of this example are as follows:

Step 1, establishing a workflow directed acyclic graph (DAG) model.

The task set T of the workflow is expressed as $T = \{t_i \mid i = 1, 2, \ldots, n\}$, where $t_i$ denotes the i-th task and n is the number of tasks in the workflow;

The data dependency and timing relation E between tasks is expressed as $E = \{e_{i,j} \mid t_i, t_j \in T\}$, where $e_{i,j}$ is 0 or 1: when $e_{i,j}$ is 0, no dependency (edge) exists between task $t_i$ and task $t_j$, and when $e_{i,j}$ is 1, a dependency (edge) exists between task $t_i$ and task $t_j$;

The task complexity set L is expressed as $L = \{l_i \mid t_i \in T\}$, where $l_i$ denotes the computational complexity of task $t_i$;

The data transmission amount set D is expressed as $D = \{d_{i,j} \mid t_i, t_j \in T\}$, where $d_{i,j}$ denotes the amount of data transmitted between task $t_i$ and task $t_j$;

Combining these four elements gives the workflow directed acyclic graph model:

$G = \{T, E, L, D\}$,

The task set T is a vertex set of the graph G, the data dependency and time sequence relation E between tasks is an edge set of the graph G, the task complexity set L is a vertex weight set of the graph G, and the data transmission quantity set D is an edge weight set of the graph G.

The workflow directed acyclic graph model is shown in FIG. 2. A directed edge $e_{i,j}$ connects two tasks $t_i$ and $t_j$; $t_i$ is called the parent task of $t_j$, and $t_j$ is called a child task of $t_i$. A task without any parent task is called the entry task $t_{enter}$, and a task without any child task is called the exit task $t_{exit}$. The layer of each task is determined by the maximum distance between its node and the entry task, and the entry task $t_1$ is located at the first layer.

Without loss of generality, this example assumes that a workflow has exactly one entry task and exactly one exit task. When a workflow has multiple entry tasks or exit tasks, a virtual entry task vertex or a virtual exit task vertex is added so that the workflow again has a single entry task and a single exit task. The complexity of a virtual task is set to zero, and the data transmission amount between the virtual task and other vertices is also zero, so adding a virtual task vertex does not affect the deployment result of the workflow. In FIG. 2, $t_{10}$ is a virtual exit task.
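The model and the layer rule described in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Workflow` and `task_layers`, the integer task identifiers, and all numeric values are assumptions introduced here. The layer of a task is taken as 1 for an entry task and otherwise one more than the deepest parent layer, which corresponds to the maximum-distance rule with the entry task at the first layer.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """G = {T, E, L, D}; the keys of `data` double as the edge set E."""
    complexity: dict                           # L: task id -> l_i (the ids form T)
    data: dict = field(default_factory=dict)   # D: (i, j) -> d_{i,j}

    def parents(self, t):
        """Parent tasks of t, i.e. tasks with a directed edge into t."""
        return [i for (i, j) in self.data if j == t]


def task_layers(g):
    """Layer = 1 for an entry task, otherwise 1 + the deepest parent layer."""
    layers = {}

    def layer(t):
        if t not in layers:
            ps = g.parents(t)
            layers[t] = 1 if not ps else 1 + max(layer(p) for p in ps)
        return layers[t]

    for t in g.complexity:
        layer(t)
    return layers


# Illustrative five-task workflow (all numbers are made up for this sketch).
g = Workflow(
    complexity={1: 10, 2: 15, 3: 10, 4: 25, 5: 5},
    data={(1, 2): 8, (1, 3): 4, (1, 4): 6, (2, 5): 3, (3, 5): 2, (4, 5): 7},
)
print(task_layers(g))  # {1: 1, 2: 2, 3: 2, 4: 2, 5: 3}
```

Storing E implicitly as the keys of D keeps the two sets consistent; a virtual entry or exit task would simply be one more task with complexity 0 and zero-weight edges.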

Step 2, calculating the execution time of each task and the data transmission time between tasks in the workflow.

2.1 Specifying a set of virtual machines $S = \{s_k \mid k = 1, 2, 3, \ldots, q\}$, where q is the number of virtual machines and each virtual machine corresponds to a different physical machine;

2.2 Calculating the working parameters of the tasks in the workflow on the different virtual machines:

2.2.1 Calculating the execution time $w_{i,k}$ of task $t_i$ on each virtual machine and the average execution time of task $t_i$ over all virtual machines,

where $l_i$ denotes the computational complexity of task $t_i$, $v_k$ denotes the processing power of virtual machine $s_k$, and q denotes the number of available virtual machines;

2.2.2 Calculating the data transmission time $c_{i,j}$ between task $t_i$ and task $t_j$ that have a data dependency, and the average data transmission time between tasks,

where task $t_i$ and task $t_j$ are deployed to virtual machine $s_{k1}$ and virtual machine $s_{k2}$ respectively for execution, $d_{i,j}$ denotes the amount of data transmitted between task $t_i$ and task $t_j$, and $r_{k1,k2}$ denotes the data transmission rate between virtual machine $s_{k1}$ and virtual machine $s_{k2}$; the average data transmission time is based on the average data transmission rate between all virtual machines; when two tasks with a dependency are placed on the same virtual machine, the data transmission overhead between them is negligible, that is, $c_{i,j}$ is 0.

For example, in FIG. 2, when task $t_1$ and task $t_2$ are placed on the same virtual machine, no additional resources are needed for data transmission, so the data transmission overhead can be recorded as 0; when task $t_1$ and task $t_2$ are placed on two different virtual machines, the data must be transmitted over additional network resources, and assuming that the transmission rate between the two virtual machines is $r_{k1,k2} = 1$, the data transmission time required between task $t_1$ and task $t_2$ is 8.
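A small sketch of the Step 2 quantities follows. It assumes, consistently with the symbol definitions above, that execution time is complexity divided by processing power and that transmission time is data amount divided by transmission rate; these forms, the function names, and the virtual machine speeds are assumptions of this sketch. The data amount of 8 between $t_1$ and $t_2$ is inferred from the FIG. 2 example (rate 1, transmission time 8).

```python
def execution_time(l_i, v_k):
    """w_{i,k}: assumed to be complexity l_i divided by processing power v_k."""
    return l_i / v_k


def average_execution_time(l_i, speeds):
    """Mean of w_{i,k} over all q virtual machines."""
    return sum(execution_time(l_i, v) for v in speeds) / len(speeds)


def transfer_time(d_ij, rate, same_vm=False):
    """c_{i,j}: assumed to be d_{i,j} / r_{k1,k2}; zero when both tasks share a VM."""
    return 0.0 if same_vm else d_ij / rate


speeds = [1.0, 2.0, 4.0]                      # hypothetical processing powers v_k
print(execution_time(8, 2.0))                 # 4.0
print(average_execution_time(8, speeds))      # (8 + 4 + 2) / 3
print(transfer_time(8, 1.0))                  # 8.0, as in the t1 -> t2 example
print(transfer_time(8, 1.0, same_vm=True))    # 0.0
```

An average transmission time would follow the same pattern, with the average rate between all virtual machines in place of `rate`.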

Step 3, merging the serial structures of the workflow model graph.

The data transmission between a task $t_i$ and a task $t_{i+1}$ that form a serial structure is cancelled, the two tasks are merged into a new task $t'_i$, and the computational complexity and the workflow model graph are updated to form a new model graph G'.

As shown in FIG. 3, task $t_i$ and task $t_{i+1}$ are two tasks that form a serial structure. After they are merged, a new task $t'_i$ is formed, and the execution time of the new task and the workflow model graph are updated as follows:

$w'_i = w_i + w_{i+1}$

$\mathrm{pre}(t'_i) = \mathrm{pre}(t_i)$

$\mathrm{succ}(t'_i) = \mathrm{succ}(t_{i+1})$

where $w'_i$ denotes the execution time of the new task $t'_i$, $w_i$ denotes the execution time of task $t_i$, $w_{i+1}$ denotes the execution time of task $t_{i+1}$, $\mathrm{pre}(t'_i)$ denotes the parent task set of the new task $t'_i$, $\mathrm{succ}(t'_i)$ denotes the child task set of the new task $t'_i$, $\mathrm{pre}(t_i)$ denotes the parent task set of task $t_i$, and $\mathrm{succ}(t_{i+1})$ denotes the child task set of task $t_{i+1}$.
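The merging rule of Step 3 can be sketched as below. The function name `merge_serial`, the dictionary representation, and the example values are assumptions of this sketch; the merged task simply keeps the identifier of $t_i$. Complexities are added rather than execution times, which should be equivalent as long as execution time is proportional to complexity on a given virtual machine.

```python
def merge_serial(complexity, data):
    """Merge parent -> child pairs where the parent has exactly one child
    and the child has exactly one parent, until no such pair remains."""
    complexity = dict(complexity)
    data = dict(data)
    while True:
        pair = next(
            ((i, j) for (i, j) in data
             if sum(1 for (a, _) in data if a == i) == 1
             and sum(1 for (_, b) in data if b == j) == 1),
            None,
        )
        if pair is None:
            return complexity, data
        i, j = pair
        del data[(i, j)]                       # cancel the transfer t_i -> t_{i+1}
        complexity[i] += complexity.pop(j)     # add the two complexities
        for (a, b) in [e for e in data if e[0] == j]:
            data[(i, b)] = data.pop((a, b))    # succ(t'_i) = succ(t_{i+1})


# Chain 1 -> 2 -> 3 followed by a fork 3 -> {4, 5} (illustrative values).
l = {1: 3, 2: 4, 3: 5, 4: 6, 5: 7}
d = {(1, 2): 8, (2, 3): 5, (3, 4): 2, (3, 5): 3}
merged_l, merged_d = merge_serial(l, d)
print(merged_l)  # {1: 12, 4: 6, 5: 7}
print(merged_d)  # {(1, 4): 2, (1, 5): 3}
```

The parent set of the merged task is unchanged because incoming edges of $t_i$ are never touched, which matches $\mathrm{pre}(t'_i) = \mathrm{pre}(t_i)$.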

Step 4, segmenting the new workflow model graph G' formed after the serial structures are merged.

4.1 Dividing the workflow model graph G' into n sub-graphs, each containing one vertex;

4.2 Sequentially searching for and tentatively merging pairs of sub-graphs connected by an edge, and calculating the modularity increment $\Delta Q$ of each merge from the tasks contained in each sub-graph and the connections between them:

If the two sub-graphs contain tasks of the same layer, the sum, denoted sum, of the average execution times of the same-layer tasks in the merged sub-graph is compared with the maximum value maxW of the average execution times of all tasks of that layer:

If $\mathit{sum} > \alpha \cdot \mathit{maxW}$, then $\Delta Q = -(e_{i,j} + e_{j,i} - 2a_i a_j) = -2(e_{i,j} - a_i a_j)$;

Otherwise, and also when the two sub-graphs contain no tasks of the same layer, $\Delta Q = e_{i,j} + e_{j,i} - 2a_i a_j = 2(e_{i,j} - a_i a_j)$; where $\alpha$ is a comparison coefficient with a value smaller than 1, $e_{i,j}$ is the ratio of the edge weight between the i-th sub-graph and the j-th sub-graph to the sum of all edge weights in the graph G', and $a_i$ is the ratio of the sum of the edge weights of all tasks in the i-th sub-graph to the sum of all edge weights in the graph G';

$e_{i,j}$ and $a_i$ are calculated as follows:

$a_i = k_i / (2m)$

where $A_{i,j}$ is the edge weight between the i-th task and the j-th task, m is the sum of all edge weights in the graph G', and $k_i$ is the sum of the weights of all edges connected to the i-th task.

4.3 Repeating 4.1 and 4.2 until the entire graph G' is merged into one community, and finding the community structure at which the modularity value is maximal, which is the optimal partition scheme $P = \{p_1, p_2, \ldots, p_h\}$.
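The modularity increment used in Step 4 can be illustrated with the sketch below. The names `strength_fractions` and `merge_gain`, the default `alpha=0.8`, and the example weights are assumptions of this sketch. The form $e_{i,j} = A_{i,j} / (2m)$ for two single-vertex sub-graphs is also an assumption, chosen to be consistent with $a_i = k_i / (2m)$; the sketch covers only one tentative merge, not the full agglomeration loop.

```python
def strength_fractions(weights):
    """a_i = k_i / (2m) for single-vertex sub-graphs.
    `weights` maps undirected edges (u, v) to their weights."""
    two_m = 2 * sum(weights.values())
    k = {}
    for (u, v), w in weights.items():
        k[u] = k.get(u, 0) + w
        k[v] = k.get(v, 0) + w
    return {v: k_v / two_m for v, k_v in k.items()}


def merge_gain(e_ij, a_i, a_j, same_layer_sum=None, max_w=None, alpha=0.8):
    """Delta Q of a tentative merge; the sign is flipped when the merged
    same-layer load exceeds alpha * maxW. Pass same_layer_sum=None when the
    two sub-graphs share no layer."""
    gain = 2 * (e_ij - a_i * a_j)
    if same_layer_sum is not None and same_layer_sum > alpha * max_w:
        return -gain
    return gain


weights = {(1, 2): 8, (1, 3): 4, (2, 3): 4}            # illustrative edge weights
a = strength_fractions(weights)                        # {1: 0.375, 2: 0.375, 3: 0.25}
e_12 = weights[(1, 2)] / (2 * sum(weights.values()))   # assumed form of e_{i,j}
print(merge_gain(e_12, a[1], a[2]))                                # 0.21875
print(merge_gain(e_12, a[1], a[2], same_layer_sum=25, max_w=25))   # -0.21875
```

Flipping the sign makes an otherwise attractive merge the least attractive candidate, so heavily loaded same-layer tasks tend to stay in separate partitions.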

FIG. 4 shows an example of workflow task partitioning for a workflow composed of five tasks, in which $t_2$, $t_3$, and $t_4$ belong to the same layer and can be executed in parallel, with execution times of 15, 10, and 25 respectively. During partitioning, the three tasks may all be placed in the same partition and executed sequentially, which clearly reduces the parallelism of workflow execution and prolongs the completion time. Another possible result is that $t_3$ and $t_4$ are placed in one partition and $t_2$ in another; the execution times of the same-layer tasks are then unbalanced between the two partitions, and the partitions must wait for each other during execution, which also prolongs the completion time of the workflow. In contrast, if $t_2$ and $t_3$ are placed in one partition and $t_4$ in another, the execution time of the same-layer tasks is evenly distributed between the two partitions, and unnecessary waiting time is avoided.
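The three candidate partitions of the FIG. 4 example can be compared numerically. The sketch below assumes, for illustration only, that each partition runs on its own virtual machine of equal speed and that communication is ignored, so the layer finishes when the most heavily loaded partition finishes; the function name `layer_finish_time` is hypothetical.

```python
exec_time = {"t2": 15, "t3": 10, "t4": 25}   # execution times from the FIG. 4 example


def layer_finish_time(partitions):
    """Tasks inside one partition run one after another; partitions run in parallel."""
    return max(sum(exec_time[t] for t in p) for p in partitions)


print(layer_finish_time([["t2", "t3", "t4"]]))      # 50: fully sequential
print(layer_finish_time([["t3", "t4"], ["t2"]]))    # 35: unbalanced split
print(layer_finish_time([["t2", "t3"], ["t4"]]))    # 25: balanced split
```

Under these assumptions the balanced split halves the time of the sequential case, which is the effect the same-layer constraint is meant to encourage.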

Step 5, mapping the optimal task partition onto the virtual machines to complete the deployment of the workflow.

5.1 Calculating the priority $\mathrm{rank}(t_i)$ of each task from the average task execution time and the average data transmission time between tasks,

where $\mathrm{succ}(t_i)$ denotes the set of child tasks of task $t_i$;

5.2 Calculating the priority of each task partition from the priority $\mathrm{rank}(t_i)$ of each task:

$\mathrm{rank}(p_x) = \max_{t_i \in p_x} \mathrm{rank}(t_i)$

5.3 Sorting all partitions in descending order of their priority $\mathrm{rank}(p_x)$; each time, selecting the undeployed task partition with the largest $\mathrm{rank}(p_x)$, traversing all virtual machines, calculating for each virtual machine the sum of the total execution time of the task partition on that virtual machine and the execution time of the tasks already deployed on it, and finding the virtual machine $s_k$ for which this sum is smallest;

5.4 Deploying all tasks of the task partition together as a whole onto the virtual machine $s_k$, with the tasks within the partition arranged in descending order of $\mathrm{rank}(t_i)$; the virtual machine executes the tasks sequentially in this order, which completes the deployment of the workflow.
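Steps 5.1 to 5.4 can be sketched as follows. The task rank is assumed here to take the upward-rank form familiar from HEFT, $\mathrm{rank}(t_i) = \bar{w}_i + \max_{t_j \in \mathrm{succ}(t_i)} (\bar{c}_{i,j} + \mathrm{rank}(t_j))$, which is consistent with the quantities named in 5.1 but is an assumption of this sketch, as are the function names, the use of complexity divided by speed as execution time, and all numeric values.

```python
def task_ranks(avg_w, avg_c):
    """Assumed upward rank: own average time plus the costliest path to the exit."""
    children = {}
    for (i, j) in avg_c:
        children.setdefault(i, []).append(j)
    ranks = {}

    def rank(t):
        if t not in ranks:
            tails = [avg_c[(t, j)] + rank(j) for j in children.get(t, [])]
            ranks[t] = avg_w[t] + max(tails, default=0)
        return ranks[t]

    for t in avg_w:
        rank(t)
    return ranks


def deploy(partitions, ranks, complexity, speeds):
    """Highest-rank partition first, onto the VM with the smallest resulting load."""
    load = [0.0] * len(speeds)       # execution time already placed on each VM
    placement = {}                   # VM index -> tasks in execution order
    order = sorted(partitions, key=lambda p: max(ranks[t] for t in p), reverse=True)
    for p in order:
        work = sum(complexity[t] for t in p)
        k = min(range(len(speeds)), key=lambda vm: load[vm] + work / speeds[vm])
        load[k] += work / speeds[k]
        placement[k] = placement.get(k, []) + sorted(p, key=lambda t: ranks[t], reverse=True)
    return placement


avg_w = {1: 10, 2: 15, 3: 10, 4: 25, 5: 5}                      # illustrative values
avg_c = {(1, 2): 8, (1, 3): 4, (1, 4): 6, (2, 5): 3, (3, 5): 2, (4, 5): 7}
ranks = task_ranks(avg_w, avg_c)
print(ranks[1], ranks[4], ranks[2], ranks[3], ranks[5])         # 53 37 23 17 5
print(deploy([[1], [2, 3], [4], [5]], ranks, avg_w, speeds=[1.0, 1.0]))
# {0: [1, 2, 3], 1: [4, 5]}
```

Because a partition inherits the largest rank among its tasks, a partition holding a task near the entry of the workflow is placed before partitions that only contain downstream tasks.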

The effects of the present invention are further described below in conjunction with simulation experiments:

1. Simulation parameter settings:

The workflow directed acyclic graph (DAG) is randomly generated, and the simulation parameters are set as shown in Table 1:

Table 1 Workflow deployment simulation parameter settings

Parameter	Value: fixed value (variation values)
Workflow scale N	50 (20, 40, 60, 80, 100)
Communication-to-computation ratio CCR	0.5 (0.4, 0.8, 1.2, 1.6, 2)
Number of available virtual machines K	5 (1, 2, 3, 4, 5, 6)

In Table 1, the workflow scale is the number of tasks contained in the workflow. The communication-to-computation ratio CCR is the ratio of the sum of the average data transmission times between tasks to the sum of the average execution times of all tasks; a higher CCR indicates a communication-intensive workflow, and a lower CCR indicates a computation-intensive workflow.
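The CCR definition can be written directly as a ratio of two sums. The function name `ccr` and the numeric values below are illustrative assumptions.

```python
def ccr(avg_transfer, avg_exec):
    """Sum of average transmission times divided by sum of average execution times."""
    return sum(avg_transfer.values()) / sum(avg_exec.values())


avg_exec = {1: 10, 2: 15, 3: 10, 4: 25, 5: 5}                       # sums to 65
avg_transfer = {(1, 2): 8, (1, 3): 4, (1, 4): 6, (2, 5): 3, (3, 5): 2, (4, 5): 7}  # sums to 30
print(round(ccr(avg_transfer, avg_exec), 3))   # 0.462
```

A value below 1, as here, corresponds to the computation-intensive end of the range explored in Table 1.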

2. Simulation content and result analysis:

Two existing workflow deployment algorithms, the list-based deployment algorithm HEFT and the clustering-based deployment algorithm PDWA, are selected and compared with the deployment results of the method of the invention in three respects: the influence of the workflow scale, of the communication-to-computation ratio, and of the number of virtual machines on algorithm performance.

Simulation 1: the number of virtual machines is set to 5, the communication-to-computation ratio CCR is set to 0.5, and the number of workflow task nodes is increased from 20 to 100. The performance of the three methods as the workflow scale increases is compared, and the results are shown in FIG. 5, wherein:

FIG. 5(a) compares the scheduling length ratio SLR of the three methods as the workflow scale increases. The abscissa is the number of tasks in the workflow, and the ordinate is the scheduling length ratio SLR, which characterizes the workflow completion time: the larger the SLR value, the longer the workflow completion time and the worse the performance. As can be seen from FIG. 5(a), compared with the clustering-based deployment algorithm PDWA, the workflow completion time of the invention is reduced by 75% on average. This is because PDWA artificially specifies the size of the workflow partitions, so the number of partitions is relatively fixed, and it does not consider the parallelism of task execution, which prolongs the workflow completion time. The method of the invention considers both the communication overhead and the parallelism of task execution when partitioning the workflow tasks, and avoids the excessive waiting time caused by placing parallelizable tasks in the same partition or by unbalanced execution times, thereby effectively reducing the workflow completion time.

FIG. 5(b) compares the speedup of the three methods as the workflow scale increases. The abscissa is the number of tasks in the workflow, and the ordinate is the speedup, which characterizes the parallel execution efficiency of the workflow: the larger the speedup, the higher the parallel execution efficiency of the deployment method and the better the performance. As can be seen from FIG. 5(b), as the workflow scale increases, the speedup of the method of the invention is always larger than that of the clustering-based deployment algorithm PDWA. The method fully considers the dependency and parallelism of the tasks in the workflow when partitioning them: it reduces the data transmission overhead between communities while also taking the parallel execution efficiency of the tasks into account, so that tasks of the same layer can be executed efficiently in parallel.

FIG. 5(c) compares the communication overhead of the three methods as the workflow scale increases. The abscissa is the number of tasks in the workflow, and the ordinate is the communication overhead. As can be seen from FIG. 5(c), the communication overhead rises as the workflow scale gradually increases, and the method of the invention and the clustering-based deployment algorithm PDWA always have lower communication overhead than the list-based algorithm HEFT. Together with FIG. 5(a) and FIG. 5(b), this shows that the method of the invention reduces the workflow completion time and significantly improves the parallel execution efficiency while keeping the communication overhead small.

Simulation 2: the number of virtual machines is set to 5, the number of workflow task nodes is set to 50, and the communication-to-computation ratio CCR is increased from 0.4 to 2. The performance of the three methods as CCR increases is compared, and the results are shown in FIG. 6, wherein:

FIG. 6(a) compares the scheduling length ratio SLR of the three methods as CCR increases. The abscissa is the communication-to-computation ratio CCR, and the ordinate is the scheduling length ratio SLR. As can be seen from FIG. 6(a), as CCR increases, the workflow completion time of the method of the invention is always smaller than that of the clustering-based deployment algorithm PDWA. The performance of the method of the invention and of the list-based algorithm HEFT remains comparatively stable as CCR changes, while the SLR value of PDWA increases significantly with CCR. This is because PDWA only aims to minimize communication overhead on the basis of a specified workflow partition size, and gradually loses its advantage as the proportion of data transmission time grows.

FIG. 6(b) compares the speedup of the three methods as CCR increases. The abscissa is the communication-to-computation ratio CCR, and the ordinate is the speedup. As can be seen from FIG. 6(b), as CCR increases, the speedup of the method of the invention shows a significant advantage over the clustering-based deployment algorithm PDWA and adapts better, because the invention fully considers the parallelism of the workflow tasks when partitioning the workflow.

FIG. 6(c) compares the communication overhead of the three methods as CCR increases. The abscissa is the communication-to-computation ratio CCR, and the ordinate is the communication overhead. As can be seen from FIG. 6(c), the method of the invention and the clustering-based deployment algorithm PDWA have lower communication overhead than the list-based algorithm HEFT. Together with FIG. 6(a) and FIG. 6(b), this verifies that the method of the invention effectively improves the execution efficiency of the workflow while keeping the communication overhead small.

Simulation 3, setting the number of workflow tasks to 50, setting the communication calculation ratio CCR to 0.5, increasing the number of virtual machines from 1 to 6, and comparing the performance changes of three methods with the increase of the number of available virtual machines, wherein the results are shown in fig. 7, wherein:

FIG. 7 (a) is a graph showing the variation of the scheduling length ratio SLR with the number of available virtual machines in three ways, wherein the abscissa represents the number of available virtual machines and the ordinate represents the scheduling length ratio SLR. As can be seen from fig. 7 (a), as the number of available virtual machines increases, the SLR value of the workflow decreases continuously, and the advantages of the method of the present invention are also gradually apparent, because as the number of virtual machines increases, more tasks are allowed to be processed in parallel on different virtual machines in the workflow execution process, and when task partitioning is performed, the method of the present invention considers the balance of task parallelism and execution time and the data transmission overhead between the partitions, so that when a plurality of virtual machines execute in parallel, excessive waiting time between different partitions is avoided, and the completion time of the workflow is effectively reduced.

FIG. 7 (b) is a comparison of speedup changes in three methods as the number of available virtual machines increases, with the number of available virtual machines on the abscissa and speedup on the ordinate. As can be seen from fig. 7 (b), as the number of available virtual machines increases, the speedup value increases continuously, and the advantages of the method of the present invention become apparent. The method and the device not only enable the communication expenditure among the partitions to be minimum when partitioning, but also give consideration to the parallel execution efficiency of tasks in the workflow execution process, and allow more partitions to be processed in parallel on different virtual machines in the workflow execution process along with the increase of the number of the available virtual machines, so that speedup values can be increased along with the increase of the number of the available virtual machines.

FIG. 7(c) compares the communication overhead of the three methods as the number of available virtual machines increases; the abscissa is the number of available virtual machines and the ordinate is the communication overhead. As FIG. 7(c) shows, the method of the present invention and the clustering-based deployment algorithm PDWA both incur lower communication overhead than the list-based algorithm HEFT. Taken together with FIG. 7(a) and FIG. 7(b), this shows that the method of the present invention effectively improves parallel execution efficiency and reduces workflow completion time while keeping communication overhead low.

Claims

1. A workflow deployment method based on graph segmentation, characterized by comprising the following steps:

(3) Determining two tasks that form a serial structure in the workflow model G and merging them to obtain a new workflow model graph G':

Canceling the data transmission between task t_i and task t_{i+1}, which form a serial structure, and adding and combining the two tasks into a new task t'_i;
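The serial-structure merge of step (3) can be sketched as follows. This is a minimal illustration, not the claimed procedure itself. It assumes that a serial structure means a parent with exactly one child whose only parent is that task, and that "adding" the two tasks means summing their execution times. The function name `merge_serial` and the dictionary layout are hypothetical.

```python
def merge_serial(tasks, edges):
    """Merge serial parent/child pairs in a DAG.

    tasks: {task_name: execution_time}          (assumed layout)
    edges: {(parent, child): transfer_time}     (assumed layout)
    Returns new (tasks, edges); the merged task keeps the parent's name.
    """
    tasks, edges = dict(tasks), dict(edges)
    while True:
        children, parents = {}, {}
        for (u, v) in edges:
            children.setdefault(u, []).append(v)
            parents.setdefault(v, []).append(u)
        # Assumed definition of "serial": u has one child v, v has one parent u.
        pair = next(((u, vs[0]) for u, vs in children.items()
                     if len(vs) == 1 and len(parents[vs[0]]) == 1), None)
        if pair is None:
            return tasks, edges
        u, v = pair
        del edges[(u, v)]             # data transmission between the pair is cancelled
        tasks[u] += tasks.pop(v)      # execution times are added (assumption)
        for (a, b), w in list(edges.items()):
            if a == v:                # re-attach the child's outgoing edges to the merged task
                del edges[(a, b)]
                edges[(u, b)] = w
```

For a chain a → b → c followed by a fork from c, the three chain tasks collapse into one task whose outgoing edges are those of c.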

(4) Partitioning the new workflow model graph G' formed after the serial structures are merged:

(4a) Dividing the workflow model graph G' into n sub-graphs, each containing a single vertex;

If sum > maxW·α, then Δq = -(e_i,j + e_j,i - 2a_i a_j) = -2(e_i,j - a_i a_j);

Otherwise, Δq = e_i,j + e_j,i - 2a_i a_j = 2(e_i,j - a_i a_j); where α is a comparison coefficient whose value is smaller than 1, e_i,j represents the ratio of the edge weight between the i-th sub-graph and the j-th sub-graph to the total edge weight in the graph G, and a_i represents the ratio of the sum of the edge weights of all tasks in the i-th sub-graph to the total edge weight in the graph G;

(4d) Repeating steps (4a) to (4c) until the whole graph G' is merged into a single sub-graph, and finding the graph partition corresponding to the maximum modularity value, namely the optimal task partition P = {p_1, p_2, ..., p_x, ..., p_h}, where p_x represents the x-th task partition and h represents the number of partitions;
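The merge-and-select loop of step (4) can be sketched as a greedy modularity agglomeration. This is an illustration under stated assumptions rather than the claimed algorithm: the edge weights are treated as a symmetric matrix, the condition "sum > maxW·α" is passed in as a hypothetical `penalize` callback because its operands are not defined in this passage, and the names `delta_q`, `modularity` and `greedy_partition` are invented for the example.

```python
from itertools import combinations


def delta_q(e_ij, a_i, a_j, penalize=False):
    """Increment 2(e_ij - a_i*a_j); the sign is flipped when the merge is penalized."""
    gain = 2.0 * (e_ij - a_i * a_j)
    return -gain if penalize else gain


def modularity(weights, groups):
    """Q = sum over groups of (e_gg - a_g^2), for a symmetric weight matrix."""
    two_m = sum(map(sum, weights))
    q = 0.0
    for g in groups:
        inside = sum(weights[u][v] for u in g for v in g)
        degree = sum(sum(weights[u]) for u in g)
        q += inside / two_m - (degree / two_m) ** 2
    return q


def greedy_partition(weights, penalize=lambda gi, gj: False):
    """Merge connected sub-graphs by largest increment; keep the best-Q partition seen."""
    two_m = sum(map(sum, weights))
    groups = [frozenset([v]) for v in range(len(weights))]  # one vertex per sub-graph
    best_q, best_groups = modularity(weights, groups), list(groups)
    while len(groups) > 1:
        best = None
        for gi, gj in combinations(groups, 2):
            cross = sum(weights[u][v] for u in gi for v in gj)
            if cross == 0:
                continue  # only connected sub-graphs are merge candidates
            a_i = sum(sum(weights[u]) for u in gi) / two_m
            a_j = sum(sum(weights[u]) for u in gj) / two_m
            dq = delta_q(cross / two_m, a_i, a_j, penalize(gi, gj))
            if best is None or dq > best[0]:
                best = (dq, gi, gj)
        if best is None:
            break  # graph is disconnected; nothing left to merge
        _, gi, gj = best
        groups = [g for g in groups if g not in (gi, gj)] + [gi | gj]
        q = modularity(weights, groups)
        if q > best_q:
            best_q, best_groups = q, list(groups)
    return best_groups, best_q
```

On two triangles joined by a single edge, the sketch returns the two triangles as the partition with the highest modularity. Recomputing the modularity from scratch after each merge is chosen for clarity over speed.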

(5) Mapping the optimal task partition to a virtual machine to complete the deployment of the workflow:

where succ(t_i) represents the set of child tasks of task t_i;

rank(p_x) = max{ rank(t_i) : t_i ∈ p_x }
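The partition ranking of step (5) can be sketched as follows. The task-level rank formula is not reproduced in this passage, so the sketch assumes a HEFT-style upward rank, in which a task's rank is its average execution time plus the largest value of (average transfer time + rank) over its child tasks. Only the last function, the maximum over the tasks of a partition, comes directly from the formula above. All function and parameter names are hypothetical.

```python
from functools import lru_cache


def task_ranks(avg_exec, avg_comm, succ):
    """Upward rank per task (assumed HEFT-style recursion over succ(t_i)).

    avg_exec: {task: average execution time}
    avg_comm: {(parent, child): average transfer time}
    succ:     {task: iterable of child tasks}
    """
    @lru_cache(maxsize=None)
    def rank(t):
        tail = max((avg_comm[(t, c)] + rank(c) for c in succ.get(t, ())), default=0.0)
        return avg_exec[t] + tail

    return {t: rank(t) for t in avg_exec}


def partition_rank(partition, ranks):
    """rank(p_x): the maximum rank among the tasks in partition p_x."""
    return max(ranks[t] for t in partition)
```

Partitions could then be sorted by `partition_rank` in descending order before being mapped to virtual machines; that ordering is likewise an assumption of this sketch.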

2. The method according to claim 1, characterized in that: in step (1), the workflow directed acyclic graph model G is established as follows:

(1a) The task set T in the workflow is expressed as: T = {t_i | i = 1, 2, ..., n}, where t_i represents the i-th task and n is the number of tasks that the workflow contains;

(1b) The data dependency and timing relationship E between tasks is expressed as: E = {e_i,j | t_i, t_j ∈ T}, where e_i,j takes the value 0 or 1. A value of 0 indicates that there is no dependency (no edge) between task t_i and task t_j; a value of 1 indicates that there is a dependency (an edge) between them. A directed edge e_i,j connects the two tasks t_i and t_j; t_i is called the parent task of t_j, and t_j the child task of t_i. A task without any parent task is called an entry task;

(1c) The task complexity set L is expressed as: L = {l_i | t_i ∈ T}, where l_i represents the computational complexity of task t_i;

(1d) The data transfer amount set D is expressed as: D = {d_i,j | t_i, t_j ∈ T}, where d_i,j represents the amount of data transferred between task t_i and task t_j;

(1e) Combining the four elements to obtain the workflow directed acyclic graph model G = {T, E, L, D}.
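The four-element model G = {T, E, L, D} can be held in a small data structure. The sketch below is one possible representation, not one prescribed by the claim. The class name `Workflow` and its field and method names are hypothetical, and E is stored as the set of pairs for which e_i,j = 1.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    tasks: list        # T: task identifiers
    edges: set         # E: pairs (i, j) for which e_i,j = 1
    complexity: dict   # L: {task: computational complexity l_i}
    data: dict         # D: {(i, j): data transfer amount d_i,j}

    def e(self, i, j):
        """Return e_i,j: 1 if a directed edge i -> j exists, else 0."""
        return 1 if (i, j) in self.edges else 0

    def parents(self, t):
        return sorted(i for (i, j) in self.edges if j == t)

    def children(self, t):
        return sorted(j for (i, j) in self.edges if i == t)

    def entry_tasks(self):
        """Tasks without any parent task."""
        return [t for t in self.tasks if not self.parents(t)]
```

A workflow with edges t1 → t2 and t1 → t3 has `t1` as its only entry task.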

3. The method according to claim 1, characterized in that: in step (2), the execution time w_i,k of each task on the different virtual machines and the average execution time of the task over all virtual machines in the workflow are calculated as follows:

where w_i,k represents the execution time of task t_i on virtual machine s_k, w̄_i represents the average execution time of task t_i on all virtual machines, l_i represents the computational complexity of task t_i, v_k represents the processing power of virtual machine s_k, and q represents the number of available virtual machines.
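The quantities defined in claim 3 can be computed as in the sketch below. It assumes that the execution time is the computational complexity divided by the processing power (w_i,k = l_i / v_k) and that the average is the arithmetic mean over the q virtual machines. Both relations are inferred from the symbol definitions and are not quoted from the claim. The function name `execution_times` is hypothetical.

```python
def execution_times(complexity, power):
    """Return (w, w_avg).

    complexity: {task: l_i}
    power:      {vm: v_k}
    w[(task, vm)] = l_i / v_k           (assumed relation)
    w_avg[task]   = mean over all q VMs (assumed relation)
    """
    w = {(i, k): l / v for i, l in complexity.items() for k, v in power.items()}
    q = len(power)
    w_avg = {i: sum(w[(i, k)] for k in power) / q for i in complexity}
    return w, w_avg
```

For a task of complexity 12 on two virtual machines of power 2 and 4, the execution times are 6 and 3 and the average is 4.5.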

4. The method according to claim 1, characterized in that: in step (2), the average data transmission time between tasks with a data dependency relationship c_i,j is calculated as follows:

where c_i,j represents the data transmission time between task t_i and task t_j when they are deployed to virtual machine s_k1 and virtual machine s_k2, respectively, for execution; d_i,j represents the amount of data transferred between task t_i and task t_j; r_k1,k2 represents the data transmission rate between virtual machine s_k1 and virtual machine s_k2; c̄_i,j represents the average data transmission time between the i-th task t_i and the j-th task t_j; and r̄ represents the average data transmission rate between all virtual machines. When two tasks with a sequential dependency relationship are placed on the same virtual machine, the data transmission overhead between them is negligible, i.e. c_i,j is 0.
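The transmission-time quantities of claim 4 can be sketched as follows. The sketch assumes that the transmission time is the data amount divided by the rate between the two virtual machines, and that the average time uses the mean rate over all virtual-machine pairs. These relations are inferred from the symbol definitions, not quoted from the claim. The function names are hypothetical.

```python
def transfer_time(d_ij, vm_i, vm_j, rate):
    """c_i,j for tasks placed on vm_i and vm_j; 0 when both share a virtual machine."""
    if vm_i == vm_j:
        return 0.0                      # same VM: transfer overhead is negligible
    return d_ij / rate[(vm_i, vm_j)]    # assumed relation: amount / rate


def average_transfer_time(d_ij, rate):
    """Average transfer time using the mean rate over all VM pairs (assumed relation)."""
    avg_rate = sum(rate.values()) / len(rate)
    return d_ij / avg_rate
```

With rates of 10 and 30 between two virtual machines, 60 units of data take 6 time units from the first to the second and 3 time units on average.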

5. The method according to claim 1, characterized in that: in step (4b), the layer in which each task is located is determined by the maximum distance between that task's node and the entry task.
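The layer assignment of claim 5 amounts to a longest-path computation from the entry task. The sketch below assumes that distance is counted in edges and that entry tasks sit in layer 0. Both the numbering convention and the function name `task_layers` are assumptions of this example.

```python
def task_layers(tasks, edges):
    """Return {task: layer}, where layer is the longest edge count from an entry task."""
    parents = {t: [] for t in tasks}
    for u, v in edges:
        parents[v].append(u)

    layer = {}

    def depth(t):
        if t not in layer:
            # Entry tasks have no parents and are placed in layer 0 (assumed convention).
            layer[t] = 0 if not parents[t] else 1 + max(depth(p) for p in parents[t])
        return layer[t]

    for t in tasks:
        depth(t)
    return layer
```

For edges a → b, a → c and b → c, task c lands in layer 2 rather than layer 1, because the maximum distance is used.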

6. The method according to claim 1, characterized in that: in step (4b), the ratio e_i,j of the edge weight between the i-th sub-graph and the j-th sub-graph to the total edge weight in the graph G', which appears in the formula for the modularity increment Δq, is determined as follows:

where A_i,j is the edge weight between the i-th task and the j-th task, and m is the sum of all the edge weights in the graph G'.

7. The method according to claim 1, characterized in that: in step (4b), the ratio a_i of the sum of the edge weights of all tasks in the i-th sub-graph to the total edge weight in the graph G', which appears in the formula for the modularity increment Δq, is determined as follows:

a_i = k_i / (2m)

where k_i is the sum of all the edge weights connected to the i-th task and m is the sum of all the edge weights in the graph G'.
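The ratios used in claims 6 and 7 can be computed from an edge-weight matrix as in the sketch below. The relation a_i = k_i / (2m) is taken from claim 7. The relation e_i,j = A_i,j / (2m) is an assumption, chosen so that a_i equals the sum of e_i,j over j. The matrix is assumed symmetric, so its entries sum to 2m. The function name `modularity_terms` is hypothetical.

```python
def modularity_terms(A):
    """Return (e, a) for a symmetric edge-weight matrix A.

    e[i][j] = A[i][j] / (2m)   (assumed relation)
    a[i]    = k_i / (2m), where k_i is the total weight of edges attached to task i
    """
    two_m = sum(sum(row) for row in A)   # symmetric A counts each edge twice
    e = [[A[i][j] / two_m for j in range(len(A))] for i in range(len(A))]
    a = [sum(row) / two_m for row in A]
    return e, a
```

For a three-task graph with edge weights 2 and 1, m is 3 and the values of a_i are 1/3, 1/2 and 1/6, which sum to 1.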