CN104239141A

CN104239141A - Task optimized-scheduling method in data center on basis of critical paths of workflow

Info

Publication number: CN104239141A
Application number: CN201410452173.7A
Authority: CN
Inventors: 马华东; 高一鸿; 张海涛; 丁鸿凯; 赵纯
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2014-09-05
Filing date: 2014-09-05
Publication date: 2014-12-24
Anticipated expiration: 2034-09-05
Also published as: CN104239141B

Abstract

The invention relates to an optimized task scheduling method for data centers based on the critical path of a workflow. According to the differences among the subtasks in an application's workflow, the data center determines the key factors that affect system performance, compares the cost performance of the subtasks according to the optimization objective and the characteristics of the workflow, and determines the resource nodes allocated to each subtask. Two resource node scheduling methods are provided for different user requirements: one optimizes the rent paid for resource nodes, and the other minimizes the processing time of the workflow. The method thereby either maximizes the data processing capability purchased per unit of funds while saving resource rent, or improves cost performance, raises the operating efficiency of the system, and reduces the processing time of the workflow; in both cases it generates an execution plan for the resource nodes. The method considers both factors in the rent cost of resource nodes, namely time and the number of nodes, so that the minimum rent is used to purchase the resource nodes with the strongest computing capability. Its mathematical model adapts to various types of applications and greatly reduces their operating cost.

Description

Optimized task scheduling method based on workflow critical paths in a data center

Technical field

The present invention relates to an optimized task scheduling method based on workflow critical paths in a data center, and belongs to the technical field of cloud computing.

Background art

Resource node scheduling for tasks in a data center is a major research topic in cloud computing. A user submits to the data center an application to be deployed there; the application consists of the data streams that the data center must process and the workflow that analyzes those data. The user may need a single data stream processed, but more often there are multiple data streams. The resource node scheduling method is responsible for allocating to the user the resource nodes needed to process the data, and for ensuring that each allocated resource node is used effectively and reasonably. A resource node is a virtual server configured using virtualization technology, on which the data processing tasks submitted by the user are deployed. Resource node scheduling for tasks in a data center focuses on two aspects: prediction and guarantee of the user's resource node demand, and optimized scheduling of resource nodes by the data center. Specifically:

Prediction of the user's resource node demand and service guarantee: the data center schedules a number of resource nodes that matches the predicted user demand in order to save resource node rent, and it places resource nodes sensibly within the physical server cluster so that the user's service-level objectives are met. To make the best use of resource nodes, the scheduling method tries to satisfy the user's service demand with the fewest resource nodes. In a running environment, the user's demand for resource nodes, that is, the data processing workload of the application service, changes constantly, so there is a gap between the service the user expects and the resource nodes the user is actually using. The scheduling method must therefore use mathematical methods such as probability and statistics to predict the data processing workload of the application service quickly and accurately, and from it estimate the number of resource nodes required. At the same time, the deployment and allocation of resource nodes on the physical servers is adjusted according to the user's expected demand over the coming period, so that the quality of service promised to the user does not change.

Optimized scheduling of resource nodes by the data center: when many allocated resource nodes are deployed in the data center, resource contention and optimization problems arise, and optimized scheduling methods for resource nodes are proposed in response. The goal is for the data center's resource nodes to be fully and reasonably used. There are two main approaches: priority-based optimized scheduling and workflow-based optimized scheduling. Specifically:

Priority-based optimized scheduling allocates resource nodes according to the characteristics of the subtasks that make up an application. Under certain constraints (such as the utilization of the data center's resource nodes or the runtime environment of its system software), the characteristics of each subtask are quantified as a weight, and the subtasks are sorted by weight. The data center then uses this order as the scheduling order, allocating a suitable number of resource nodes first to the subtask with the highest priority.

Workflow-based optimized scheduling uses the application's workflow to describe how the application load is executed, and allocates a suitable number of resource nodes to each subtask in the workflow. The workflow describes a complex application deployed in the data center and consists of the subtasks and the dependencies between them. In a concrete scheduling method, the workflow is formally described as a directed acyclic graph, and the execution plan of the application is formed by allocating a suitable number of resource nodes to the subtasks of this workflow graph. Any adjustment or optimization of the workflow graph is likewise workflow-based optimized scheduling of resource nodes.

At present, workflow-based optimized scheduling methods optimize according to the application's workflow graph, striking a balance between the constraints of the graph itself and other constraints (such as time constraints, power consumption constraints, or load balancing) in order to optimize either the rent of the resource nodes allocated to the workflow or their execution efficiency. Under given constraints, some algorithms provide not only an optimized scheduling scheme for the application but also the optimal rental schemes for this class of problem, that is, a Pareto-optimal curve showing how the optimal solutions are distributed. In workflow-based scheduling, differences in the characteristics of the subtasks in the workflow graph (for example execution order, hardware attributes, and the network structure of the data center) can significantly change the optimal scheduling scheme.

In summary, workflow-based optimized scheduling handles the resource node scheduling problem for the subtasks of complex applications better than priority-based scheduling, and can significantly improve the optimization of the system.

The shortcoming of current work, however, is that it does not consider how the subtasks in an application's workflow differ in their demand for data center resource nodes. Because each subtask pursues a different data processing goal, its demand for resource nodes necessarily differs. Given these differences, optimizing the rent of resource nodes and improving the execution efficiency of the whole system requires considering, for each subtask in the workflow, the ratio between the cost of renting its resource nodes and the output it produces. In other words, the effect of the cost performance of the scheduling method on system performance must be considered.

In addition, existing optimized scheduling methods do not consider the differences among the subtasks in the workflow of an application deployed in the data center. The number of resource nodes allocated to a particular type of subtask affects the execution efficiency of the application, so it is necessary to judge the importance of each type of subtask in the workflow and whether the resource nodes it occupies actually help improve system performance. These problems have become a focus of attention for practitioners in the field.

Summary of the invention

In view of this, the object of the present invention is to provide an optimized task scheduling method for data centers based on the critical path of a workflow. The method is based on the differing demands of the subtasks in an application's workflow for resource nodes. For the two kinds of constraints that limit system performance in practice, it proposes two different task scheduling methods, so that resource node scheduling based on the application's workflow optimizes system performance in the data center. First, under a time constraint, that is, with the workflow's data processing time held within a deadline set by the user, the resource node rent required by the workflow is minimized. Second, when the data center has a fixed set of its own resource nodes and the total number of resource nodes is set, subtasks are assigned to the resource nodes so that the execution time of the subtasks is minimized.

To achieve this object, the invention provides an optimized task scheduling method based on workflow critical paths in a data center, characterized in that: when the data center receives an application that a user submits for deployment, the application consisting of the data streams to be processed by the data center and the workflow that analyzes those data, the data center determines the key factors affecting system performance according to the differences among the subtasks in the workflow, compares the cost performance of the subtasks according to the optimization objective and the characteristics of the workflow, and determines the resource nodes allocated to each subtask. Resource nodes are thus scheduled effectively according to the user's requirements, optimizing either the rent of the resource nodes or the processing time of the workflow: that is, either maximizing the data processing capability purchased per unit of funds and saving resource rent, or improving cost performance, raising system operating efficiency, and reducing the processing time of the workflow, while generating an execution plan for the resource nodes. The method comprises the following operation steps:

Step 1: the data center sets up a resource node optimized scheduling model for the workflow of each application. Following the processing flow of the workflow and the user's actual needs, the model allocates a suitable number of resource nodes, i.e. virtual servers, to the subtasks in the workflow and generates an execution plan. It also evaluates the effectiveness of the allocation based on the characteristics of the subtasks, and optimizes the number of resource nodes the workflow uses without breaking the logical structure of the workflow;

Step 2: the following characteristics of the workflow are determined from the workflow graph of the deployed application: the critical path of the workflow, the types of resource nodes used, the computing capability of the workflow subtasks, and the cost performance of a resource node of a given type when a subtask is deployed on it. Because the subtasks in the workflow graph differ in resource demand and computing capability, a critical path is found in the workflow graph using graph theory and serves as the basis for allocating resources in the optimized scheduling of subtask resource nodes;

Step 3: the data center determines the data transmission times in the workflow graph according to the characteristics of the workflow subtasks and the dependencies between them;

Step 4, merging subtasks: according to the optimization objective, the two subtasks at the ends of the edge with the largest transmission time in the workflow graph are merged into a new compound subtask to reduce the data transmission time between subtasks. In actual deployment, the two subtasks are deployed on the same physical machine, which reduces the data transmission time between their resource nodes;

Step 5, adjusting the workflow graph: resource nodes are allocated to the user according to the user's requirements and constraints. When allocating resource nodes for the target user, the cost performance of the subtasks is compared according to the optimization objective to decide how to allocate resource nodes to each subtask, and the execution plan of the application is finally generated.

The innovative advantages of the invention are as follows. Prior-art data center scheduling methods only optimize the resource nodes allocated during computation with respect to the application's execution plan. These scheduling algorithms ignore the effect that the number of resource nodes allocated to a particular type of subtask has on the system's execution efficiency, and do not consider that the number of resource nodes assigned to a task and their cost performance can sharply raise the cost to users who rent resource nodes for applications deployed in the data center. The method of the invention considers both aspects of rent cost, the time factor and the number of resource nodes, so that the minimum rent buys the most, or the most powerful, computing capability. The mathematical model established by the method adapts well to various types of applications, can solve the system optimization problem under different constraints, and substantially lowers the operating cost of applications. The invention therefore has good prospects for wide application.

Brief description of the drawings

Fig. 1 is a flowchart of the operation steps of the optimized task scheduling method based on workflow critical paths in a data center according to the invention.

Fig. 2 is a schematic diagram of the structural relationship between the workflow of an application and its execution plan.

Fig. 3 is a flowchart of the resource node optimized scheduling method of the invention under a set time constraint.

Fig. 4 is a flowchart of the resource node optimized scheduling method of the invention under a set constraint on the number of resource nodes.

Embodiment

To make the object, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

In the optimized task scheduling method based on workflow critical paths in a data center according to the invention, the data center receives an application that a user submits for deployment (the application consists of the data streams that the data center must process and the workflow that analyzes those data). The data center then determines the key factors affecting system performance according to the differences among the subtasks in the workflow, compares the cost performance of the subtasks according to the optimization objective and the characteristics of the workflow, and determines the resource nodes allocated to each subtask. Resource nodes are thus scheduled effectively according to the user's requirements, optimizing either the rent of the resource nodes or the processing time of the workflow: resource rent is saved by maximizing the data processing capability purchased per unit of funds, or the processing time of the workflow is reduced, improving cost performance and system operating efficiency, while an execution plan for the resource nodes is generated.

Referring to Fig. 1, the concrete operation steps of the method are as follows:

Step 1: the data center sets up a resource node optimized scheduling model for the workflow of each application.

Following the processing flow of the workflow and the user's actual needs, the model allocates a suitable number of resource nodes, i.e. virtual servers, to the subtasks in the workflow and generates an execution plan. It also evaluates the effectiveness of the allocation based on the characteristics of the subtasks, and optimizes the resource usage of the subtasks that make up the application without breaking the logical structure of the workflow. The model both helps describe the characteristics of the application's subtasks and helps evaluate the performance of the workflow.

Referring to Fig. 2, the resource scheduling model for data center workflows provided by the invention comprises two parts: the workflow of the application and the execution plan of the application. The invention defines a workflow as a set of subtasks and their dependencies, that is, the logical execution process of an application described by a workflow graph. The workflow graph is a directed acyclic graph in which the nodes represent the subtasks of the workflow and the connecting lines, or edges, represent the dependencies between subtasks, namely the data transmission between them. A subtask is the smallest unit of data analysis in the application, and different subtasks differ in their demand for resource nodes and in the processing capability they obtain from them. Based on the workflow subtasks of each application, the data center allocates a set number of resource nodes according to the user's actual needs, finally forming the execution plan of the application.

A resource node is a virtual server that the data center creates with virtualization technology and that occupies several kinds of resources on a physical server, including CPU, memory, hard disk storage space, and transmission bandwidth. It is the basic unit in which resources are allocated to a workflow. An execution plan is a set of resource nodes that run subtasks, together with the dependencies between those resource nodes; a dependency between resource nodes represents the data transmission between the virtual servers. The execution plan in Fig. 2 is therefore the set of the workflow's subtasks together with the number of resource nodes each subtask uses.
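As an illustration only, the scheduling model described above (a workflow graph plus an execution plan) could be represented as in the following minimal Python sketch. The subtask names, transmission times, and node counts are hypothetical and are not taken from the patent; the dependency check uses Kahn's algorithm, which is a standard technique chosen here for illustration.

```python
from collections import deque

def topological_order(subtasks, edges):
    """Return the subtasks in dependency order.

    subtasks: list of subtask names (graph nodes).
    edges: {(src, dst): transmission_time} (graph edges).
    Raises ValueError if the graph is not acyclic.
    """
    indegree = {s: 0 for s in subtasks}
    successors = {s: [] for s in subtasks}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(s for s in subtasks if indegree[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for nxt in successors[s]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(subtasks):
        raise ValueError("workflow graph contains a cycle")
    return order

# Hypothetical workflow: four subtasks and the transmission time of each edge.
subtasks = ["ingest", "filter", "aggregate", "report"]
edges = {
    ("ingest", "filter"): 4.0,
    ("ingest", "aggregate"): 1.0,
    ("filter", "report"): 2.0,
    ("aggregate", "report"): 3.0,
}

# Hypothetical execution plan: number of resource nodes used by each subtask.
execution_plan = {"ingest": 2, "filter": 1, "aggregate": 1, "report": 1}

order = topological_order(subtasks, edges)
print(order)
```

Keeping the workflow (what must run, and in which order) separate from the execution plan (how many nodes each subtask gets) mirrors the two-part model of Fig. 2 and lets the plan change without touching the graph.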

Step 2: the following characteristics of the workflow are determined from the workflow graph of the deployed application: the critical path of the workflow, the types of resource nodes used, the computing capability of the workflow subtasks, and the cost performance of a resource node of a given type when a subtask is deployed on it. Because the subtasks in the workflow graph differ in resource demand and computing capability, a critical path is found in the workflow graph using graph theory and serves as the basis for allocating resources in the optimized scheduling of subtask resource nodes.

The critical path of the workflow graph is a path in the graph computed from the data processing capability of the subtasks. It is the subset of the subtasks of a workflow and the edges between them whose data processing time is the longest, and it therefore determines the completion time of the whole workflow. The data processing time of a subset is the sum of the execution times of all its subtasks and the transmission times of all its edges.
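The critical path defined above can be found with a longest-path computation over the directed acyclic graph. The following Python sketch is one possible way to do this, not necessarily the procedure used by the invention; the execution and transmission times are hypothetical values chosen for illustration.

```python
def critical_path(exec_time, trans_time):
    """Return (length, path) of the longest path in a workflow DAG.

    exec_time: {subtask: execution_time}
    trans_time: {(src, dst): transmission_time}
    The graph is assumed to be acyclic.
    """
    preds = {s: [] for s in exec_time}
    for src, dst in trans_time:
        preds[dst].append(src)

    # Depth-first traversal that emits every subtask after its predecessors.
    order, seen = [], set()

    def visit(s):
        if s in seen:
            return
        seen.add(s)
        for p in preds[s]:
            visit(p)
        order.append(s)

    for s in exec_time:
        visit(s)

    best = {}    # longest accumulated time of any path ending at a subtask
    parent = {}  # predecessor on that longest path
    for s in order:
        best[s] = exec_time[s]
        parent[s] = None
        for p in preds[s]:
            candidate = best[p] + trans_time[(p, s)] + exec_time[s]
            if candidate > best[s]:
                best[s] = candidate
                parent[s] = p

    end = max(best, key=best.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return best[end], path

# Hypothetical execution times (per subtask) and transmission times (per edge).
exec_time = {"ingest": 5.0, "filter": 3.0, "aggregate": 6.0, "report": 2.0}
trans_time = {
    ("ingest", "filter"): 4.0,
    ("ingest", "aggregate"): 1.0,
    ("filter", "report"): 2.0,
    ("aggregate", "report"): 3.0,
}

length, path = critical_path(exec_time, trans_time)
print(length, path)  # 17.0 ['ingest', 'aggregate', 'report']
```

In this example the path through "aggregate" accumulates 5 + 1 + 6 + 3 + 2 = 17 time units, against 16 for the path through "filter", so it determines the completion time of the workflow.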

The resource node type refers to the different types of resource node that can be configured for the workflow subtasks. The configuration of a resource node is described by the CPU, memory, hard disk storage space, and transmission bandwidth of the server used, and different configurations require different amounts of rent.

The computing capability of a workflow subtask is the amount of data it can process per unit time when deployed on a resource node of a given type.

The cost performance of renting a resource node for a subtask is the data processing speed of that subtask that one unit of funds can buy. It is also called the speed-resource ratio: the relationship between the execution capability of the subtask itself and the rent of the resource node it requires. It is described by the formula $\delta = \frac{v}{c}$, where $\delta$ is the speed-resource ratio of a subtask of an application, and $v$ and $c$ are, respectively, the computing capability of the subtask and the rent of the resource node it uses.
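As a small worked example of the speed-resource ratio, the following Python sketch compares two resource node types for one subtask. It assumes the ratio is the computing capability divided by the rent, as the definition above implies; the node type names and figures are hypothetical.

```python
def speed_resource_ratio(speed, cost):
    """Data processing speed bought per unit of rent (assumed to be v / c)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return speed / cost

# Hypothetical node types: (data processed per unit time, rent per unit time).
candidates = {"small": (40.0, 2.0), "large": (90.0, 6.0)}

ratios = {
    name: speed_resource_ratio(v, c) for name, (v, c) in candidates.items()
}
best_type = max(ratios, key=ratios.get)
print(ratios, best_type)  # {'small': 20.0, 'large': 15.0} small
```

Here the "large" node is faster in absolute terms, but the "small" node buys more processing speed per unit of rent, which is the comparison the method relies on.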

In essence, the method of the invention identifies the differences among the subtasks of a workflow in the data center, determines the key factors affecting system performance, and formulates an optimized resource scheduling scheme according to the characteristics of the workflow. By improving cost performance, it ultimately minimizes the various system overheads of the data center.

Step 3: the data center determines the data transmission times in the workflow graph according to the characteristics of the workflow subtasks and the dependencies between them.

Step 4: according to the optimization objective, the two subtasks at the ends of the edge with the largest transmission time in the workflow graph are merged into a new compound subtask to reduce the data transmission time between subtasks. In actual deployment, the two subtasks are deployed on the same physical machine, which reduces the data transmission overhead between the resource nodes on which they are deployed. Step 4 comprises the following operations:

(41) Calculate the data transmission workload between each pair of workflow subtasks of the application. The larger the data transmission workload, the more time is spent on transmission and the higher the rent.

(42) To reduce the data transmission time between subtasks, use the data transmission workload between subtasks as the measure of transmission cost, and merge the subtasks at the two ends of each edge in the workflow graph in descending order of transmission time, forming new compound subtasks.

When the workflow subtasks of an application are merged, the merging method of this step differs according to the constraint:

(A) Under a set time constraint, where the aim is to complete the subtasks with the fewest resource nodes: first determine the critical path of the workflow graph; then, in descending order of data transmission time between the subtasks on the critical path, use the merging method of step (42) to merge the subtasks on the critical path into new compound subtasks, reducing the time consumed by data transmission.

(B) Under a set number of resource nodes, where the aim is to shorten the completion time of the workflow's subtasks: first sort the edges between the subtasks in the workflow graph in descending order of data transmission time, then merge subtasks using the method of step (42) until no subtask in the workflow can be merged with another non-compound subtask.

(43) Because a compound subtask formed during merging is not merged again with other subtasks, the merging process ends when no subtask remains that needs to be merged with another non-compound subtask.
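The merging procedure of steps (42) and (43) can be sketched as a greedy pass over the edges in descending order of transmission time. The Python code below is a minimal illustration under that reading; the function name, the optional `allowed_edges` parameter (used to restrict merging to critical-path edges as in variant (A)), and the data are assumptions made for this example.

```python
def merge_subtasks(trans_time, allowed_edges=None):
    """Greedily merge the endpoints of the costliest edges into compound subtasks.

    trans_time: {(src, dst): transmission_time}
    allowed_edges: optional set of edges that may be merged, for example the
        edges on the critical path; None means every edge is a candidate.
    Returns the list of merged pairs in the order they were merged.
    """
    merged = set()  # subtasks that already belong to a compound subtask
    groups = []
    by_cost = sorted(trans_time.items(), key=lambda kv: kv[1], reverse=True)
    for (src, dst), _ in by_cost:
        if allowed_edges is not None and (src, dst) not in allowed_edges:
            continue
        # A compound subtask is never merged again, so skip used endpoints.
        if src in merged or dst in merged:
            continue
        groups.append((src, dst))
        merged.update((src, dst))
    return groups

# Hypothetical transmission times between subtasks.
trans_time = {
    ("ingest", "filter"): 4.0,
    ("ingest", "aggregate"): 1.0,
    ("filter", "report"): 2.0,
    ("aggregate", "report"): 3.0,
}

all_groups = merge_subtasks(trans_time)
critical_edges = {("ingest", "aggregate"), ("aggregate", "report")}
critical_groups = merge_subtasks(trans_time, allowed_edges=critical_edges)
print(all_groups)
print(critical_groups)
```

The loop stops naturally once every remaining edge touches a compound subtask, which corresponds to the termination condition of step (43).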

Step 5, adjusting the workflow graph: resource nodes are allocated to the user according to the requirements and constraints the user specifies. When allocating resource nodes for the target user, the cost performance of the subtasks is compared according to the optimization objective to decide how to allocate resource nodes to each subtask, and the execution plan of the application is finally generated. Step 5 comprises the following operations:

(51) Adjust the structure of the workflow graph after merging, and determine the critical path of the new workflow graph.

(52) According to the optimization objective, perform optimized resource node scheduling under a set time constraint: first allocate resource nodes to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to achieve the optimization objective.

In step (52), the goal of optimized resource node scheduling under a set time constraint is that the time the workflow takes to process each group of data does not exceed the time set by the user. The completion time of the workflow comprises the data processing time of the workflow subtasks, the data transmission time between subtasks, and the time data spends waiting for processing inside subtasks. The scheduling method comprises the following operations (as shown in Fig. 3):

(52A) According to the merging result, derive the new workflow graph; then, according to the time constraint of the workflow, optimize the data waiting time inside the subtasks on the critical path.

(52B) Using the formula for the workflow completion time, $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time, such that $T$ does not exceed the time $T_{total}$ set by the user. In the formula:

The data waiting time inside the subtasks, $T_{wait}$, approaches 0;

The data processing time of the subtasks is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$ with $t_i = \frac{D_i}{v_i \omega_i}$, where the natural number $i$ is the index of a subtask and its maximum value is $d$; $D_i$ is the amount of data the $i$-th subtask needs to process; $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing capability of a resource node; and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask;

The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$ with $t_j = \frac{D_j}{v_j}$, where the natural number $j$ is the index of an edge and its maximum value is $e$; $D_j$ is the amount of data to be transmitted on the $j$-th edge; and $v_j$ is the data transmission rate of the $j$-th edge.

Resource nodes are then allocated to the subtasks on the critical path according to the objective function, subject to the constraints $T_{exe} \le T_{total} - T_{trans}$ and $\omega_i \le b_i$. The number of resource nodes for the critical-path subtasks must strike a balance between the resource node rent and the corresponding computing capability, that is, the rent needed to buy the execution capability of the currently determined number of resource nodes is minimized. Here $\omega_i$, $\delta_i$, and $b_i$ are, respectively, the number of resource nodes required by the $i$-th subtask, its speed-resource ratio, and the set number of resource nodes, i.e. the maximum number of resource nodes that can be allocated to the $i$-th subtask.

(52C) Determine the minimum number of resource nodes for the other subtasks in the workflow graph, and generate the execution plan of the application.
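To make the time-constrained allocation of step (52) more concrete, the following Python sketch allocates nodes to critical-path subtasks until the deadline is met. It is a simple greedy heuristic written for illustration, not the objective function of the invention: it assumes that the processing time of a subtask is its data volume divided by (speed times node count), and it repeatedly adds a node where the time saved per unit of rent is largest. All names and figures are hypothetical.

```python
def allocate_under_deadline(tasks, t_trans, t_total):
    """Greedy node allocation for critical-path subtasks under a deadline.

    tasks: {name: (D, v, c, b)} with data volume D, per-node speed v,
        per-node rent c and maximum node count b (assumed fields).
    t_trans: total transmission time on the critical path.
    t_total: completion time allowed by the user.
    Returns {name: node_count}; raises ValueError if the deadline is unreachable.
    """
    budget = t_total - t_trans  # time left for T_exe, with T_wait taken as 0

    def t_exe(alloc):
        return sum(D / (v * alloc[n]) for n, (D, v, c, b) in tasks.items())

    alloc = {name: 1 for name in tasks}
    while t_exe(alloc) > budget:
        best_name, best_gain = None, 0.0
        for name, (D, v, c, b) in tasks.items():
            w = alloc[name]
            if w >= b:
                continue  # per-subtask node limit reached
            saved = D / (v * w) - D / (v * (w + 1))
            gain = saved / c  # time saved per unit of extra rent
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            raise ValueError("deadline cannot be met within the node limits")
        alloc[best_name] += 1
    return alloc

# Hypothetical critical-path subtasks: (D, v, c, b).
tasks = {
    "ingest": (100.0, 10.0, 1.0, 4),
    "aggregate": (120.0, 10.0, 2.0, 4),
    "report": (20.0, 10.0, 1.0, 2),
}

plan = allocate_under_deadline(tasks, t_trans=4.0, t_total=20.0)
print(plan)  # {'ingest': 2, 'aggregate': 2, 'report': 1}
```

With one node each, the processing time is 24 time units against a budget of 16; adding one node to "ingest" and one to "aggregate" brings it down to 13, so the deadline is met with five nodes in total. A greedy pass of this kind does not guarantee the minimum rent; it only illustrates how the constraints interact.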

As an alternative to step (52), the following operation step may be performed instead:

(53) According to the optimization objective, perform optimized resource node scheduling under a set number of resource nodes: use an exhaustive method to allocate resource nodes first to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to achieve the optimization objective.

The rule of the exhaustive allocation of resource nodes to workflow subtasks is as follows. First, among the schemes whose total number of resource nodes does not exceed the number set by the user, select as candidates those whose total resource node cost performance is higher than a set value. Next, among the feasible schemes, delete those that cost more. With the selection workload reduced in this way, choose from the remaining candidates the scheme with the smallest final completion time of the workflow.

The goal of step (53), optimized resource node scheduling under a set number of resource nodes, is to use a limited number of resource nodes so that the time the workflow takes to process each group of data is as short as possible. The scheduling method of this step comprises the following operations (see Fig. 4):

(53A) According to the result of merging the subtasks along the edges, derive the new workflow graph and reset the waiting-time constraint in the workflow.

(53B) Using the formula for the workflow completion time, $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time, where:

The data processing time of the workflow subtasks is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$ with $t_i = \frac{D_i}{v_i \omega_i}$, where the natural number $i$ is the index of a subtask and its maximum value is $d$; $D_i$ is the amount of data processed by the $i$-th subtask; $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing capability of a resource node; and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask;

The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$ with $t_j = \frac{D_j}{v_j}$, where the natural number $j$ is the index of an edge and its maximum value is $e$; $D_j$ is the total amount of data transmitted on the $j$-th edge; and $v_j$ is the data transmission rate of the $j$-th edge;

$T_{wait}$ is the data waiting time inside the subtasks, and its value approaches 0;

An exhaustive search is then used to choose the optimal scheme among the schemes with higher resource node cost performance, so that the final completion time of the workflow is the shortest while the cost of using system resource nodes is reduced;

The objective function of this scheduling method is $\min (T_{exe} + T_{wait} + T_{trans})$, subject to the constraints $\sum_{i=1}^{d} \omega_i \le N$ and $\omega_i \le b_i$. Here $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask, and the total number of resource nodes of all subtasks must not exceed the set total number of resource nodes $N$.

The objective function uses an inequality on the total cost performance of the resource nodes to narrow the search for the optimal solution. The quantity bounded by the inequality is the ratio between the currently determined number of resource nodes and the rent needed to buy the execution capability of these resource nodes; $M$ is the minimum standard of total resource node cost performance preset by the system, which is also the upper limit on the rent for purchasing a unit of computing capability. This step attempts to use as little resource node rent as possible to buy as much subtask data processing capability as possible, shortening the final completion time of the workflow.
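The exhaustive search of step (53) can be illustrated with the following Python sketch. It enumerates every allocation that respects the per-subtask limits and the total node count, discards allocations whose total cost performance is below a threshold, and keeps the one with the shortest completion time, preferring the cheaper allocation on ties. The definition of total cost performance used here (total purchased speed divided by total rent) is an assumption made for this example, as are the field layout and the figures.

```python
from itertools import product

def exhaustive_allocation(tasks, t_trans, n_total, min_ratio):
    """Exhaustively search node allocations under a total node limit.

    tasks: {name: (D, v, c, b)} with data volume D, per-node speed v,
        per-node rent c and maximum node count b (assumed fields).
    t_trans: total transmission time on the critical path.
    n_total: maximum total number of resource nodes (N).
    min_ratio: minimum total cost performance (M), assumed here to be
        total purchased speed divided by total rent.
    Returns (completion_time, allocation) or None if nothing is feasible.
    """
    names = list(tasks)
    ranges = [range(1, tasks[n][3] + 1) for n in names]
    best = None
    for combo in product(*ranges):
        if sum(combo) > n_total:
            continue  # violates the total node constraint
        speed = sum(tasks[n][1] * w for n, w in zip(names, combo))
        cost = sum(tasks[n][2] * w for n, w in zip(names, combo))
        if speed / cost < min_ratio:
            continue  # total cost performance too low
        t_exe = sum(tasks[n][0] / (tasks[n][1] * w) for n, w in zip(names, combo))
        key = (t_exe + t_trans, cost)  # shortest time first, then lowest rent
        if best is None or key < best[0]:
            best = (key, dict(zip(names, combo)))
    if best is None:
        return None
    return best[0][0], best[1]

# Hypothetical critical-path subtasks: (D, v, c, b).
tasks = {
    "ingest": (100.0, 10.0, 1.0, 4),
    "aggregate": (120.0, 10.0, 2.0, 4),
    "report": (20.0, 10.0, 1.0, 2),
}

result = exhaustive_allocation(tasks, t_trans=4.0, n_total=6, min_ratio=6.5)
print(result)
```

Exhaustive enumeration grows exponentially with the number of subtasks, which is why the method first narrows the candidates with the cost-performance threshold; for the three subtasks in this example there are only 32 combinations to examine.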

The invention has been tested repeatedly in implementation; the tests were successful and the objective of the invention was achieved.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for optimized task scheduling in a data center based on the critical path of a workflow, characterized in that: when the data center receives an application that a user submits for deployment on the data center, the application consisting of a data stream to be processed by the data center and a workflow that analyzes this data, the data center determines the key factors affecting system performance according to the differences among the subtasks in the application's workflow, compares the cost performance of the subtasks according to the optimization objective and the workflow characteristics, and determines the resource nodes allocated to each subtask, so that resource nodes are scheduled effectively according to the differing requirements of users and either the rent of the resource nodes or the processing time of the workflow is optimized: that is, either the data processing capability bought per unit of funds is maximized and resource rental cost is saved, or cost performance is improved, system operating efficiency is enhanced, and workflow processing time is reduced; a resource node execution plan is generated at the same time; the method comprises the following operation steps:

2. The method according to claim 1, characterized in that: the workflow is defined as a group of subtasks and their dependencies, i.e. the logical execution process of an application described by a workflow graph; the workflow graph is a directed acyclic graph in which the nodes represent the subtasks of the workflow and the connecting lines, or edges, represent the dependencies between subtasks, i.e. the data transmission between them; a workflow subtask is the smallest unit of data analysis in an application, and different subtasks differ in their demand for resource nodes and in the processing power they obtain from resource nodes; based on the workflow subtasks of each application, the data center allocates a set number of resource nodes according to the user's actual needs, finally forming the execution plan of the application.

3. The method according to claim 1, characterized in that: a resource node is a virtual server that the data center creates with virtualization technology and that occupies multiple resources on a physical server, including CPU, memory, hard-disk storage space, and transmission bandwidth; the execution plan is a group of resource nodes running subtasks together with the dependencies between those resource nodes; the resource node is the basic unit in which resources are allocated to a workflow, namely a virtual server generated with virtualization technology; a dependency between resource nodes represents the data transmission between virtual servers; the execution plan is therefore the set of the workflow's subtasks and the number of resource nodes each subtask uses.
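The workflow graph of claim 2 and the execution plan of claim 3 can be modeled with plain dictionaries. The sketch below is only one possible representation; the subtask names, data volumes, and the helper `is_acyclic` are hypothetical and serve to show the directed-acyclic-graph requirement and the "subtask to node count" shape of a plan.

```python
# Workflow graph: subtask -> list of (successor subtask, data volume on that edge).
# Names and numbers are illustrative, not taken from the patent.
workflow = {
    "ingest": [("filter", 40.0), ("index", 10.0)],
    "filter": [("report", 25.0)],
    "index": [("report", 5.0)],
    "report": [],
}


def is_acyclic(graph):
    """Check the directed-acyclic-graph property with Kahn's algorithm."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ, _volume in successors:
            indegree[succ] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for succ, _volume in graph[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    # Every node is visited exactly when the graph contains no cycle.
    return visited == len(graph)


# Execution plan: each subtask mapped to the number of resource nodes it uses.
execution_plan = {"ingest": 2, "filter": 3, "index": 1, "report": 1}
total_nodes = sum(execution_plan.values())
```

Keeping the plan as a mapping from subtask to node count mirrors the claim's wording that the plan is the set of subtasks and the number of resource nodes each one uses.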

4. The method according to claim 1, characterized in that: the critical path of the workflow graph is a path in the workflow graph computed from the data processing capability of the subtasks; the critical path is the subset of a workflow's subtasks and of the edges between them whose data processing time is the longest, and it determines the completion time of the whole workflow; the data processing time of this subset is the sum of the execution times of all its subtasks and the transmission times of all its edges;

The resource node type refers to the different types of resource nodes configured for workflow subtasks; the configuration parameters of a resource node are described by the CPU, memory, hard-disk storage space, and transmission bandwidth of the server used, and different configuration parameters require different amounts of rent;

The computing power of a workflow subtask is the data volume it can process per unit time when deployed on a resource node of a given type;

The cost performance of the resource nodes rented by a subtask is the data processing speed of that subtask that one unit of funds can buy, also called the speed-resource ratio: the relation between the subtask's own execution capability and the rental cost of the resource nodes it needs. It is described by the formula $\delta = v / c$, where $\delta$ is the speed-resource ratio, and $v$ and $c$ are, respectively, the computing power of a subtask of an application and the rental cost of the resource nodes it rents.
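The two quantities defined in this claim, the critical path and the speed-resource ratio, can be sketched in a few lines of Python. This is a minimal illustration under assumptions: execution and transfer times are taken as already known, the ratio is taken as computing power divided by rental cost, and all names and numbers are hypothetical.

```python
from functools import lru_cache

# Illustrative per-subtask execution times and per-edge transfer times.
exec_time = {"A": 4.0, "B": 6.0, "C": 3.0, "D": 2.0}
transfer_time = {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "D"): 2.0, ("C", "D"): 1.0}


def critical_path(exec_time, transfer_time):
    """Return (length, path) of the longest path, counting node and edge times."""
    successors = {node: [] for node in exec_time}
    for (src, dst), t in transfer_time.items():
        successors[src].append((dst, t))

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest path that starts at `node`; memoized so each node is solved once.
        best_time, best_path = exec_time[node], (node,)
        for succ, t in successors[node]:
            tail_time, tail_path = longest_from(succ)
            candidate = exec_time[node] + t + tail_time
            if candidate > best_time:
                best_time, best_path = candidate, (node,) + tail_path
        return best_time, best_path

    return max((longest_from(node) for node in exec_time), key=lambda r: r[0])


def speed_resource_ratio(v, c):
    """delta = v / c: processing speed bought per unit of rent (assumed form)."""
    return v / c


length, path = critical_path(exec_time, transfer_time)
delta = speed_resource_ratio(12.0, 3.0)
```

In the sample graph the path A, B, D is the longest, so those three subtasks and their two edges would set the completion time of the whole workflow.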

5. The method according to claim 1, characterized in that step 4 comprises the following operations:

(41) Compute the data transmission workload between the workflow subtasks of each application; the larger the data transmission workload, the more time is spent on transmission and the higher the rent incurred;

(42) To reduce the data transmission time between subtasks, use the data transmission workload between subtasks as the measure of transmission cost, and merge the subtasks at the two ends of each edge of the workflow graph into new compound subtasks, proceeding in descending order of transmission time;
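One merge of step (42) can be sketched as an edge contraction: the edge with the largest transmission time is removed and its two endpoints become a single compound subtask. The function below is a hypothetical illustration, not the patent's procedure; it assumes that parallel edges created by the merge add their transmission times, and it does not check whether the contraction would introduce a cycle.

```python
def merge_heaviest_edge(edges):
    """Contract the edge with the largest transfer time.

    edges: dict mapping (src, dst) -> transfer time.
    Returns (new_edges, name_of_compound_subtask).
    """
    (u, v), _ = max(edges.items(), key=lambda item: item[1])
    merged = f"{u}+{v}"
    new_edges = {}
    for (src, dst), t in edges.items():
        src2 = merged if src in (u, v) else src
        dst2 = merged if dst in (u, v) else dst
        if src2 == dst2:
            continue  # the edge is now internal to the compound subtask
        # ASSUMPTION: parallel edges produced by the merge add their times.
        new_edges[(src2, dst2)] = new_edges.get((src2, dst2), 0.0) + t
    return new_edges, merged


# Illustrative transfer times only.
edges = {("A", "B"): 8.0, ("A", "C"): 2.0, ("B", "D"): 3.0, ("C", "D"): 1.0}
merged_edges, name = merge_heaviest_edge(edges)
```

Calling the function repeatedly on its own output processes the edges in descending order of transmission time, which matches the ordering the step describes.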

6. The method according to claim 5, characterized in that: when the workflow subtasks of the application are merged, the merging method differs according to the constraint objective:

(A) under a set time constraint, the method for merging subtasks so as to use the fewest resource nodes is: first determine the critical path of the workflow graph; then, in descending order of the data transmission time between subtasks on the critical path, apply the merging method of step (42) to merge the subtasks on the critical path into new compound subtasks, reducing the time consumed by data transmission;

(B) under a set resource-node-count constraint, the merging method for shortening the completion time of the application workflow is: first sort the edges between workflow subtasks in the workflow graph in descending order of data transmission time; then apply the subtask merging method of step (42) to merge subtasks until no subtask in the workflow can be merged with another non-compound subtask.

7. The method according to claim 1, characterized in that step 5 comprises the following operations:

(51) Adjust the structure of the merged workflow graph and determine the critical path of the new workflow graph;

(52) According to the optimization objective, under a set time constraint, perform the resource node optimized scheduling method: first allocate resource nodes to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to reach the optimization objective; or

8. The method according to claim 7, characterized in that: in step (52), the objective of performing the resource node optimized scheduling method under a set time constraint is that the completion time for the workflow to process each group of data must not exceed the time set by the user, where the completion time of the workflow comprises the data processing time of the workflow subtasks, the data transmission time between subtasks, and the data waiting time inside the subtasks; the scheduling method of step (52) comprises the following operations:

(52A) According to the merge result, derive the new workflow graph; then, according to the time constraint of the workflow, optimize the data waiting time inside the subtasks on the critical path;

(52B) Using the defined formula for the workflow completion time, $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time, with $T$ not greater than the user-set time $T_{total}$; in the formula,

the data waiting time inside the subtasks, $T_{wait}$, approaches 0;

the data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$, where the natural number $j$ is the edge index, with maximum value $e$, and $D_j$ and $v_j$ are, respectively, the data volume to be transmitted over the $j$-th edge and its data transmission rate;

At this point, resource nodes are allocated to the subtasks on the critical path according to the formula, subject to the constraints $T_{exe} \le T_{total} - T_{trans}$ and $\omega_i \le b_i$; the number of resource nodes for the critical-path subtasks must strike a balance between resource-node rental cost and the corresponding computing power, that is, the rental cost needed to buy the execution capability of the currently determined number of resource nodes is minimized; here $\omega_i$, $\delta_i$, and $b_i$ are, respectively, the number of resource nodes required by the $i$-th subtask, its speed-resource ratio, and the set resource node quantity, i.e. the maximum number of resource nodes that can be allocated to the $i$-th subtask;
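The time-constrained allocation can be illustrated with a small brute-force search. The sketch below is not the patent's formula, which is not shown in this text; it assumes that subtask time is $D_i / (v_i \omega_i)$ and that rent is a hypothetical per-node price $c_i$ times $\omega_i$, and it then picks the cheapest allocation that satisfies $T_{exe} \le T_{total} - T_{trans}$ and $\omega_i \le b_i$.

```python
from itertools import product


def cheapest_allocation(subtasks, t_total, t_trans):
    """Find the lowest-rent node allocation that meets the time limit.

    subtasks: list of (D_i, v_i, c_i, b_i) tuples for the critical path.
    Returns (rent, omegas), or None when no allocation is fast enough.
    """
    budget = t_total - t_trans  # constraint: T_exe <= T_total - T_trans
    best = None
    ranges = [range(1, b + 1) for (_, _, _, b) in subtasks]  # omega_i <= b_i
    for omegas in product(*ranges):
        # ASSUMPTION: t_i = D_i / (v_i * omega_i).
        t_exe = sum(d / (v * w) for (d, v, _, _), w in zip(subtasks, omegas))
        if t_exe > budget:
            continue
        # ASSUMPTION: rent is c_i per node; rental duration is ignored here.
        rent = sum(c * w for (_, _, c, _), w in zip(subtasks, omegas))
        if best is None or rent < best[0]:
            best = (rent, omegas)
    return best


# Illustrative values: (data volume, speed per node, rent per node, node cap).
critical = [(100.0, 5.0, 2.0, 4), (60.0, 3.0, 1.0, 4)]
plan = cheapest_allocation(critical, t_total=20.0, t_trans=2.0)
infeasible = cheapest_allocation(critical, t_total=5.0, t_trans=2.0)
```

Exhaustive enumeration is only practical for short critical paths and small node caps; it is used here because it makes the two constraints explicit.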

9. The method according to claim 7, characterized in that: in step (53), the objective of performing the resource node optimized scheduling method under a set resource-node-count constraint is to use a limited number of resource nodes so that the completion time for the workflow to process each group of data is the shortest; the scheduling method of step (53) comprises the following operations:

(53A) According to the result of merging the edges between subtasks, derive the new workflow graph and reset the waiting-time constraint in the workflow;

10. The method according to claim 9, characterized in that the rule of the exhaustive method for allocating resource nodes to workflow subtasks is as follows: first, provided that the total number of resource nodes does not exceed the quantity set by the user, select the candidate plans whose total resource-node cost performance is higher than the set value; next, delete the more expensive plans from among the feasible ones; finally, with the selection workload thus reduced, choose from the remaining candidates the plan with the shortest final workflow completion time.
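The three-stage rule of claim 10 can be sketched as a filter pipeline over all candidate allocations. Several details here are assumptions rather than the patent's definitions: total cost performance is taken as total processing speed divided by total rent, "deleting the more expensive plans" is modeled by keeping only the `keep` cheapest candidates, and subtask time is $D_i / (v_i \omega_i)$. All names are hypothetical.

```python
from itertools import product


def best_plan(subtasks, n_max, m_min, keep=3, t_trans=0.0):
    """Pick the fastest plan among cheap, cost-effective candidates.

    subtasks: list of (D_i, v_i, c_i, b_i) tuples.
    Returns (rent, completion_time, omegas), or None if nothing qualifies.
    """
    candidates = []
    ranges = [range(1, b + 1) for (_, _, _, b) in subtasks]
    for omegas in product(*ranges):
        if sum(omegas) > n_max:  # stage 1a: total node budget N
            continue
        speed = sum(v * w for (_, v, _, _), w in zip(subtasks, omegas))
        rent = sum(c * w for (_, _, c, _), w in zip(subtasks, omegas))
        # ASSUMPTION: total cost performance = total speed / total rent.
        if speed / rent <= m_min:  # stage 1b: must be higher than threshold M
            continue
        t = sum(d / (v * w) for (d, v, _, _), w in zip(subtasks, omegas)) + t_trans
        candidates.append((rent, t, omegas))
    # Stage 2 (ASSUMPTION): drop costlier plans by keeping the `keep` cheapest.
    candidates.sort(key=lambda plan: plan[0])
    shortlist = candidates[:keep]
    # Stage 3: among the survivors, take the shortest completion time.
    return min(shortlist, key=lambda plan: plan[1], default=None)


# Illustrative values: (data volume, speed per node, rent per node, node cap).
tasks = [(100.0, 5.0, 2.0, 4), (60.0, 3.0, 1.0, 4)]
chosen = best_plan(tasks, n_max=5, m_min=2.7, keep=3)
```

Filtering before the final minimization is what reduces the selection workload: only the shortlisted plans are compared on completion time.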