CN104239141A - Task optimized-scheduling method in data center on basis of critical paths of workflow - Google Patents

Info

Publication number
CN104239141A
Authority
CN
China
Prior art keywords
subtask
workflow
resource node
data
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410452173.7A
Other languages
Chinese (zh)
Other versions
CN104239141B (en)
Inventor
马华东
高一鸿
张海涛
丁鸿凯
赵纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201410452173.7A
Publication of CN104239141A
Application granted
Publication of CN104239141B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a task optimized-scheduling method in a data center based on the critical path of a workflow. Based on the differences among the subtasks of an application's workflow, the data center determines the key factors that affect system performance, compares the cost performance of the subtasks according to the optimization objective and the characteristics of the workflow, and determines the resource nodes allocated to each subtask. Two resource node scheduling methods are thereby provided for different user requirements: one optimizes the rent of the resource nodes, and the other minimizes the processing time of the workflow. The first maximizes the data processing capability purchased per unit of funds while saving resource rental expense; the second improves the cost performance and operating efficiency of the system and reduces the processing time of the workflow. In both cases an execution plan for the resource nodes is generated. The method considers both factors in the rental cost of resource nodes, namely time and the number of nodes, so that a minimized rental expense buys the resource nodes with the strongest computing capability. The mathematical model of the method adapts to various types of applications and greatly reduces their operating cost.

Description

Task optimized-scheduling method in a data center based on the critical path of a workflow
Technical field
The present invention relates to a task optimized-scheduling method in a data center based on the critical path of a workflow, and belongs to the technical field of cloud computing.
Background art
Resource node scheduling for tasks in a data center is a major research topic in cloud computing. An application that a user submits for deployment on a data center consists of the data streams that the data center must process and the workflow that analyzes these data. The user's data may be a single stream, but more often consists of multiple streams. The resource node scheduling method is responsible for allocating to the user the resource nodes needed to process the data, and for ensuring that every allocated resource node is used effectively and reasonably. A resource node is a virtual server configured with virtualization technology, on which the data processing tasks submitted by the user are deployed. Resource node scheduling for tasks in a data center focuses on two aspects: methods for predicting and guaranteeing the user's resource node demand, and methods by which the data center optimizes the scheduling of resource nodes. Specifically:
Prediction of user resource node demand and service guarantee: the data center schedules a number of resource nodes that matches the predicted user demand, in order to save resource node rent. It must also place the resource nodes sensibly within the physical server cluster so that the user's service-level objectives are met. To make the best use of resource nodes, the scheduling method aims to satisfy the user's service demand with as few resource nodes as possible. In a running environment, however, the user's demand for resource nodes, that is, the data processing workload of the application service, changes constantly, so there is a gap between the service the user expects and the resource nodes actually in use. The scheduling method must therefore use mathematical methods such as probability and statistics to predict the data processing workload of the application service quickly and accurately, and from it estimate the number of resource nodes required. At the same time, the deployment and allocation of resource nodes on the physical servers is adjusted according to the user's expected demand over the coming period, so that the quality of service the data center has promised to the user does not change.
Optimized scheduling of resource nodes by the data center: when multiple allocated resource nodes are deployed on a data center, resource contention and optimization problems arise, and optimized scheduling methods are proposed in response. The goal is for the data center's resource nodes to be fully and reasonably used. There are two main approaches: priority-based optimized scheduling and workflow-based optimized scheduling. Specifically:
Priority-based optimized scheduling allocates resource nodes according to the differing characteristics of the subtasks that make up an application. Under certain constraints (such as the utilization of the data center's resource nodes or the running environment of its system software), the characteristics of each subtask are quantified as a weight, and the subtasks are sorted by weight. The data center then uses this order as the scheduling order, allocating an appropriate number of resource nodes first to the subtask with the highest priority.
Optimized scheduling based on the application workflow uses the application's workflow to describe how the application executes its load, and allocates an appropriate number of resource nodes to each subtask in the workflow. The workflow describes a complex application deployed on the data center and consists of subtasks and the dependencies among them. In concrete scheduling methods the workflow is formally described as a directed acyclic graph, and the application's execution plan is formed by allocating an appropriate number of resource nodes to the subtasks of this workflow graph. Any adjustment or optimization of this workflow graph counts as workflow-based optimized scheduling of resource nodes.
At present, workflow-based resource node scheduling methods optimize according to the application's workflow graph. They balance the constraints of the workflow graph itself against other constraints (time constraints, power consumption constraints, load balancing and so on) and optimize either the rental expense or the execution efficiency of the resource nodes allocated to the workflow. Under given constraints, some algorithms provide not only an optimized scheduling scheme for the application but also the optimal rental schemes for this class of problem, that is, a Pareto-optimal curve showing how the optimal solutions are distributed. In workflow-based scheduling, differences in the characteristics of the subtasks of the workflow graph (for example, execution order, hardware attributes and the network structure of the data center) can change the optimal scheduling scheme significantly.
In summary, resource node scheduling based on the application workflow is better suited to the subtask resource node scheduling problem of complex applications, and can significantly improve the optimization of the system.
However, existing work has a shortcoming: it does not consider that the subtasks in an application's workflow differ in their demand for data center resource nodes. Because each subtask achieves a different data processing goal, its demand for resource nodes necessarily differs. Under such differences, optimizing the rental expense of resource nodes and improving the execution efficiency of the whole system requires considering, for each subtask in the workflow, the ratio between the rental cost invested in its resource nodes and the output obtained, that is, the effect of the cost performance of the scheduling method on system performance.
In addition, existing resource node scheduling methods do not consider the differences among the subtasks in the workflow of an application deployed on the data center. The number of resource nodes allocated to a subtask of a particular type affects the execution efficiency of the application. It is therefore necessary to judge the importance of each type of subtask in the application workflow, and whether the resource nodes a subtask occupies actually help improve system performance. These problems have become a focus of attention for practitioners in the field.
Summary of the invention
In view of this, the object of the invention is to provide a task optimized-scheduling method in a data center based on the critical path of a workflow. The method starts from the differing demands of the subtasks in an application workflow for resource nodes. For the two classes of constraints that limit system performance in practice, it proposes two different task scheduling methods, so that resource node scheduling based on the application workflow optimizes system performance in the data center. First, under a time constraint, that is, within the deadline the user sets for the workflow's data processing, the resource node rental expense needed by the application workflow is minimized. Second, for the data center's own resource nodes, under a set total number of resource nodes, subtasks are allocated to the resource nodes so that the execution time of the subtasks is minimized.
To achieve the above object, the invention provides a task optimized-scheduling method in a data center based on the critical path of a workflow, characterized in that: when the data center receives an application submitted by a user for deployment on the data center, the application consisting of the data streams to be processed by the data center and the workflow that analyzes these data, the data center determines the key factors affecting system performance according to the differences among the subtasks in the application's workflow; compares the cost performance of the subtasks according to the optimization objective and the workflow characteristics; and determines the resource nodes allocated to each subtask, so that resource nodes are scheduled effectively according to the user's requirements and either the rental expense of the resource nodes or the processing time of the workflow is optimized. That is, the method either maximizes the data processing capability purchased per unit of funds and saves resource rental expense, or improves cost performance, increases the operating efficiency of the system and reduces the processing time of the workflow, while generating an execution plan for the resource nodes. The method comprises the following operation steps:
Step 1: the data center sets up a resource node optimized-scheduling model for the workflow of each application. Following the processing flow of the workflow, the model allocates an appropriate number of resource nodes, i.e. virtual servers, to the subtasks in the workflow according to the user's actual needs, and generates an execution plan. It also evaluates the effectiveness of the allocation based on the characteristics of the workflow subtasks, and optimizes the number of resource nodes the workflow uses without breaking the logical structure of the workflow.
Step 2: from the workflow graph of the deployed application, determine the following characteristics of the workflow: its critical path, the types of resource nodes used, the computing power of each subtask, and the cost performance of each resource node type when a subtask is deployed on it. Because the subtasks in the workflow graph differ in resource demand and computing power, a critical path is found in the workflow graph using graph theory and serves as the basis for allocating resources in the subtask resource node scheduling method.
Step 3: the data center determines the data transmission times in the workflow graph from the characteristics of the workflow subtasks and the dependencies among them.
Step 4: merge subtasks. According to the optimization objective, the two subtasks at the ends of the edge with the largest transmission time in the workflow graph are merged into a new compound subtask, which reduces the data transmission time between subtasks. In actual deployment, the two subtasks are placed on the same physical machine, reducing the data transmission time between their resource nodes.
Step 5: adjust the workflow graph, and allocate resource nodes to the user according to the user's requirements and constraints. When allocating resource nodes to the target user, the cost performance of the subtasks is compared under the optimization objective to decide how resource nodes are allocated to each subtask, and the application's execution plan is finally generated.
The innovative advantages of the invention are as follows. Prior-art data center scheduling methods optimize only the allocation of resource nodes for the application's execution plan. They ignore the effect on system execution efficiency of the number of resource nodes allocated to particular types of subtask, and do not consider that the number and cost performance of a task's resource nodes can sharply increase the cost to the user of renting resource nodes for an application deployed on the data center. The method of the invention considers both aspects of rental cost, the time factor and the number of resource nodes, so that a minimized resource node rent buys the most, or the most powerful, computing capability. The mathematical model established by the method adapts well to various types of applications, solves the system optimization problem under different constraints, and substantially reduces the operating cost of applications. The invention therefore has good prospects for wide application.
Description of the drawings
Fig. 1 is a flowchart of the operation steps of the task optimized-scheduling method in a data center based on the critical path of a workflow according to the invention.
Fig. 2 is a schematic diagram of the structural relationship between the workflow of an application and its execution plan.
Fig. 3 is a flowchart of the resource node optimized-scheduling method of the invention under a set time constraint.
Fig. 4 is a flowchart of the resource node optimized-scheduling method of the invention under a set number of resource nodes.
Detailed description
To make the object, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings.
The task optimized-scheduling method in a data center based on the critical path of a workflow works as follows. The data center receives an application submitted by a user for deployment on the data center (the application consists of the data streams to be processed by the data center and the workflow that analyzes these data). The data center then determines the key factors affecting system performance according to the differences among the subtasks in the application's workflow, compares the cost performance of the subtasks according to the optimization objective and the workflow characteristics, and determines the resource nodes allocated to each subtask. Resource nodes are thus scheduled effectively according to the user's requirements, and either the rental overhead of the resource nodes or the processing time of the workflow is optimized: the method saves resource rental expense by maximizing the data processing capability purchased per unit of funds, or reduces the processing time of the workflow, improving cost performance and the operating efficiency of the system, while generating an execution plan for the resource nodes.
Referring to Fig. 1, the concrete operation steps of the method are described below:
Step 1: the data center sets up a resource node optimized-scheduling model for the workflow of each application.
Following the processing flow of the workflow, the model allocates an appropriate number of resource nodes, i.e. virtual servers, to the subtasks in the workflow according to the user's actual needs, and generates an execution plan. It also evaluates the effectiveness of the allocation based on the characteristics of the workflow subtasks, and optimizes the resource usage of the subtasks that make up the application without breaking the logical structure of the workflow. The model both helps describe the characteristics of the application's subtasks and helps evaluate the performance of the workflow.
Referring to Fig. 2, the resource scheduling model for data center workflows provided by the invention has two parts: the workflow of the application and the execution plan of the application. The invention defines a workflow as a set of subtasks and their dependencies, i.e. the logical execution process of an application described by a workflow graph. The workflow graph is a directed acyclic graph: its nodes represent the subtasks of the workflow, and its connecting lines, or edges, represent the dependencies among the subtasks, i.e. the data transmission between them. A subtask is the smallest unit of data analysis in the application, and subtasks differ both in their demand for resource nodes and in the processing power of those resource nodes. Based on the workflow subtasks of each application, the data center allocates a set number of resource nodes according to the user's actual needs, which finally forms the execution plan of the application.
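The scheduling model described above can be sketched as a small data structure. The following Python is a minimal illustration, not the patent's implementation: the class name `Workflow`, its method names, the subtask names and the data volumes are all hypothetical. It stores subtasks and data-transfer edges, checks that the graph is acyclic, and represents an execution plan as a mapping from subtask to resource node count.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A workflow as a directed acyclic graph of subtasks (illustrative names)."""
    subtasks: dict = field(default_factory=dict)  # name -> data volume to process
    edges: dict = field(default_factory=dict)     # (src, dst) -> data volume transferred

    def add_subtask(self, name, data_volume):
        self.subtasks[name] = data_volume

    def add_dependency(self, src, dst, transfer_volume):
        if src not in self.subtasks or dst not in self.subtasks:
            raise KeyError("both endpoints must be declared subtasks")
        self.edges[(src, dst)] = transfer_volume

    def topological_order(self):
        """Return subtasks in dependency order; raise ValueError on a cycle."""
        indegree = {name: 0 for name in self.subtasks}
        for (_, dst) in self.edges:
            indegree[dst] += 1
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for (src, dst) in sorted(self.edges):
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.subtasks):
            raise ValueError("workflow graph contains a cycle")
        return order


wf = Workflow()
for name, volume in [("ingest", 100.0), ("filter", 80.0), ("aggregate", 40.0)]:
    wf.add_subtask(name, volume)
wf.add_dependency("ingest", "filter", 50.0)
wf.add_dependency("filter", "aggregate", 20.0)

# An execution plan maps each subtask to the number of resource nodes it uses.
execution_plan = {"ingest": 2, "filter": 1, "aggregate": 1}
```

Keeping the plan separate from the graph mirrors the two-part model: the workflow is fixed by the application, while the plan is what the scheduler changes.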
A resource node is a virtual server that the data center creates with virtualization technology, occupying several kinds of resources on a physical server: CPU, memory, hard-disk storage space and transmission bandwidth. It is the basic unit in which resources are allocated to a workflow. An execution plan is a set of resource nodes that run the subtasks, together with the dependencies among these resource nodes. A dependency between resource nodes represents the data transmission between virtual servers. The execution plan in Fig. 2 is therefore the set of the workflow's subtasks and the number of resource nodes each subtask uses.
Step 2: from the workflow graph of the deployed application, determine the following characteristics of the workflow: its critical path, the types of resource nodes used, the computing power of each subtask, and the cost performance of each resource node type when a subtask is deployed on it. Because the subtasks in the workflow graph differ in resource demand and computing power, a critical path is found in the workflow graph using graph theory and serves as the basis for allocating resources in the subtask resource node scheduling method.
The critical path of the workflow graph is a path computed from the data processing capability of the subtasks. It is the subset of the workflow's subtasks and the edges between them whose data processing time is the longest, and it determines the completion time of the whole workflow. The data processing time of such a subset is the sum of the execution times of all its subtasks and the transmission times of all its edges.
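The critical-path definition above corresponds to a longest-path computation on a directed acyclic graph. The sketch below is one standard way to do this (topological order followed by dynamic programming); the patent only says the path is found "according to graph theory", so the algorithm choice, the function name `critical_path` and the sample times are assumptions for illustration.

```python
def critical_path(exec_time, trans_time):
    """Longest path in a DAG, counting subtask execution and edge transmission time.

    exec_time:  dict subtask -> execution time
    trans_time: dict (src, dst) -> transmission time on that edge
    Returns (total_time, [subtasks on the path in order]).
    """
    successors = {n: [] for n in exec_time}
    indegree = {n: 0 for n in exec_time}
    for (src, dst) in trans_time:
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm gives an order in which every predecessor comes first.
    order = []
    ready = [n for n in exec_time if indegree[n] == 0]
    while ready:
        node = ready.pop()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(exec_time):
        raise ValueError("graph has a cycle")

    # best[n] = longest total time of any path that ends at n.
    best = {n: exec_time[n] for n in exec_time}
    prev = {n: None for n in exec_time}
    for node in order:
        for nxt in successors[node]:
            candidate = best[node] + trans_time[(node, nxt)] + exec_time[nxt]
            if candidate > best[nxt]:
                best[nxt] = candidate
                prev[nxt] = node

    # Walk back from the subtask with the largest accumulated time.
    end = max(best, key=best.get)
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return best[path[0]], path[::-1]


exec_time = {"A": 2, "B": 3, "C": 1, "D": 2}
trans_time = {("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1}
total, path = critical_path(exec_time, trans_time)
```

In this example the path through C wins even though C itself is fast, because the edge from A to C carries a large transmission time; this is the kind of edge that the merging step later targets.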
Resource node types are the different kinds of resource node that can be configured for the workflow subtasks. The configuration of a resource node is described by the CPU, memory, hard-disk storage space and transmission bandwidth of the server used, and different configurations require different amounts of rent.
The computing power of a workflow subtask is the amount of data it can process per unit of time when deployed on a resource node of a given type.
The cost performance of renting resource nodes for a subtask is the data processing speed of that subtask that one unit of funds can buy, also called the speed-resource ratio: the relation between the subtask's own execution capability and the rental expense of the resource nodes it needs. It is described by the formula $\delta = v / c$, where $\delta$ is the speed-resource ratio of a subtask of the application, $v$ is the computing power of the subtask, and $c$ is the rental expense of its resource nodes.
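As an illustration of the speed-resource ratio, the following minimal Python sketch ranks subtasks by the processing speed bought per unit of rent. It assumes the ratio is the computing power divided by the rental expense, as the definition above suggests; the function name, subtask names and numbers are hypothetical.

```python
def speed_resource_ratio(speed, cost):
    """Data processing speed bought per unit of rent (assumed to be speed / cost)."""
    if cost <= 0:
        raise ValueError("rental cost must be positive")
    return speed / cost


# subtask -> (computing power v, rental expense c of its resource node); made-up values
subtasks = {
    "decode": (120.0, 4.0),
    "detect": (60.0, 3.0),
    "index": (90.0, 2.0),
}

ratios = {name: speed_resource_ratio(v, c) for name, (v, c) in subtasks.items()}

# Subtasks with a higher ratio give more processing speed for the same rent.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

A ranking like this is what allows subtasks to be compared when deciding where an additional resource node is best spent.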
In essence, the method identifies the differences among the subtasks of a workflow in the data center, determines the key factors affecting system performance, and formulates an optimized resource scheduling scheme according to the characteristics of the workflow. By improving cost performance, it ultimately minimizes the various system overheads of the data center.
Step 3: the data center determines the data transmission times in the workflow graph from the characteristics of the workflow subtasks and the dependencies among them.
Step 4: according to the optimization objective, merge the two subtasks at the ends of the edge with the largest transmission time in the workflow graph into a new compound subtask, which reduces the data transmission time between subtasks. In actual deployment, the two subtasks are placed on the same physical machine, reducing the data transmission time overhead between the resource nodes on which they are deployed. Step 4 comprises the following operations:
(41) Compute the data transmission workload between the subtasks of each application's workflow. The larger the workload, the more time the transmission takes and the higher the rent incurred.
(42) To reduce the data transmission time between subtasks, use the data transmission workload between subtasks as the measure of transmission cost, and merge the subtasks at the two ends of the edges in the workflow graph in descending order of transmission time, forming new compound subtasks.
The merging method of this step depends on the constraint under which the workflow subtasks are merged:
(A) Under a set time constraint, where the goal is to complete the subtasks with the fewest resource nodes: first determine the critical path of the workflow graph; then, in descending order of the data transmission times along the critical path, apply the merging method of step (42) to merge the subtasks on the critical path into new compound subtasks, reducing the time spent on data transmission.
(B) Under a set number of resource nodes, where the goal is to shorten the completion time of the subtasks of the application workflow: first sort the edges between the subtasks in the workflow graph in descending order of data transmission time; then apply the merging method of step (42) until no subtask in the workflow can be merged with another non-compound subtask.
(43) Because a compound subtask produced during merging is not merged again with other subtasks, merging ends when no subtask remains that can be merged with another non-compound subtask.
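The merging procedure of steps (41) to (43) can be sketched as a greedy pass over the edges. The Python below is an illustrative reading of those steps, with hypothetical names: edges are visited in descending order of transmission time, the two endpoints are merged if neither already belongs to a compound subtask, and the pass ends when no such pair remains. The optional `candidates` argument restricts merging to a chosen set of edges, for example the edges on the critical path in case (A).

```python
def merge_subtasks(trans_time, candidates=None):
    """Greedily pair up subtasks across the edges with the largest transmission time.

    trans_time: dict (src, dst) -> transmission time
    candidates: optional iterable of edges to consider (e.g. critical-path edges)
    Returns the list of merged (src, dst) pairs, in the order they were merged.
    """
    if candidates is None:
        edges = dict(trans_time)
    else:
        edges = {edge: trans_time[edge] for edge in candidates}

    merged_nodes = set()   # subtasks that are already part of a compound subtask
    merged_pairs = []
    for (src, dst), _ in sorted(edges.items(), key=lambda kv: kv[1], reverse=True):
        if src in merged_nodes or dst in merged_nodes:
            continue       # compound subtasks are not merged again
        merged_pairs.append((src, dst))
        merged_nodes.update((src, dst))
    return merged_pairs


trans_time = {("A", "B"): 5, ("B", "C"): 8, ("C", "D"): 3, ("D", "E"): 4}

# Case (B): consider every edge of the workflow graph.
all_pairs = merge_subtasks(trans_time)

# Case (A): consider only the edges on an assumed critical path.
path_pairs = merge_subtasks(trans_time, candidates=[("A", "B"), ("C", "D")])
```

Each returned pair would be deployed on the same physical machine, so the transmission time on that edge is reduced; how much it is reduced depends on the deployment and is not modeled here.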
Step 5: adjust the workflow graph, and allocate resource nodes to the user according to the requirements and constraints the user has specified. When allocating resource nodes to the target user, the cost performance of the subtasks is compared under the optimization objective to decide how resource nodes are allocated to each subtask, and the application's execution plan is finally generated. Step 5 comprises the following operations:
(51) Adjust the structure of the workflow graph after merging, and determine the critical path of the new workflow graph.
(52) According to the optimization objective, perform optimized scheduling of resource nodes under a set time constraint: first allocate resource nodes to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to reach the optimization objective.
In step (52), the goal of optimized scheduling under a set time constraint is that the time the workflow takes to process each group of data does not exceed the time set by the user. The completion time of the workflow comprises the data processing time of the workflow subtasks, the data transmission time between subtasks, and the time data waits for processing inside the subtasks. The scheduling method comprises the following operations (as shown in Fig. 3):
(52A) Derive the new workflow graph from the merging result; then, according to the time constraint of the workflow, optimize the data waiting time inside the subtasks on the critical path.
(52B) Using the formula for the workflow completion time $T$, $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time, such that $T$ does not exceed the user-set time $T_{total}$. In the formula:
The data waiting time inside the subtasks, $T_{wait}$, approaches 0.
The data processing time of the subtasks is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$ with $t_i = D_i / (v_i \omega_i)$, where the natural number $i$ is the index of the subtask and its maximum value is $d$; $D_i$ is the amount of data the $i$-th subtask must process; $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing power of a resource node; and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask.
The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$ with $t_j = D_j / v_j$, where the natural number $j$ is the index of the edge and its maximum value is $e$; $D_j$ is the amount of data to be transmitted on the $j$-th edge; and $v_j$ is the data transmission rate of the $j$-th edge.
Resource nodes are then allocated to the subtasks on the critical path so that the rental expense is minimized, subject to the constraints $T_{exe} \le T_{total} - T_{trans}$ and $\omega_i \le b_i$. The number of resource nodes for the critical-path subtasks must strike a balance between rental expense and the corresponding computing power; that is, the rental expense needed to buy the execution capability of the chosen number of resource nodes is minimized. Here $\omega_i$, $\delta_i$ and $b_i$ are, respectively, the number of resource nodes required by the $i$-th subtask, its speed-resource ratio, and the set resource node limit, i.e. the maximum number of resource nodes that can be allocated to the $i$-th subtask.
(52C) Determine the minimum number of resource nodes for the other subtasks in the workflow graph, and generate the execution plan of the application.
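The time-constrained allocation of step (52) can be illustrated with a small greedy sketch. The text above does not specify a search procedure, so the greedy rule used here is an assumption, as are the function name, the dictionary keys and the sample values. The sketch also assumes that the processing time of subtask i is D_i / (v_i * omega_i), that each additional node of a subtask costs a fixed rent, and that the waiting time is 0. Starting from one node per critical-path subtask, it repeatedly gives one more node to the subtask that saves the most time per unit of rent, until the deadline is met.

```python
def allocate_under_deadline(tasks, t_trans, t_total):
    """Greedy node allocation for critical-path subtasks under a deadline.

    tasks:   dict name -> {"D": data volume, "v": speed per node,
                           "cost": rent per node, "b": max nodes}
    t_trans: total transmission time on the critical path
    t_total: user-set deadline for the whole workflow
    Returns (node counts per subtask, resulting T_exe).
    """
    omega = {name: 1 for name in tasks}   # start with one node per subtask

    def t_exe():
        return sum(t["D"] / (t["v"] * omega[n]) for n, t in tasks.items())

    budget = t_total - t_trans            # T_exe must fit in this (T_wait taken as 0)
    while t_exe() > budget:
        best_name, best_gain = None, 0.0
        for name, t in tasks.items():
            w = omega[name]
            if w >= t["b"]:
                continue                  # respects the limit omega_i <= b_i
            saved = t["D"] / (t["v"] * w) - t["D"] / (t["v"] * (w + 1))
            gain = saved / t["cost"]      # time saved per unit of extra rent
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            raise ValueError("deadline cannot be met within the node limits")
        omega[best_name] += 1
    return omega, t_exe()


tasks = {
    "a": {"D": 100.0, "v": 10.0, "cost": 1.0, "b": 4},
    "b": {"D": 60.0, "v": 10.0, "cost": 2.0, "b": 4},
}
plan, exe_time = allocate_under_deadline(tasks, t_trans=2.0, t_total=12.0)
```

A greedy rule like this is simple but is not guaranteed to find the cheapest feasible allocation; an exact method would need to search over all allocations that satisfy the constraints.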
The following operation step is an alternative that stands in parallel with step (52):
(53) According to the optimization objective, perform optimized scheduling of resource nodes under a set number of resource nodes: use exhaustive search to allocate resource nodes first to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to reach the optimization objective.
The rule for the exhaustive allocation of resource nodes to workflow subtasks is as follows. First, among the schemes whose total number of resource nodes does not exceed the number set by the user, select as candidates those whose total resource node cost performance is higher than a set value. Next, delete from the feasible schemes those that cost more. With the selection workload thus reduced, choose from the remaining candidates the scheme with the shortest final completion time of the workflow.
The goal of optimized scheduling in step (53) under a set number of resource nodes is to use a limited number of resource nodes so that the time the workflow takes to process each group of data is as short as possible. The scheduling method of this step comprises the following operations (see Fig. 4):
(53A) Derive the new workflow graph from the result of merging the subtasks across edges, and reset the waiting-time constraint in the workflow.
(53B) According to the set formula for the workflow completion time $T$, namely $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time; wherein:
The data processing time of the subtasks of the workflow is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$, where the natural number $i$ is the subtask index with maximum value $d$, $D_i$ is the amount of data processed by the $i$-th subtask, $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing capability of a resource node, and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask;
The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$, where the natural number $j$ is the edge index with maximum value $e$, $D_j$ is the total amount of data transmitted on the $j$-th edge, and $v_j$ is the data transmission rate of the $j$-th edge;
$T_{wait}$ is the data waiting time inside the subtasks; its value approaches 0;
At this point, exhaustive search is used to choose the optimal scheme among the schemes with higher resource-node cost performance, so that the final completion time of the workflow is the shortest while the usage cost of system resource nodes is reduced;
The objective function of this scheduling method is $\min (T_{exe} + T_{wait} + T_{trans})$, subject to $\sum_{i=1}^{d} \omega_i \le N$ and $\omega_i \le b_i$; here $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask, $b_i$ is the maximum number of resource nodes that can be allocated to the $i$-th subtask, and the total number of resource nodes of all subtasks must not exceed the set total $N$;
The objective function uses an inequality on the total cost performance of the resource nodes to narrow the search range of the optimal solution; the quantity it bounds is the ratio between the currently determined number of resource nodes and the rental expense required to buy the execution capability of these nodes, and $M$ is the system-preset minimum standard for the total cost performance of resource nodes, which is also the upper limit on the rental expense for purchasing a unit of computing capability; this step attempts to buy as much subtask data-processing capability as possible with as little resource node rental as possible, thereby shortening the final completion time of the workflow.
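As a rough illustration of step (53B), the following Python sketch enumerates every allocation that satisfies the node-count constraints and keeps the one with the shortest completion time. It rests on assumptions the text does not state explicitly: that the per-subtask time is $t_i = D_i / (v_i \omega_i)$, that the per-edge time is $t_j = D_j / v_j$, and that $T_{wait}$ is 0. The function names `completion_time` and `exhaustive_allocation` are hypothetical, and the cost-performance pre-filter with threshold $M$ is left out because its exact inequality is not given here.

```python
from itertools import product


def completion_time(data, speed, nodes, edges, t_wait=0.0):
    """Return T = T_exe + T_trans + T_wait for the critical-path subtasks."""
    # Assumption: t_i = D_i / (v_i * w_i), i.e. nodes share the work evenly.
    t_exe = sum(d / (v * w) for d, v, w in zip(data, speed, nodes))
    # Assumption: t_j = D_j / v_j for each edge (data amount, transfer rate).
    t_trans = sum(d / v for d, v in edges)
    return t_exe + t_trans + t_wait


def exhaustive_allocation(data, speed, caps, edges, total_nodes):
    """Try every allocation with 1 <= w_i <= b_i and sum(w_i) <= N."""
    best = None
    for nodes in product(*(range(1, b + 1) for b in caps)):
        if sum(nodes) > total_nodes:
            continue  # violates the total node-count constraint
        t = completion_time(data, speed, nodes, edges)
        if best is None or t < best[0]:
            best = (t, nodes)
    return best  # (completion time, allocation), or None if nothing fits


# Two subtasks joined by one edge, at most 3 nodes each, 4 nodes in total.
best = exhaustive_allocation(
    data=[100, 200], speed=[10, 10], caps=[3, 3],
    edges=[(50, 25)], total_nodes=4,
)
print(best)
```

The search space grows as the product of the $b_i$, so a sketch like this is only practical for the handful of subtasks on a critical path; that is presumably why the method first narrows the candidates by cost performance.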
The invention has been put through repeated trial implementations; the tests were successful and achieved the purpose of the invention.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A task optimized-scheduling method in a data center based on the critical path of a workflow, characterized in that: when the data center receives an application that a user submits for deployment on the data center, the application consisting of a data stream to be processed by the data center and a workflow that analyzes these data, the data center determines the key factors affecting system performance according to the differences between the subtasks in the workflow of the application, compares the cost performance of the subtasks according to the optimization objective and the workflow characteristics, and determines the resource nodes allocated to each subtask, so that resource nodes are scheduled effectively according to the different demands of users and either the rental expense of the resource nodes or the processing time of the workflow is optimized: that is, either the data-processing capability bought per unit of funds is maximized and resource rental expense is saved, or cost performance is improved, system operating efficiency is enhanced and the processing time of the workflow is reduced; an execution plan for the resource nodes is generated at the same time; the method comprises the following operating steps:
Step 1, the data center sets up a resource node optimized-scheduling model for the workflow of each application: following the processing flow of the workflow, the model assigns the subtasks of the workflow to a suitable number of resource nodes, i.e. virtual servers, according to the user's actual needs, and generates an execution plan; at the same time, based on the characteristics of the workflow subtasks, it evaluates the effectiveness of the resource node allocation and optimizes the number of resource nodes the workflow uses without breaking the logical structure of the workflow;
Step 2, determine the following characteristics of the workflow from the workflow graph of the deployed application: the critical path of the workflow, the type of resource node used, the computing capability of each subtask in the workflow, and the cost performance of that type of resource node when the subtask is deployed on it; because the resource requirements and computing capabilities of the subtasks in the workflow graph differ, a critical path is found in the workflow graph by graph-theoretic means and serves as the basis for allocating resources in the subtask resource node optimized-scheduling method;
Step 3, the data center determines the data transmission times in the workflow graph according to the characteristics of each workflow subtask and the dependencies between the subtasks;
Step 4, merge subtasks: according to the optimization objective, select from the workflow graph the edge with the largest transmission time and merge the two subtasks at its ends into a new compound subtask, reducing the data transmission time between subtasks; in actual deployment, the two subtasks are deployed on the same physical machine, which reduces the data transmission time between their resource nodes;
Step 5, adjust the workflow graph; allocate resource nodes to the user according to the user's demands and constraints; when allocating resource nodes to the target user, decide how to allocate resource nodes to each subtask by comparing the cost performance of the subtasks according to the optimization objective, and finally generate the execution plan of the application.
2. The method according to claim 1, characterized in that: the workflow is defined as a group of subtasks and their dependencies, i.e. the logical execution process of an application described by a workflow graph; the workflow graph is a directed acyclic graph whose nodes represent the subtasks of the workflow and whose connecting lines, or edges, represent the dependencies between subtasks, i.e. the data transmissions between them; a subtask of the workflow is the smallest unit of data analysis in the application, and different subtasks differ in their demand for resource nodes and in the processing capability of those resource nodes; based on the workflow subtasks of each application, the data center allocates a set number of resource nodes according to the user's actual needs and finally forms the execution plan of the application.
3. The method according to claim 1, characterized in that: the resource node is a virtual server that the data center creates with virtualization technology and that occupies multiple resources of a physical server, including CPU, memory, hard-disk storage space and transmission bandwidth; the execution plan is a group of resource nodes running subtasks together with the dependencies between these resource nodes; the resource node is the basic unit in which resources are allocated to the workflow, i.e. a virtual server generated with virtualization technology; a dependency between resource nodes represents the data transmission arising between virtual servers; the execution plan is therefore the set consisting of each subtask of the workflow and the number of resource nodes it uses.
4. The method according to claim 1, characterized in that: the critical path of the workflow graph is a path in the workflow graph calculated from the data-processing capabilities of the subtasks; the critical path is the subset of the subtasks of a workflow and of the edges between them whose data processing time is the longest, and it therefore determines the completion time of the whole workflow; the data processing time of this subset is the sum of the execution times of all its subtasks and the transmission times of all its edges;
The resource node type refers to the different types of resource node configured for the workflow subtasks; the configuration parameters of a resource node are described by the CPU, memory, hard-disk storage space and transmission bandwidth of the server used, and different resource configuration parameters require different amounts of rent;
The computing capability of a workflow subtask is the amount of data it can process per unit time when deployed on a resource node of a set type;
The cost performance of renting resource nodes for a subtask is the data processing speed of that subtask that one unit of funds can buy, also called the speed-resource ratio: the relation between the execution capability of the subtask itself and the rental expense of the resource nodes it requires; it is described by the formula $\delta = v / c$, where $\delta$ is the speed-resource ratio, and $v$ and $c$ are respectively the computing capability of a subtask of the application and the expense of the resource nodes it rents.
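One way to picture the critical path defined in this claim is as the longest path through the directed acyclic workflow graph, where path length is the sum of subtask execution times and edge transmission times. The sketch below is only an illustration under that reading; the function name `critical_path`, the dictionary-based inputs and the sample figures are hypothetical and are not taken from the patent.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(exec_time, edges):
    """Longest path in a DAG.

    exec_time: {subtask: execution time}
    edges:     {(u, v): transmission time from subtask u to subtask v}
    """
    preds = {task: set() for task in exec_time}
    for u, v in edges:
        preds[v].add(u)

    finish = {}  # longest accumulated time ending at each subtask
    parent = {}  # predecessor on that longest path
    for task in TopologicalSorter(preds).static_order():
        best_prev, best_start = None, 0.0
        for p in preds[task]:
            start = finish[p] + edges[(p, task)]
            if best_prev is None or start > best_start:
                best_prev, best_start = p, start
        finish[task] = best_start + exec_time[task]
        parent[task] = best_prev

    # Walk back from the subtask with the largest accumulated time.
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return path[::-1], length


path, length = critical_path(
    exec_time={"A": 2, "B": 3, "C": 1, "D": 2},
    edges={("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1},
)
print(path, length)
```

In this example the path through C wins even though C itself runs quickly, because the A to C transfer is expensive; this is the situation in which merging the endpoints of a costly edge pays off.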
5. The method according to claim 1, characterized in that step 4 comprises the following operations:
(41) calculate the data transmission workload between the workflow subtasks of each application; the larger the data transmission workload, the more time is spent on transmission and the higher the rent;
(42) to reduce the data transmission time between subtasks, use the data transmission workload between subtasks as the measure of transmission cost and, in descending order of transmission time, successively merge the subtasks at the two ends of each edge in the workflow graph into new compound subtasks;
(43) because it is stipulated that a compound subtask produced during merging is not merged again with other subtasks, the merging process ends when no subtask remains that needs to be merged with another non-compound subtask.
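Operations (41) to (43) can be sketched as a single pass over the edges in descending order of transmission workload. The sketch is hedged: the function name `merge_subtasks` and the sample workloads are hypothetical, and it does not check whether merging two subtasks would introduce a cycle into the workflow graph, which a real implementation would need to consider.

```python
def merge_subtasks(edges):
    """Pair up subtasks along the heaviest edges.

    edges: {(u, v): data transmission workload between subtasks u and v}
    Returns the list of (u, v) pairs merged into compound subtasks.
    """
    merged = set()  # subtasks that already belong to a compound subtask
    pairs = []
    # (42): visit edges from the largest transmission workload downwards.
    for (u, v), _ in sorted(edges.items(), key=lambda kv: kv[1], reverse=True):
        if u in merged or v in merged:
            continue  # (43): a compound subtask is never merged again
        pairs.append((u, v))
        merged.update((u, v))
    return pairs  # loop ends once no edge joins two unmerged subtasks


pairs = merge_subtasks(
    {("A", "B"): 5, ("B", "C"): 9, ("C", "D"): 7, ("D", "E"): 3}
)
print(pairs)
```

Here B and C are merged first because their edge carries the most data; the C to D and A to B edges are then skipped since B and C are already part of a compound subtask, and D and E are merged last.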
6. The method according to claim 5, characterized in that: when the workflow subtasks of the application are merged, the merging method differs according to the constraint objective:
(A) under a set time constraint, the method of merging subtasks so as to use the fewest resource nodes is: first determine the critical path of the workflow graph; then, in descending order of the data transmission time of the subtasks on the critical path, apply the merging method of step (42) to merge the subtasks on the critical path into new compound subtasks, reducing the time spent on data transmission;
(B) under a set resource-node count constraint, the merging method for shortening the completion time of the subtasks of the application workflow is: first sort the edges between the workflow subtasks in the workflow graph in descending order of data transmission time, then apply the subtask merging method of step (42) to merge subtasks until no subtask in the workflow can be merged with another non-compound subtask.
7. The method according to claim 1, characterized in that step 5 comprises the following operations:
(51) adjust the structure of the merged workflow graph and determine the critical path of the new workflow graph;
(52) according to the optimization objective, and under a set time constraint, perform the resource node optimized-scheduling method: first allocate resource nodes to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to reach the optimization objective; or
(53) according to the optimization objective, and under the constraint of a set number of resource nodes, perform the resource node optimized-scheduling method: use exhaustive search to allocate resource nodes first to the subtasks on the critical path of the workflow, then allocate to the remaining subtasks the minimum number of resource nodes needed to reach the optimization objective.
8. The method according to claim 7, characterized in that: in step (52), the objective of performing the resource node optimized scheduling under a set time constraint is that the time the workflow takes to process each group of data must not exceed the time set by the user; the completion time of the workflow comprises the data processing time of the workflow subtasks, the data transmission time between subtasks, and the data waiting time inside the subtasks; the scheduling method of step (52) comprises the following operations:
(52A) according to the merging result, derive the new workflow graph, then optimize the data waiting time inside the subtasks on the critical path according to the time constraints of the workflow;
(52B) according to the set formula for the workflow completion time $T$, namely $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time, with $T$ not greater than the user-set time $T_{total}$; in the formula,
The data waiting time inside the subtasks, $T_{wait}$, approaches 0;
The data processing time of the subtasks is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$, where the natural number $i$ is the subtask index with maximum value $d$; $D_i$ is the amount of data the $i$-th subtask needs to process, $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing capability of a resource node, and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask;
The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$, where the natural number $j$ is the edge index with maximum value $e$, and $D_j$ and $v_j$ are respectively the amount of data to be transmitted on the $j$-th edge and its data transmission rate;
At this point, resource nodes are allocated to the subtasks on the critical path following the objective formula, subject to the constraints $T_{exe} \le T_{total} - T_{trans}$ and $\omega_i \le b_i$; the number of resource nodes for the critical-path subtasks must strike a balance between resource node rental expense and the corresponding computing capability, i.e. the currently determined number of resource nodes and the rental expense required to buy the execution capability of these nodes are minimized; here $\omega_i$, $\delta_i$ and $b_i$ are respectively the number of resource nodes required by the $i$-th subtask, its speed-resource ratio, and the set resource node quantity, i.e. the maximum number of resource nodes that can be allocated to the $i$-th subtask;
(52C) determine the minimum number of resource nodes for the other subtasks in the workflow graph and generate the execution plan of the application.
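The deadline-constrained variant in step (52B) can be illustrated as the mirror image of the node-limited search: instead of minimizing time under a node budget, it minimizes rental expense under a time budget. The objective formula is not reproduced in this text, so the sketch below substitutes a simple stand-in and should be read with that in mind. It assumes the rental expense is the sum of $c_i \omega_i$ for a hypothetical per-node rent $c_i$, assumes $t_i = D_i / (v_i \omega_i)$, and uses plain exhaustive enumeration; the name `min_cost_allocation` is likewise hypothetical.

```python
from itertools import product


def min_cost_allocation(data, speed, price, caps, t_trans, t_total):
    """Cheapest allocation whose T_exe fits in T_total - T_trans."""
    budget = t_total - t_trans  # time left for data processing
    best = None
    for nodes in product(*(range(1, b + 1) for b in caps)):
        # Assumption: t_i = D_i / (v_i * w_i).
        t_exe = sum(d / (v * w) for d, v, w in zip(data, speed, nodes))
        if t_exe > budget:
            continue  # would miss the user-set deadline
        # Assumption: rent grows linearly with the node count.
        cost = sum(c * w for c, w in zip(price, nodes))
        if best is None or cost < best[0]:
            best = (cost, nodes)
    return best  # (rental expense, allocation), or None if infeasible


plan = min_cost_allocation(
    data=[100, 200], speed=[10, 10], price=[1, 2],
    caps=[4, 4], t_trans=2, t_total=20,
)
print(plan)
```

Returning `None` when no allocation meets the deadline makes the infeasible case explicit, so a caller can relax the time limit or raise the per-subtask caps $b_i$.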
9. The method according to claim 7, characterized in that: in step (53), the objective of performing the resource node optimized scheduling under a set resource-node count is to use a limited number of resource nodes so that the workflow finishes processing each group of data in the shortest possible time; the scheduling method of step (53) comprises the following operations:
(53A) according to the result of merging the edges between subtasks, derive the new workflow graph and reset the waiting-time constraints in the workflow;
(53B) according to the set formula for the workflow completion time $T$, namely $T = T_{exe} + T_{trans} + T_{wait}$, allocate resource nodes to the subtasks on the critical path of the workflow and determine the data processing time; wherein:
The data processing time of the subtasks of the workflow is $T_{exe} = t_1 + t_2 + \dots + t_i + \dots + t_d$, where the natural number $i$ is the subtask index with maximum value $d$, $D_i$ is the amount of data processed by the $i$-th subtask, $v_i$ is the data processing speed of the $i$-th subtask, which describes the computing capability of a resource node, and $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask;
The data transmission time between subtasks is $T_{trans} = t_1 + t_2 + \dots + t_j + \dots + t_e$, where the natural number $j$ is the edge index with maximum value $e$, $D_j$ is the total amount of data transmitted on the $j$-th edge, and $v_j$ is the data transmission rate of the $j$-th edge;
$T_{wait}$ is the data waiting time inside the subtasks; its value approaches 0;
At this point, exhaustive search is used to choose the optimal scheme among the schemes with higher resource-node cost performance, so that the final completion time of the workflow is the shortest while the usage cost of system resource nodes is reduced;
The objective function of this scheduling method is $\min (T_{exe} + T_{wait} + T_{trans})$, subject to $\sum_{i=1}^{d} \omega_i \le N$ and $\omega_i \le b_i$; here $\omega_i$ is the number of resource nodes allocated to the $i$-th subtask, and the total number of resource nodes of all subtasks must not exceed the set total $N$;
The objective function uses an inequality on the total cost performance of the resource nodes to narrow the search range of the optimal solution; the quantity it bounds is the ratio between the currently determined number of resource nodes and the rental expense required to buy the execution capability of these nodes, and $M$ is the system-preset minimum standard for the total cost performance of resource nodes, which is also the upper limit on the rental expense for purchasing a unit of computing capability; this step attempts to buy as much subtask data-processing capability as possible with as little resource node rental as possible, thereby shortening the final completion time of the workflow.
10. The method according to claim 9, characterized in that the rule for exhaustively allocating resource nodes to workflow subtasks is as follows: first, provided that the total number of resource nodes does not exceed the number set by the user, select the candidate schemes whose total resource-node cost performance is higher than a set value; next, among the feasible schemes, delete those that cost more; having thus reduced the selection workload, choose from the remaining candidates the scheme with the shortest final workflow completion time.
CN201410452173.7A 2014-09-05 2014-09-05 Optimizing and scheduling task method based on workflow critical path in data center Active CN104239141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410452173.7A CN104239141B (en) 2014-09-05 2014-09-05 Optimizing and scheduling task method based on workflow critical path in data center

Publications (2)

Publication Number Publication Date
CN104239141A true CN104239141A (en) 2014-12-24
CN104239141B CN104239141B (en) 2017-07-28

Family

ID=52227272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410452173.7A Active CN104239141B (en) 2014-09-05 2014-09-05 Optimizing and scheduling task method based on workflow critical path in data center

Country Status (1)

Country Link
CN (1) CN104239141B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030149685A1 (en) * 2002-02-07 2003-08-07 Thinkdynamics Inc. Method and system for managing resources in a data center
CN103019822A (en) * 2012-12-07 2013-04-03 北京邮电大学 Large-scale processing task scheduling method for income driving under cloud environment

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10613909B2 (en) 2015-01-04 2020-04-07 Huawei Technologies Co., Ltd. Method and apparatus for generating an optimized streaming graph using an adjacency operator combination on at least one streaming subgraph
WO2016107488A1 (en) * 2015-01-04 2016-07-07 华为技术有限公司 Streaming graph optimization method and apparatus
CN106484725B (en) * 2015-08-31 2019-08-20 华为技术有限公司 A kind of data processing method, device and system
CN106484725A (en) * 2015-08-31 2017-03-08 华为技术有限公司 A kind of data processing method, device and system
US10691692B2 (en) 2016-04-29 2020-06-23 Fujitsu Limited Computer-implemented method of executing a query in a network of data centres
CN107688488A (en) * 2016-08-03 2018-02-13 中国移动通信集团湖北有限公司 A kind of optimization method and device of the task scheduling based on metadata
CN107688488B (en) * 2016-08-03 2020-10-20 中国移动通信集团湖北有限公司 Metadata-based task scheduling optimization method and device
CN108062243B (en) * 2016-11-08 2022-01-04 杭州海康威视数字技术股份有限公司 Execution plan generation method, task execution method and device
CN108062243A (en) * 2016-11-08 2018-05-22 杭州海康威视数字技术股份有限公司 Generation method, task executing method and the device of executive plan
CN109479024A (en) * 2016-11-18 2019-03-15 华为技术有限公司 System and method for ensuring the service quality of calculation workflow
US10331485B2 (en) 2016-11-18 2019-06-25 Huawei Technologies Co., Ltd. Method and system for meeting multiple SLAS with partial QoS control
EP3348029A4 (en) * 2016-11-18 2018-07-18 Huawei Technologies Co., Ltd. System and method for ensuring quality of service in a compute workflow
CN106845926A (en) * 2016-12-27 2017-06-13 中国建设银行股份有限公司 A kind of Third-party payment supervisory systems distributed data method for stream processing and system
CN108667864A (en) * 2017-03-29 2018-10-16 华为技术有限公司 A kind of method and apparatus carrying out scheduling of resource
CN108667864B (en) * 2017-03-29 2020-07-28 华为技术有限公司 Method and device for scheduling resources
CN108965167A (en) * 2018-07-19 2018-12-07 郑州云海信息技术有限公司 A kind of distribution method and device of cloud resource
CN112424569A (en) * 2018-08-14 2021-02-26 宝马股份公司 Method and device for planning a route for autonomous driving
CN112424569B (en) * 2018-08-14 2024-04-12 宝马股份公司 Method and apparatus for setting up a path plan for autonomous driving
CN109495541A (en) * 2018-10-15 2019-03-19 上海交通大学 Based on the cloud service workflow schedule method across data center
CN109542620A (en) * 2018-11-16 2019-03-29 中国人民解放军陆军防化学院 The scheduling of resource configuration method of associated task stream in a kind of cloud
CN109542620B (en) * 2018-11-16 2021-05-28 中国人民解放军陆军防化学院 Resource scheduling configuration method for associated task flow in cloud
CN110502343A (en) * 2019-08-23 2019-11-26 深圳市新系区块链技术有限公司 A kind of resource allocation methods, system, device and computer readable storage medium
CN110502343B (en) * 2019-08-23 2022-05-06 深圳市新系区块链技术有限公司 Resource allocation method, system, device and computer readable storage medium
CN110673939A (en) * 2019-09-23 2020-01-10 汉纳森(厦门)数据股份有限公司 Task scheduling method, device and medium based on airflow and yarn
CN110673939B (en) * 2019-09-23 2021-12-28 汉纳森(厦门)数据股份有限公司 Task scheduling method, device and medium based on airflow and yarn
CN111800486A (en) * 2020-06-22 2020-10-20 山东大学 Cloud edge cooperative resource scheduling method and system
CN114064259B (en) * 2020-08-03 2024-06-14 北京超星未来科技有限公司 Heterogeneous computing resource-oriented resource scheduling method and device
CN114064259A (en) * 2020-08-03 2022-02-18 北京超星未来科技有限公司 Resource scheduling method and device for heterogeneous computing resources
CN112181656A (en) * 2020-09-30 2021-01-05 山东工商学院 Data intensive workflow scheduling method and system
CN111967994A (en) * 2020-10-20 2020-11-20 支付宝(杭州)信息技术有限公司 Intelligent contract creating method and device
CN112835696B (en) * 2021-02-08 2023-12-05 兴业数字金融服务(上海)股份有限公司 Multi-tenant task scheduling method, system and medium
CN112835696A (en) * 2021-02-08 2021-05-25 兴业数字金融服务(上海)股份有限公司 Multi-tenant task scheduling method, system and medium
CN113791904A (en) * 2021-09-13 2021-12-14 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for processing query input

Also Published As

Publication number Publication date
CN104239141B (en) 2017-07-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant