WO2019218169A1 - Workflow resource configuration optimization method and system based on probability distribution - Google Patents

Workflow resource configuration optimization method and system based on probability distribution

Info

Publication number
WO2019218169A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
configuration
compared
configurations
workflow
Prior art date
Application number
PCT/CN2018/086936
Other languages
English (en)
French (fr)
Inventor
周池
申丙坤
毛睿
胡梓良
何丙胜
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2018/086936 priority Critical patent/WO2019218169A1/zh
Publication of WO2019218169A1 publication Critical patent/WO2019218169A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present invention relates to the field of cloud technologies, and in particular, to a workflow resource configuration optimization method and system based on probability distribution.
  • workflow resources can be configured through some existing algorithms, such as dynamic programming algorithms; however, such algorithms use the average performance of cloud platform resources as the input for optimizing the workflow resource configuration and do not take the effect of the instability of cloud platform resource performance on the optimization result into account, so the optimization effect is unsatisfactory.
  • some algorithms, such as resource scheduling algorithms based on stochastic models, consider the dynamics of cloud platform resource performance and solve the technical problems of the dynamic programming algorithms, but configuring workflow resources with such a resource scheduling algorithm requires introducing complex models and analysis methods, and the calculation is rather cumbersome.
  • other prior-art schemes use the Monte Carlo (MC) algorithm to optimize the workflow resource configuration based on probability distribution, but the Monte Carlo algorithm incurs a large amount of overhead and is difficult to popularize and apply in practice.
  • the main purpose of the embodiments of the present invention is to provide a workflow resource configuration optimization method and system based on probability distribution, which solve the technical problems of how to reduce the overhead of the workflow resource configuration optimization process while taking the instability of cloud platform resource performance into account, and how to reduce the computational complexity of the workflow resource configuration optimization process.
  • a first aspect of the embodiments of the present invention provides a workflow resource configuration optimization method based on a probability distribution, where the optimization method includes:
  • if there are paths of the same length in the path optimization set, the workflow nodes at the same position on the paths of the same length in the path optimization set are merged to obtain at least one merged path, and the merged path and the paths in the path optimization set that do not participate in the merging are taken as paths to be compared; if there are no paths of the same length in the path optimization set, all paths in the path optimization set are taken as paths to be compared;
  • determining a configuration of the path set, the configuration including setting the type of the virtual machine that processes each workflow node on each path in the path set;
  • the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are configured differently under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations;
  • if the number of paths to be compared is at least two and there are at least two configurations for the path set, the runtime probability distribution is calculated for each path to be compared under each configuration; the maximum runtime probability distribution of the path set under each configuration is determined based on the runtime probability distributions of the paths to be compared under the same configuration; and the optimal configuration of the path set in the at least two configurations is determined based on the maximum runtime probability distribution under each of the at least two configurations.
  • a workflow resource configuration optimization system based on a probability distribution, where the optimization system includes:
  • An acquisition module for obtaining a path set containing all possible paths of the workflow
  • a pruning module configured to delete a path in the path set that does not meet a preset condition by using a pruning algorithm, to obtain a path optimization set
  • a merging module configured to: if there are paths of the same length in the path optimization set, merge the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and take the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, take all paths in the path optimization set as paths to be compared;
  • a configuration module configured to determine a configuration of the set of paths; the configuration includes setting a type of a virtual machine that processes each workflow node on each path in the set of paths;
  • a processing module configured to: if the number of paths to be compared is one and there are at least two configurations for the path set, obtain the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations;
  • the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared.
  • An embodiment of the present invention provides a method and a system for optimizing a workflow resource configuration based on a probability distribution.
  • with the solution of the present invention, a path set containing all possible paths of a workflow can be obtained, and the paths in the path set whose runtime does not meet a preset condition are deleted by the pruning algorithm to obtain a path optimization set.
  • when there are paths of the same length in the path optimization set, the workflow nodes at the same position on the paths of the same length are merged to obtain at least one merged path, and the merged path and the paths that do not participate in the merging are taken as paths to be compared; when there are no paths of the same length in the path optimization set, all paths in the path optimization set are taken as paths to be compared.
  • compared with the prior-art Monte Carlo algorithm, which has to compute over all paths, pruning reduces the number of paths involved in the subsequent processing and thus lowers the computational complexity and the resource consumption to some extent. After pruning and merging, when there are at least two paths to be compared and at least two configurations for the path set, the runtime probability distribution is calculated for each path to be compared under each configuration; the maximum runtime probability distribution of the path set under each configuration is determined based on the runtime probability distributions of the paths to be compared under the same configuration; and the optimal configuration of the path set is determined based on the maximum runtime probability distribution under each of the at least two configurations. On the basis of pruning and merging, the number of paths to be compared is reduced, and the runtime probability distribution of each path to be compared can be reused in the process of determining the optimal configuration, which reduces the computational complexity and the overhead.
  • when there is only one path to be compared and there are at least two configurations for it, the optimal configuration of the path set in the at least two configurations is obtained based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations; the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are configured differently under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on those runtime probability distributions.
  • from the description of this comparison manner it can be seen that, when there is only one path to be compared, only partial paths under the two configurations are compared in terms of runtime probability distribution (i.e., partial comparison); this partial comparison greatly reduces the amount of data to be compared, lowers the computational complexity, reduces the overhead, and improves the practicability of the embodiments of the present invention.
  • FIG. 1 is a schematic flowchart of a method for optimizing a workflow resource configuration based on a probability distribution according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of all possible paths of a workflow W in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of pruning and merging operations on the path of the workflow W in FIG. 2;
  • FIG. 4 is a schematic structural diagram of a workflow resource configuration optimization system based on probability distribution according to an embodiment of the present invention.
  • in the prior art, methods for configuring workflow resources suffer from high computational complexity or huge overhead.
  • to solve these technical problems, the present invention provides a workflow resource configuration optimization method based on probability distribution, in which a pruning algorithm reduces the number of paths involved in the computation of the workflow resource configuration, lowering the computational complexity and resource consumption to a certain extent; and the optimal configuration of two configurations is obtained based on a partial comparison of the workflow nodes of the paths under the two configurations, which greatly reduces the data computation involved in the comparison and effectively lowers the computational complexity and the overhead.
  • FIG. 1 is a schematic flowchart of a method for optimizing a workflow resource configuration based on a probability distribution according to a first embodiment of the present invention, where the method includes:
  • Step 101 Acquire a path set including all possible paths of the workflow
  • the path of the workflow is composed of workflow nodes that need to be executed to complete the workflow, and each workflow node represents a task on the workflow, and there may be more than one path of one workflow.
  • Figure 2 is a schematic illustration of all possible paths of the workflow W.
  • FIG. 3 is a schematic diagram showing the pruning and merging operations on the paths in the path set S of the workflow W. As shown in FIG. 2 and FIG. 3, the path set S of the workflow W contains 48 possible paths.
  • Step 102 Deleting a path in the path set whose running time does not meet the preset condition by using a pruning algorithm to obtain a path optimization set; wherein the paths in the path optimization set have the same length;
  • the embodiment optimizes the path set based on the pruning algorithm, and reduces the number of paths in the path set.
  • one criterion for determining a path to be deleted is that the runtime of the path does not satisfy the preset condition; the preset condition includes, but is not limited to, the runtime of a path being significantly lower than the runtime of other paths in the set.
  • for example, in the set S in FIG. 2, if the runtime of the path P1: 0-4-10-11-12-16-17-18-19 is significantly lower than the runtime of some other path in the path set S, the path P1 is a path that needs to be deleted.
  • as can be seen from FIG. 2, among all possible paths of the workflow W, some paths may be subsets of other paths; for example, the path 0-12-16-17-18-19 is a subset of the path 0-4-10-11-12-16-17-18-19: the workflow nodes of the former path are a subset of the workflow nodes of the latter path, so its runtime is necessarily shorter than that of the latter path.
  • therefore, in one example, deleting, by the pruning algorithm, the paths in the path set whose runtime does not meet the preset condition to obtain the path optimization set includes: comparing the paths in the path set, and, if a path in the path set is a subset of at least one other path in the path set, deleting that path from the path set to obtain the path optimization set. It can be understood that the path optimization set obtained by optimizing the path set contains no path that is a subset of another path.
  • Step 103 If there are paths of the same length in the path optimization set, merge the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and take the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, take all paths in the path optimization set as paths to be compared;
  • a workflow node on a path is a task on the path, and merging workflow nodes means merging the tasks on the paths; paths of the same length and/or of different lengths may exist in the path optimization set, and after the paths are merged there may still be multiple paths in the path optimization set. For example, if a path optimization set contains two paths A and B of length 10, two paths C and D of length 9 and one path E of length 8, then A and B are merged and C and D are merged, yielding two merged paths; these two merged paths and path E are the paths to be compared in this embodiment.
  • on the paths of the same length in the path optimization set, the workflow nodes at the same position may be the same or different; for example, in FIG. 2, the first position of some paths is workflow node 0 and the first position of other paths is workflow node 1.
  • when merging the paths of the same length in the path optimization set, it is necessary to merge the workflow nodes based on the same position of each path in the path optimization set.
  • optionally, merging the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path includes:
  • merging the paths in the path optimization set according to the following rules to obtain at least one merged path:
  • if the workflow nodes at the same position of the paths of the same length in the path optimization set are different, the workflow nodes at that position of the paths of the same length are juxtaposed as the workflow node at that position on the merged path of those paths;
  • if the workflow nodes at the same position of the paths of the same length in the path optimization set are the same, the same workflow node at that position of the paths of the same length is used as the workflow node at that position on the merged path of those paths.
  • if the tasks on two workflow nodes are the same, the workflow nodes can be considered the same; if the tasks on the workflow nodes are different, the workflow nodes are different.
  • in FIG. 3, the path set S is the path set of the workflow W, the set S' is the path optimization set of the path set S, and the paths in the set S'' are the final merged paths.
  • the path set S originally contains the 48 paths P1'–P48'; after pruning, the four paths 0-12-16-17-18-19, 1-13-16-17-18-19, 2-14-16-17-18-19 and 3-15-16-17-18-19 are cut, and the path optimization set S' containing the 44 paths P1'–P44' is obtained.
  • Step 104 Determine a configuration of a path set, where the configuration includes setting a type of a virtual machine of each workflow node on each path in the processing path set;
  • Determining the Configuration of the Path Set For example, determining the settings of the virtual machine types on all of the workflow nodes in FIG. 2 can be implemented using an existing search method, which is not described herein. In this embodiment, all possible configurations of the path set may be arranged by using an existing search method, and then the respective configurations are compared in the subsequent steps to select an optimal configuration of the path set.
  • Step 105 If the number of paths to be compared is one and there are at least two configurations for the path set, obtain the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations; if the number of paths to be compared is at least two and there are at least two configurations for the path set, calculate the runtime probability distribution of each path to be compared under each configuration, determine the maximum runtime probability distribution of the path set under each configuration based on the runtime probability distributions of the paths to be compared under the same configuration, and determine, based on the maximum runtime probability distribution under each of the at least two configurations, the optimal configuration of the path set in the at least two configurations;
  • the first preset comparison manner includes: determining a workflow node with different configurations allocated on the path to be compared in the two configurations as a workflow node to be compared, and calculating a running time probability distribution of the workflow node to be compared in the two configurations.
  • the optimal configuration of the path set in the two configurations is determined based on the runtime probability distribution of the workflow nodes to be compared under the two configurations.
  • the foregoing first preset comparison manner is a manner of selecting one optimal configuration out of two configurations (i.e., one of two). It can be understood that, in the process of obtaining the optimal configuration of the path set in the at least two configurations based on this principle, two of the at least two configurations may first be compared according to the first preset comparison manner to obtain the optimal configuration of the path set in those two configurations, and the resulting optimal configuration is then compared, again according to the first preset comparison manner, with another configuration of the at least two configurations that has not yet been compared, until the optimal configuration of the path set in the at least two configurations is determined; alternatively, in a first round of comparison, the configurations may be compared in pairs (no configuration is compared repeatedly) according to the preset comparison manner to obtain the optimal configuration of each pair (a configuration that did not participate in the comparison joins the second round), in a second round each pair of optimal configurations is compared again to obtain the optimal of those two, and so on, until the optimal configuration of the path set in the at least two configurations is obtained.
  • when the number of paths to be compared is one, comparing the workflow nodes to be compared under the two configurations compared each time is in fact comparing partial paths under each configuration (i.e., partial comparison); compared with comparing all paths under each configuration, this partial comparison effectively reduces the amount of calculation involved in the comparison and reduces the overhead.
  • differences in the configuration allocated to a workflow node include, but are not limited to, differences in the type of virtual machine allocated to the workflow node.
  • in this embodiment, the optimization of the workflow resource configuration is generally performed on a cloud platform, and the performance of cloud platform resources is unstable and satisfies a certain probability distribution. The runtime probability distribution of the task on a workflow node is calculated based on the performance of the cloud platform; therefore, by using the calculated runtime probability distributions of the paths to be compared, this embodiment already takes the instability of cloud platform performance into account. That is, in the process of obtaining the optimal configuration of the path to be compared, the dynamics of cloud platform resources has been considered, which avoids the shortcomings of the dynamic programming algorithms in the prior art.
  • taking the workflow W as an example, the process of obtaining the optimal configuration of the path to be compared with the partial comparison method is described in detail with reference to FIG. 2 and FIG. 3. Assume that the workflow W in FIG. 2 has two configurations, configuration A and configuration B, and that under configurations A and B only the type of the virtual machine configured on workflow node 10 of the path to be compared of the workflow W differs; then the workflow node to be compared is node 10, and the runtime probability distributions of the workflow node 10 to be compared under configuration A and configuration B are calculated.
  • the runtime probability distribution of workflow node 10 under each configuration can be calculated with an existing calculation method based on the probability distribution of the performance of the virtual machine allocated to workflow node 10 under configuration A and configuration B respectively, and on the CPU workload and network workload of the task on workflow node 10; the runtime probability distributions of workflow node 10 under configurations A and B are then compared to determine the optimal configuration.
  • optionally, determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations includes: calculating the probability value P(X>Y) from the runtime probability distribution functions f_X(x) and f_Y(y) of the workflow node to be compared under the first and the second configuration; if the probability value P(X>Y) is greater than a preset threshold, the second configuration is the optimal configuration of the path set in the two configurations, wherein the preset threshold is not less than 0.5.
  • still taking the workflow W with two configurations A and B as an example, assume the runtime probability distribution of the workflow node 10 to be compared under configuration A is f_X(x) and the runtime probability distribution of the workflow node 10 to be compared under configuration B is f_Y(y); the calculated probability value P(X>Y) is the probability that the runtime X of workflow node 10 under configuration A is greater than the runtime Y of workflow node 10 under configuration B. If P(X>Y)>0.5, the workflow node 10 under configuration A is considered more likely to have the longer runtime, and configuration B is the optimal configuration of configurations A and B.
  • in order to further reduce the calculation involved in the above partial comparison, this embodiment introduces a pruning algorithm for the calculation of P(X>Y) and proposes a method of simplifying the calculation of P(X>Y) based on the upper and lower bounds of the value ranges of the free variables.
  • optionally, determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations includes: calculating the range of the probability value P(X>Y) based on the two formulas "if X.l ≤ Y.r ≤ X.r, then P(X>Y) ≥ 1 − F_X(Y.r)" and "if X.l ≤ Y.l ≤ X.r, then P(X>Y) ≤ 1 − F_X(Y.l)", where F_X( ) indicates the cumulative distribution function of the runtime of the workflow node to be compared in the first configuration, X.l and X.r are respectively the lower and upper bounds of the value range of X, and Y.l and Y.r are respectively the lower and upper bounds of the value range of Y;
  • if the minimum value of the probability value P(X>Y) calculated according to the first formula is greater than a first preset threshold, the second configuration is the optimal configuration of the two configurations, wherein the first preset threshold is not lower than 0.5; if the maximum value of the probability value P(X>Y) calculated according to the second formula is less than a second preset threshold, the first configuration is the optimal configuration of the two configurations, wherein the second preset threshold is not higher than 0.5.
  • in practice, under two configurations the configurations on more than one workflow node may differ, i.e., there may be more than one workflow node to be compared, in which case the calculation of the probability value P is more complex; the ADD algorithm can generally be used to calculate the runtime probability distribution of multiple serial workflow nodes. For example, if under configuration A and configuration B the types of virtual machines allocated on workflow nodes 10 and 11 in FIG. 2 and FIG. 3 are different, the runtime probability distribution of the workflow nodes to be compared is calculated according to the ADD formula for Z = X + Y, where X is the running time of workflow node 10 under one configuration (configuration A or configuration B), Y is the running time of workflow node 11 under that configuration, f_X( ) is the runtime probability distribution function of X, f_Y( ) is the runtime probability distribution function of Y, and f_Z( ) is the total runtime probability distribution function of workflow nodes 10 and 11.
  • the above calculation of f_Z involves integration, and the more workflow nodes to be compared there are, the more integral calculations are involved. To reduce this, lower the overhead and lower the computational complexity, an example of this embodiment also proposes a pruning method, task bundling, to optimize the ADD algorithm: the tasks on serial workflow nodes that are assigned to virtual machines of the same type are assigned to the same virtual machine as one task, and these tasks are scheduled one after another on the same virtual machine; this can increase CPU utilization and reduce the data transfer between mutually dependent tasks.
  • optionally, if the number of paths to be compared is one, then before obtaining the optimal configuration of the path set in the at least two configurations based on the principle of selecting, according to the first preset comparison manner, the optimal configuration of the path set from two of the at least two configurations, the method further includes: if there are serial workflow nodes on the path to be compared that are assigned to virtual machines of the same type, allocating those serial workflow nodes of the path to be compared to the same virtual machine for processing.
  • the optimized ADD algorithm is described below taking the workflow W as an example with reference to FIG. 2 and FIG. 3. Assume that on the path to be compared the types of virtual machines allocated on the adjacent workflow nodes 10 and 11 are the same under configuration A and different under configuration B; then the tasks on workflow nodes 10 and 11 under configuration A are merged into one task and assigned to run on the same virtual machine.
  • when the workflow nodes to be compared under configuration A and configuration B are nodes 10 and 11, the runtime probability distribution of workflow nodes 10 and 11 under configuration A is calculated from the performance distribution of the single virtual machine they are allocated and from information such as the CPU workloads and network workloads of the two tasks on nodes 10 and 11, whereas the runtime probability distribution of workflow nodes 10 and 11 under configuration B is obtained with the complex ADD formula above; compared with the ADD algorithm, which requires integral calculation, the optimized ADD algorithm is obviously much simpler.
  • in practice, when there are multiple workflow nodes to be compared under the two configurations, there may be multiple parallel workflow nodes to be compared at the same position; the runtime probability distribution of such parallel workflow nodes to be compared can be found with the MAX algorithm. For example, if the types of virtual machines allocated to workflow node 4 under configuration A and under configuration B are different, and the types of virtual machines allocated to workflow node 5 under configuration A and under configuration B are different, the MAX algorithm is used to find the runtime probability distribution of workflow nodes 4 and 5.
  • the formula of the MAX algorithm gives the distribution of Z = max(X, Y), where X is the running time of workflow node 4 in one configuration, Y is the running time of workflow node 5 in the same configuration, f_X( ) is the runtime probability distribution function of workflow node 4, F_X( ) is the runtime cumulative distribution function of workflow node 4, f_Y( ) is the runtime probability distribution function of workflow node 5, and F_Y( ) is the runtime cumulative distribution function of workflow node 5.
  • task clustering is used to optimize the MAX algorithm: two parallel tasks assigned to the same type of virtual machine are assigned to one virtual machine for parallel processing. .
  • optionally, if the number of paths to be compared is one and the path to be compared is a merged path, then before obtaining the optimal configuration of the path set in the at least two configurations based on the principle of selecting, according to the first preset comparison manner, the optimal configuration of the path set from two of the at least two configurations, the method further includes: if there are parallel workflow nodes on the merged path that are assigned to virtual machines of the same type, allocating those parallel workflow nodes of the merged path to the same virtual machine for processing.
  • in the above example concerning workflow node 4 and workflow node 5, if under a certain configuration (configuration A or configuration B) node 4 and node 5 are equivalent tasks and the types of the allocated virtual machines are the same, workflow nodes 4 and 5 can be assigned to run on the same virtual machine.
  • when workflow nodes 4 and 5 run on the same virtual machine, there is no need under that configuration to use the complex MAX algorithm to calculate the runtime probability distribution needed by the subsequent partial comparison; the runtime probability distribution of workflow node 4 or workflow node 5 alone under that configuration may replace the result f_Z(z) described above.
  • that is, if the workflow nodes 4 and 5 in configuration A are assigned the same virtual machine and the workflow nodes 4 and 5 in configuration B are assigned the same virtual machine, then in the partial comparison of the paths of configurations A and B only one of workflow nodes 4 and 5 is taken as the workflow node to be compared, and its runtime probability distribution under configuration A is compared with that under configuration B to determine the optimal configuration of configurations A and B.
  • the following describes the scheme for obtaining the optimal configuration among at least two configurations in the case where the number of paths to be compared is at least two and there are at least two configurations for the path set.
  • in this case, the runtime probability distribution of each path to be compared under each configuration is calculated first, the maximum runtime probability distribution of the path set under each configuration is determined based on the runtime probability distributions of the paths to be compared under the same configuration, and the optimal configuration of the path set is then determined based on the maximum runtime probability distribution under each of the at least two configurations.
  • when calculating the runtime probability distribution of a path to be compared that is not a merged path, the ADD algorithm can be used: if X is understood as the running time of one workflow node of a path to be compared under a configuration (configuration A or configuration B), Y as the running time of the other workflow nodes of the same path to be compared under the same configuration, f_Y( ) as the runtime probability distribution function of Y and f_X( ) as the runtime probability distribution function of X, then f_Z( ) is the total runtime probability distribution function of the two workflow nodes, and the runtime probability distribution of a whole path to be compared under a configuration can be calculated with the ADD algorithm.
  • the ADD algorithm can also be optimized with the pruning method task bundling in this case: optionally, if the number of paths to be compared is at least two, then before calculating the runtime probability distribution of each path to be compared under each configuration, the method further includes: if there are serial workflow nodes on a path to be compared that are assigned to virtual machines of the same type, allocating those serial workflow nodes of that path to be compared to the same virtual machine for processing.
  • after this optimization, the runtime probability distribution of at least two workflow nodes running on the same virtual machine under one configuration can be calculated from the performance distribution of the single virtual machine they are allocated and from information such as the total CPU workload and network workload of the at least two workflow nodes.
  • if a path to be compared is a merged path, calculating its runtime probability distribution involves the ADD algorithm and the MAX algorithm, because at least two parallel workflow nodes may exist at one position of the merged path; in order to calculate the runtime probability distribution of the merged path with the ADD algorithm, the runtime probability of each position with parallel workflow nodes must first be determined with the MAX algorithm. For example, in the merged path of FIG. 3, the four workflow nodes 0, 1, 2 and 3 are juxtaposed at the first position, so the MAX algorithm is needed first to determine the maximum runtime probability distribution of the first position.
  • in that case, X is the running time of one workflow node (for example, workflow node 0 in FIG. 3) at a position of a path to be compared under a configuration, f_X( ) is the runtime probability distribution function of that workflow node (workflow node 0) and F_X( ) is the runtime cumulative distribution function of that workflow node (workflow node 0); Y is the running time of another workflow node at the same position of the same path to be compared under the same configuration (for example, one of workflow nodes 1 to 3), f_Y( ) is the runtime probability distribution function of that workflow node (workflow node 1) and F_Y( ) is the runtime cumulative distribution function of that workflow node (workflow node 1); f_Z(z) is the maximum runtime probability distribution of the two workflow nodes at that position.
  • the pruning method task clustering can also be used to optimize the MAX algorithm in this case: optionally, if the number of paths to be compared is at least two and a merged path exists among the paths to be compared, then before calculating the runtime probability distribution of each path to be compared under each configuration, the method further includes: if there are parallel workflow nodes on a merged path that are assigned to virtual machines of the same type, allocating those parallel workflow nodes of the merged path to the same virtual machine for processing.
  • after this optimization, the runtime probability distribution of any one of those parallel workflow nodes may be selected as the runtime probability distribution of the at least two workflow nodes juxtaposed at that position.
  • the maximum runtime probability distribution of the path set under the same configuration may be calculated based on the MAX algorithm; optionally, determining the maximum runtime probability distribution of the path set under each configuration based on the runtime probability distributions of the paths to be compared under the same configuration includes: calculating the maximum runtime probability distribution of the path set under each of the at least two configurations by applying the MAX algorithm over the paths to be compared.
  • determining the optimal configuration of the path set based on the maximum runtime probability distribution under each of the at least two configurations includes:
  • obtaining the optimal configuration of the path set in the at least two configurations based on the principle of selecting, according to a second preset comparison manner, the optimal configuration of the path set from two of the at least two configurations;
  • the second preset comparison manner includes: calculating the probability value P(X>Y), where X represents the running time of the path set in the first of the two configurations, Y represents the running time of the path set in the second of the two configurations, f_X(x) represents the maximum runtime probability distribution function of the path set in the first configuration, and f_Y(y) represents the maximum runtime probability distribution function of the path set in the second configuration; if the probability value P(X>Y) is greater than a preset threshold, the second configuration is the optimal configuration of the two configurations, wherein the preset threshold is not less than 0.5.
  • this embodiment also shows a workflow resource configuration optimization system based on a probability distribution, and the foregoing method for optimizing a workflow resource configuration based on probability distribution is implemented by the system.
  • the optimization system of this embodiment includes :
  • An obtaining module 41 configured to acquire a path set including all possible paths of the workflow
  • the pruning module 42 is configured to delete a path in the path set that does not meet the preset condition by using a pruning algorithm, to obtain a path optimization set, where the paths in the path optimization set have the same length;
  • the merging module 43 is configured to: if there are paths of the same length in the path optimization set, merge the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and take the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, take all paths in the path optimization set as paths to be compared;
  • the configuration module 44 is configured to determine a configuration of the path set, where the configuration includes setting a type of the virtual machine of each workflow node on each path in the processing path set;
  • the processing module 45 is configured to: if the number of paths to be compared is one and there are at least two configurations for the path set, obtain the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations; and, if the number of paths to be compared is at least two and there are at least two configurations for the path set, calculate the runtime probability distribution of each path to be compared under each configuration, determine the maximum runtime probability distribution of the path set under each configuration based on the runtime probability distributions of the paths to be compared under the same configuration, and determine, based on the maximum runtime probability distribution under each of the at least two configurations, the optimal configuration of the path set in the at least two configurations; the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations.
  • optionally, the pruning module 42 is configured to compare the paths in the path set, and, if a path in the path set is a subset of at least one other path in the path set, delete that path from the path set to obtain the path optimization set.
  • optionally, the merging module 43 is configured to merge the paths in the path optimization set according to the following rules to obtain at least one merged path:
  • if the workflow nodes at the same position of the paths of the same length in the path optimization set are different, the workflow nodes at that position of the paths of the same length are juxtaposed as the workflow node at that position on the merged path of those paths;
  • if the workflow nodes at the same position of the paths of the same length in the path optimization set are the same, the same workflow node at that position of the paths of the same length is used as the workflow node at that position on the merged path of those paths.
  • optionally, the probability-distribution-based workflow resource configuration optimization system of this embodiment further includes a first optimization module 46 and a second optimization module 47. The first optimization module 46 is configured to: if the number of paths to be compared is one and the path to be compared is a merged path, then before the optimal configuration of the path set in the at least two configurations is obtained based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations, allocate, when there are parallel workflow nodes on the merged path that are assigned to virtual machines of the same type, those parallel workflow nodes of the merged path to the same virtual machine for processing; and, if the number of paths to be compared is at least two and a merged path exists among the paths to be compared, then before the runtime probability distribution of each path to be compared under each configuration is calculated, allocate, when there are parallel workflow nodes on a merged path that are assigned to virtual machines of the same type, those parallel workflow nodes of the merged path to the same virtual machine for processing.
  • the second optimization module 47 is configured to: if the number of paths to be compared is one, then before the optimal configuration of the path set in the at least two configurations is obtained based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations, allocate, when there are serial workflow nodes on the path to be compared that are assigned to virtual machines of the same type, those serial workflow nodes of the path to be compared to the same virtual machine for processing; and, if the number of paths to be compared is at least two, then before the runtime probability distribution of each path to be compared under each configuration is calculated, allocate, when there are serial workflow nodes on a certain path to be compared that are assigned to virtual machines of the same type, those serial workflow nodes of that path to be compared to the same virtual machine for processing.
  • the processing module 45 can determine, in either of the following two manners, the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations.
  • the first manner compares the two configurations as follows to obtain the optimal configuration of the path set in the two configurations: calculate the probability value P(X>Y) from the runtime probability distribution functions of the workflow node to be compared under the first and the second configuration; if P(X>Y) is greater than a preset threshold, the second configuration is the optimal configuration of the path set in the two configurations, wherein the preset threshold is not less than 0.5.
  • the second manner calculates the range of the probability value P(X>Y) based on the two formulas "if X.l ≤ Y.r ≤ X.r, then P(X>Y) ≥ 1 − F_X(Y.r)" and "if X.l ≤ Y.l ≤ X.r, then P(X>Y) ≤ 1 − F_X(Y.l)", where F_X( ) indicates the cumulative distribution function of the runtime of the workflow node to be compared in the first configuration, X.l and X.r are respectively the lower and upper bounds of the value range of X, and Y.l and Y.r are respectively the lower and upper bounds of the value range of Y; if the minimum value of the probability value P(X>Y) calculated according to the first formula is greater than a first preset threshold, the second configuration is the optimal configuration of the two configurations, wherein the first preset threshold is not lower than 0.5; if the maximum value of the probability value P(X>Y) calculated according to the second formula is less than a second preset threshold, the first configuration is the optimal configuration of the two configurations, wherein the second preset threshold is not higher than 0.5.
  • the processing module 45 is configured to calculate the maximum runtime probability distribution of the path set under each of the at least two configurations by applying the MAX algorithm over the paths to be compared, and is configured to determine the optimal configuration of the path set based on the maximum runtime probability distribution under each of the at least two configurations in the following manner:
  • the optimal configuration of the path set in the at least two configurations is obtained based on the principle of selecting, according to the second preset comparison manner, the optimal configuration of the path set from two of the at least two configurations;
  • the second preset comparison manner includes: calculating the probability value P(X>Y), where X represents the running time of the path set in the first of the two configurations, Y represents the running time of the path set in the second of the two configurations, f_X(x) represents the maximum runtime probability distribution function of the path set in the first configuration, and f_Y(y) represents the maximum runtime probability distribution function of the path set in the second configuration; if the probability value P(X>Y) is greater than a preset threshold, the second configuration is the optimal configuration of the two configurations, wherein the preset threshold is not less than 0.5.
  • with the solution of this embodiment, the optimization of the workflow resource configuration is achieved, and the runtime probability distributions used in the optimization process take the instability of cloud platform resource performance into account, which solves the technical problem existing in the dynamic programming algorithms of the prior art.
  • the first pruning reduces the number of paths involved in the subsequent calculation and partial comparison, which reduces the amount of calculation and the overhead; the second pruning method optimizes the ADD and MAX algorithms, and the optimized ADD algorithm and MAX algorithm greatly simplify the calculation of the runtime probability distributions of serial workflow nodes and of parallel workflow nodes respectively, which also reduces the overhead of this embodiment; when there is only one path to be compared, the runtime probability distributions are compared only for partial paths under the multiple configurations (i.e., partial comparison), effectively reducing the overhead and the calculation; and the third pruning algorithm used in the partial comparison process simplifies the calculation of the runtime probability distribution of the path, further reducing the overhead and the computational complexity.
  • in short, through three prunings and partial comparison, the method of this embodiment greatly reduces the overhead, the computational complexity and the amount of calculation, which lowers the difficulty of popularizing the workflow resource configuration optimization scheme and improves its practicability.
  • the modules described as separate components may or may not be physically separate.
  • the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • An integrated module if implemented as a software functional module and sold or used as a standalone product, can be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the various embodiments of the present invention.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like. .

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the present invention discloses a workflow resource configuration optimization method and system based on probability distribution. Paths to be compared are determined based on a path optimization set obtained by optimizing a path set containing all possible paths of a workflow. When there are at least two configurations for the path set: if there is only one path to be compared, the optimal configuration of the path set is determined according to a first preset comparison manner based on the runtime probability distributions of the workflow nodes to be compared on the path to be compared; if there are at least two paths to be compared, the optimal configuration of the path set is determined based on the runtime probability distribution of each path to be compared under each configuration. The use of runtime probability distributions fully takes the instability of cloud platform performance into account; the optimization of the path set reduces the number of paths, and the first preset comparison manner compares only the runtime probability distributions of partial paths of the paths to be compared under the two configurations, which effectively reduces the amount of calculation, the computational complexity and the overhead.

Description

Workflow Resource Configuration Optimization Method and System Based on Probability Distribution

Technical Field

The present invention relates to the field of cloud technologies, and in particular to a workflow resource configuration optimization method and system based on probability distribution.

Background

At present, workflow resources can already be configured with some existing algorithms, for example dynamic programming algorithms; however, such algorithms take the average performance of cloud platform resources as the input for optimizing the workflow resource configuration and do not consider the effect of the instability of cloud platform resource performance on the optimization result, so the optimization effect is unsatisfactory. Some algorithms consider the dynamics of cloud platform resource performance and solve the technical problems of the dynamic programming algorithms, for example resource scheduling algorithms based on stochastic models, but configuring workflow resources with such a resource scheduling algorithm requires introducing complex models and analysis methods, and the calculation is rather cumbersome. In addition, there are other schemes in the prior art for optimizing the workflow resource configuration, for example optimizing the probability-distribution-based workflow resource configuration with the Monte Carlo (MC) algorithm, but the Monte Carlo algorithm incurs a very large overhead and is difficult to popularize and apply in practice.
Summary of the Invention

The main purpose of the embodiments of the present invention is to provide a workflow resource configuration optimization method and system based on probability distribution, which solve the technical problems of how to reduce the overhead of the workflow resource configuration optimization process while taking the instability of cloud platform resource performance into account, and how to reduce the computational complexity of the workflow resource configuration optimization process.

To achieve the above purpose, a first aspect of the embodiments of the present invention provides a workflow resource configuration optimization method based on probability distribution, the optimization method including:

obtaining a path set containing all possible paths of a workflow;

deleting, by a pruning algorithm, the paths in the path set whose runtime does not satisfy a preset condition, to obtain a path optimization set;

if there are paths of the same length in the path optimization set, merging the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and taking the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, taking all paths in the path optimization set as paths to be compared;

determining a configuration of the path set, the configuration including setting the type of the virtual machine that processes each workflow node on each path in the path set;

if the number of paths to be compared is one and there are at least two configurations for the path set, obtaining the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to a first preset comparison manner, the optimal configuration of the path set in two configurations; wherein the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations;

if the number of paths to be compared is at least two and there are at least two configurations for the path set, calculating the runtime probability distribution of each path to be compared under each configuration; determining, based on the runtime probability distributions of the paths to be compared under the same configuration, the maximum runtime probability distribution of the path set under each configuration; and determining, based on the maximum runtime probability distribution under each of the at least two configurations, the optimal configuration of the path set in the at least two configurations.
To achieve the above purpose, a second aspect of the embodiments of the present invention provides a workflow resource configuration optimization system based on probability distribution, the optimization system including:

an obtaining module, configured to obtain a path set containing all possible paths of a workflow;

a pruning module, configured to delete, by a pruning algorithm, the paths in the path set whose runtime does not satisfy a preset condition, to obtain a path optimization set;

a merging module, configured to: if there are paths of the same length in the path optimization set, merge the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and take the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, take all paths in the path optimization set as paths to be compared;

a configuration module, configured to determine a configuration of the path set, the configuration including setting the type of the virtual machine that processes each workflow node on each path in the path set;

a processing module, configured to: if the number of paths to be compared is one and there are at least two configurations for the path set, obtain the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to a first preset comparison manner, the optimal configuration of the path set in two configurations, wherein the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations; and configured to: if the number of paths to be compared is at least two and there are at least two configurations for the path set, calculate the runtime probability distribution of each path to be compared under each configuration, determine, based on the runtime probability distributions of the paths to be compared under the same configuration, the maximum runtime probability distribution of the path set under each configuration, and determine, based on the maximum runtime probability distribution under each of the at least two configurations, the optimal configuration of the path set in the at least two configurations.
The embodiments of the present invention provide a workflow resource configuration optimization method and system based on probability distribution. With the solution of the present invention, a path set containing all possible paths of a workflow can be obtained; the paths in the path set whose runtime does not satisfy the preset condition are deleted by the pruning algorithm to obtain a path optimization set. When there are paths of the same length in the path optimization set, the workflow nodes at the same position on the paths of the same length in the path optimization set are merged to obtain at least one merged path, and the merged path and the paths in the path optimization set that do not participate in the merging are taken as paths to be compared; when there are no paths of the same length in the path optimization set, all paths in the path optimization set are taken as paths to be compared. Compared with the prior-art Monte Carlo algorithm, which has to compute over all paths, the embodiments of the present invention reduce, through pruning, the number of paths involved in the subsequent processing, which lowers the computational complexity and the resource consumption to a certain extent. After pruning and merging, when there are at least two paths to be compared and at least two configurations for the path set, the runtime probability distribution of each path to be compared under each configuration is calculated; the maximum runtime probability distribution of the path set under each configuration is determined based on the runtime probability distributions of the paths to be compared under the same configuration; and the optimal configuration of the path set is determined based on the maximum runtime probability distribution under each of the at least two configurations. On the basis of pruning and merging, the number of paths to be compared is reduced, and the runtime probability distribution of each path to be compared can be reused in the process of determining the optimal configuration, which reduces the computational complexity and the overhead. When there is only one path to be compared and there are at least two configurations for it, this embodiment obtains the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to the first preset comparison manner, the optimal configuration of the path set in two configurations; the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path to be compared in the two configurations based on those runtime probability distributions. From the description of the above comparison manner it can be seen that, when there is only one path to be compared, the embodiments of the present invention in fact compare only partial paths under the two configurations in terms of runtime probability distribution (i.e., partial comparison); this partial comparison greatly reduces the amount of data to be compared, lowers the computational complexity, reduces the overhead, and improves the practicability of the embodiments of the present invention.
Brief Description of the Drawings

In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a workflow resource configuration optimization method based on probability distribution according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of all possible paths of a workflow W in an embodiment of the present invention;

FIG. 3 is a schematic diagram of pruning and merging operations performed on the paths of the workflow W in FIG. 2;

FIG. 4 is a schematic structural diagram of a workflow resource configuration optimization system based on probability distribution according to an embodiment of the present invention.
Detailed Description of the Embodiments

To make the purposes, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

In the prior art, methods for configuring workflow resources suffer from high computational complexity or huge overhead. To solve these technical problems, the present invention proposes a workflow resource configuration optimization method based on probability distribution. In this optimization method, a pruning algorithm reduces the number of paths involved in the computation of the workflow resource configuration, which lowers the computational complexity and the resource consumption to a certain extent; and the optimal configuration of two configurations is obtained based on a partial comparison of the workflow nodes on the paths under the two configurations, which greatly reduces the data computation involved in the comparison and effectively lowers the computational complexity and the overhead.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a workflow resource configuration optimization method based on probability distribution according to a first embodiment of the present invention. The method includes:

Step 101: obtain a path set containing all possible paths of a workflow;

In this embodiment, a path of the workflow is composed of the workflow nodes that need to be executed to complete the workflow, each workflow node represents one task on the workflow, and one workflow may have more than one path. Referring to FIG. 2 and FIG. 3 of this embodiment, FIG. 2 is a schematic diagram of all possible paths of the workflow W, and FIG. 3 is a schematic diagram of the pruning and merging operations performed on the paths in the path set S of the workflow W. As shown in FIG. 2 and FIG. 3, the path set S of the workflow W contains 48 possible paths.
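For illustration only, the enumeration of all possible paths of a workflow can be sketched as a depth-first traversal of the workflow's task graph; the adjacency list, the node ids and the function name all_paths below are assumptions made for this sketch and do not come from the patent.

```python
# Illustrative sketch (not the patent's implementation): enumerate all
# source-to-sink paths of a workflow DAG with a depth-first search.
# The toy graph below is hypothetical and does not reproduce the workflow W of FIG. 2.
def all_paths(dag, source, sink):
    """Return every path from source to sink as a list of node ids."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in dag.get(node, []):
            stack.append((nxt, path + [nxt]))
    return paths

if __name__ == "__main__":
    toy_dag = {0: [1, 2], 1: [3], 2: [3], 3: []}   # hypothetical workflow
    print(all_paths(toy_dag, 0, 3))                # [[0, 2, 3], [0, 1, 3]]
```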
Step 102: delete, by a pruning algorithm, the paths in the path set whose runtime does not satisfy a preset condition, to obtain a path optimization set; wherein the paths in the path optimization set have the same length;

From FIG. 2 it can be seen that the number of workflow nodes on each path in the path set S and the tasks on the workflow nodes are not necessarily the same, and the runtime of some paths is obviously shorter than that of other paths. In order to determine a better configuration for the paths of the workflow, the runtime probability distributions of the path set under the respective configurations need to be compared, and it is of little significance for a path whose runtime is obviously shorter to participate in the comparison. Therefore, in order to reduce the overhead of the workflow resource configuration process, this embodiment optimizes the path set based on the pruning algorithm and reduces the number of paths in the path set. One criterion for determining a path to be deleted is that the runtime of the path does not satisfy the preset condition; the preset condition includes, but is not limited to, the runtime of a path being obviously lower than the runtime of other paths in the set. For example, in the set S in FIG. 2, if the runtime of the path P1: 0-4-10-11-12-16-17-18-19 is obviously lower than the runtime of some other path in the path set S, the path P1 is a path that needs to be deleted. After the paths in the path set S whose runtime does not satisfy the preset condition are deleted, the remaining paths are combined into the path optimization set.

From the schematic diagram of FIG. 2 it can be seen that, among all possible paths of the workflow W, some paths may be subsets of other paths; for example, the path 0-12-16-17-18-19 is a subset of the path 0-4-10-11-12-16-17-18-19. Relative to the latter path, the workflow nodes of the former path are a subset of the workflow nodes of the latter path, so its runtime is necessarily shorter than that of the latter path. Therefore, in one example, deleting, by the pruning algorithm, the paths in the path set whose runtime does not satisfy the preset condition to obtain the path optimization set includes: comparing the paths in the path set, and, if a path in the path set is a subset of at least one other path in the path set, deleting that path from the path set to obtain the path optimization set. It can be understood that the path optimization set obtained by optimizing the path set contains no path that is a subset of another path.
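A minimal sketch of the subset-based pruning rule described above might look as follows; the helper name prune_subset_paths and the example paths are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch: delete every path whose workflow nodes are a subset of
# another path's nodes, as in the pruning rule described above.
def prune_subset_paths(paths):
    kept = []
    for p in paths:
        p_set = set(p)
        # a path is removed if it is a proper subset of at least one other path
        if not any(p is not q and p_set < set(q) for q in paths):
            kept.append(p)
    return kept

if __name__ == "__main__":
    s = [[0, 4, 10, 11, 12, 16, 17, 18, 19], [0, 12, 16, 17, 18, 19]]
    print(prune_subset_paths(s))   # the shorter subset path is pruned
```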
Step 103: if there are paths of the same length in the path optimization set, merge the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path, and take the merged path and the paths in the path optimization set that do not participate in the merging as paths to be compared; if there are no paths of the same length in the path optimization set, take all paths in the path optimization set as paths to be compared;

In this embodiment, a workflow node on a path is a task on the path; merging workflow nodes means merging the tasks on the paths. Paths of the same length and/or of different lengths may exist in the path optimization set, and after the paths are merged there may still be multiple paths in the path optimization set. For example, if a path optimization set contains two paths A and B of length 10, two paths C and D of length 9 and one path E of length 8, then A and B in the path optimization set are merged and C and D are merged, yielding two merged paths; these two merged paths and path E are all paths to be compared in this embodiment.
On the paths of the same length in the path optimization set, the workflow nodes at the same position may be the same or different; for example, in FIG. 2, the first position of some paths is workflow node 0, and the first position of some paths is workflow node 1. When merging the paths of the same length in the path optimization set, the merging needs to be based on the workflow nodes at the same position of each path in the path optimization set. Optionally, merging the workflow nodes at the same position on the paths of the same length in the path optimization set to obtain at least one merged path includes:

merging the paths in the path optimization set according to the following rules to obtain at least one merged path:

if the workflow nodes at the same position of the paths of the same length in the path optimization set are different, juxtaposing the workflow nodes at that position of the paths of the same length as the workflow node at that position on the merged path of those paths;

if the workflow nodes at the same position of the paths of the same length in the path optimization set are the same, using the same workflow node at that position of the paths of the same length as the workflow node at that position on the merged path of those paths.

If the tasks on two workflow nodes are the same, the workflow nodes can be considered the same; if the tasks on the workflow nodes are different, the workflow nodes are different.
The pruning and merging of the paths in the path set S are described below taking the workflow W as an example with reference to FIG. 2 and FIG. 3. In FIG. 3, the path set S is the path set of the workflow W, the set S' is the path optimization set of the path set S, and the paths in the set S'' are the final merged paths. The path set S originally contains the 48 paths P1'–P48'; after pruning, the four paths 0-12-16-17-18-19, 1-13-16-17-18-19, 2-14-16-17-18-19 and 3-15-16-17-18-19 are cut, and the path optimization set S' containing the 44 paths P1'–P44' is obtained. The paths in the path optimization set S' are then merged: the workflow nodes 0, 1, 2 and 3 at the first position of the paths are juxtaposed as the node at the first position of the merged path, the workflow nodes 4, 5, 6, 7, 8 and 9 at the second position of the paths are juxtaposed as the second workflow node of the merged path, the workflow node 10 at the third position of the paths is taken as the third workflow node of the merged path, and so on; after the paths in the path optimization set S' are merged, the merged path P1'' in FIG. 3 is obtained, and this merged path P1'' is the path to be compared in the above step 103. In this example, the "paths that do not participate in the merging" can be understood as empty.
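One possible reading of the merge rules, restricted to paths of equal length, is sketched below; the function merge_equal_length, the grouping by length and the toy node ids are assumptions made for illustration only.

```python
# Illustrative sketch of the merge rule: for paths of equal length, the nodes
# at each position are collected; an identical node is kept as-is, and
# differing nodes are kept side by side (juxtaposed) at that position.
from collections import defaultdict

def merge_position(nodes):
    uniq = sorted(set(nodes))
    return uniq[0] if len(uniq) == 1 else uniq   # single node or juxtaposed list

def merge_equal_length(paths):
    by_len = defaultdict(list)
    for p in paths:
        by_len[len(p)].append(p)
    merged, untouched = [], []
    for group in by_len.values():
        if len(group) == 1:
            untouched.extend(group)              # path not involved in any merge
            continue
        merged.append([merge_position([p[i] for p in group])
                       for i in range(len(group[0]))])
    return merged, untouched

if __name__ == "__main__":
    opt_set = [[0, 4, 10], [1, 5, 10], [2, 6, 7, 10]]
    print(merge_equal_length(opt_set))
    # ([[[0, 1], [4, 5], 10]], [[2, 6, 7, 10]])
```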
Step 104: determine a configuration of the path set, the configuration including setting the type of the virtual machine that processes each workflow node on each path in the path set;

Determining a configuration of the path set, for example determining the settings of the virtual machine types on all workflow nodes in FIG. 2, can be implemented with an existing search method, which is not elaborated here. In this embodiment, all possible configurations of the path set may be enumerated with an existing search method, and the configurations are then compared in the subsequent steps to select the optimal configuration of the path set.

Step 105: if the number of paths to be compared is one and there are at least two configurations for the path set, obtain the optimal configuration of the path set in the at least two configurations based on the principle of determining, according to a first preset comparison manner, the optimal configuration of the path set in two configurations; if the number of paths to be compared is at least two and there are at least two configurations for the path set, calculate the runtime probability distribution of each path to be compared under each configuration, determine the maximum runtime probability distribution of the path set under each configuration based on the runtime probability distributions of the paths to be compared under the same configuration, and determine, based on the maximum runtime probability distribution under each of the at least two configurations, the optimal configuration of the path set in the at least two configurations;

wherein the first preset comparison manner includes: determining the workflow nodes on the path to be compared that are allocated different configurations under the two configurations as workflow nodes to be compared, calculating the runtime probability distributions of the workflow nodes to be compared under the two configurations, and determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations.

The above first preset comparison manner is a manner of selecting one optimal configuration out of two configurations (i.e., one of two). It can be understood that, in the process of obtaining the optimal configuration of the path set in the at least two configurations based on the principle of selecting the optimal configuration of the path set from two of the at least two configurations according to the first preset comparison manner, two of the at least two configurations may first be compared according to the first preset comparison manner to obtain the optimal configuration of the path set in those two configurations, and the resulting optimal configuration is then compared, again according to the first preset comparison manner, with another configuration of the at least two configurations that has not yet been compared, and so on, until the optimal configuration of the path set in the at least two configurations is determined; alternatively, in a first round of comparison, the configurations of the at least two configurations may be compared in pairs (no configuration is compared repeatedly) according to the preset comparison manner to obtain the optimal configuration of each pair for the path set (if there is a configuration that did not participate in the comparison, it joins the second round), in a second round each pair of optimal configurations is compared again to obtain the optimal configuration of those two optimal configurations, and so on, until the optimal configuration of the path set in the at least two configurations is obtained.

The following describes the scheme for obtaining the optimal configuration among at least two configurations in the case where the number of paths to be compared is one and there are at least two configurations for the path set.

When the number of paths to be compared is one, comparing the workflow nodes to be compared under the two configurations compared each time is in fact comparing partial paths under each configuration (i.e., partial comparison); compared with comparing all paths under each configuration, this partial comparison can effectively reduce the amount of calculation involved in the comparison and reduce the overhead. Differences in the configuration allocated to a workflow node include, but are not limited to, differences in the type of virtual machine allocated to the workflow node.

In this embodiment, the above optimization of the workflow resource configuration generally relies on a cloud platform, and the performance of cloud platform resources is unstable and satisfies a certain probability distribution. In this embodiment, the runtime probability distribution of the task on a workflow node is calculated based on the performance of the cloud platform, so by using the calculated runtime probability distributions of the paths to be compared this embodiment already takes the instability of cloud platform performance into account. That is, in the process of obtaining the optimal configuration of the path to be compared in this embodiment, the dynamics of cloud platform resources has already been considered, which avoids the shortcomings of the dynamic programming algorithms in the prior art.

Still taking the workflow W as an example, the above process of obtaining the optimal configuration of the path to be compared with the partial comparison method is described in detail with reference to FIG. 2 and FIG. 3. Assume that the workflow W in FIG. 2 has two configurations, configuration A and configuration B, and that under configurations A and B only the type of the virtual machine configured on workflow node 10 of the path P1'' to be compared of the workflow W differs; then the workflow node to be compared is node 10, and the runtime probability distributions of the workflow node 10 to be compared under configuration A and configuration B are calculated. The runtime probability distribution of workflow node 10 under each configuration can be calculated with an existing calculation method based on the probability distribution of the performance of the virtual machine allocated to workflow node 10 under configuration A and configuration B respectively, and on information such as the CPU workload and network workload of the task on workflow node 10. The runtime probability distributions of workflow node 10 under configurations A and B are then compared to determine the optimal configuration.
Optionally, in this embodiment, determining the optimal configuration of the path set in the two configurations based on the runtime probability distributions of the workflow nodes to be compared under the two configurations includes:

comparing the two configurations in the following manner to obtain the optimal configuration of the path set in the two configurations:

calculating the probability value P(X>Y) according to the formula

P(X>Y) = ∫_{-∞}^{+∞} f_Y(y) ( ∫_{y}^{+∞} f_X(x) dx ) dy,

where X represents the runtime of the workflow node to be compared under the first of the two configurations, Y represents the runtime of the workflow node to be compared under the second of the two configurations, f_X(x) represents the runtime probability distribution function of the workflow node to be compared under the first configuration, and f_Y(y) represents the runtime probability distribution function of the workflow node to be compared under the second configuration;

if the probability value P(X>Y) is greater than a preset threshold, the second configuration is the optimal configuration of the path set in the two configurations, wherein the preset threshold is not less than 0.5.

Still taking the above workflow W with two configurations A and B as an example, assume that the runtime probability distribution of the workflow node 10 to be compared under configuration A is f_X(x) and the runtime probability distribution of the workflow node 10 to be compared under configuration B is f_Y(y); the probability value P(X>Y) calculated according to the above formula is the probability that the runtime X of workflow node 10 under configuration A is greater than the runtime Y of workflow node 10 under configuration B. If P(X>Y)>0.5, the workflow node 10 under configuration A is considered more likely to have the longer runtime, and configuration B is the optimal configuration of configurations A and B.
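As a hedged illustration of this comparison, P(X>Y) can be evaluated numerically when the two runtime distributions are represented on discrete grids; the grids, weights and the 0.5 decision rule below follow the example above, but the function prob_x_greater_y and the numbers themselves are assumptions made for the sketch.

```python
# Illustrative sketch: numerically evaluate P(X > Y) for two independent
# runtime distributions given as discrete probability mass on a grid.
# The example distributions are arbitrary assumptions.
def prob_x_greater_y(xs, px, ys, py):
    """P(X>Y) = sum over (x, y) with x > y of px(x) * py(y), after normalizing."""
    tx, ty = float(sum(px)), float(sum(py))
    px = [p / tx for p in px]
    py = [p / ty for p in py]
    return sum(pxi * pyj for xi, pxi in zip(xs, px)
                          for yj, pyj in zip(ys, py) if xi > yj)

if __name__ == "__main__":
    xs = [8, 9, 10, 11, 12];  px = [1, 2, 4, 2, 1]   # node 10 runtime, config A (example)
    ys = [7, 8, 9, 10, 11];   py = [1, 2, 4, 2, 1]   # node 10 runtime, config B (example)
    p = prob_x_greater_y(xs, px, ys, py)
    print(p, "-> choose config B" if p > 0.5 else "-> choose config A")
```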
本实施例为了进一步降低上述部分比较的方案中涉及的计算,对P(X>Y)的计算引入了一个剪枝算法,提出了一种基于自由变量的取值范围的上、下界简化P(X>Y)的计算的方法。可选的,基于两种配置下待比较工作流节点的运行时间概率分布确定在两种配置中对该路径集合的最优配置包括:
对两种配置按照以下方式进行比较得到在两种配置中对该路径集合的最优配置:
基于以下的两个公式计算概率值P(X>Y)的范围,
If X.l≤Y.r≤X.r,P(X>Y)≥1-F X(Y.r)
If X.l≤Y.l≤X.r,P(X>Y)≤1-F X(Y.l)
其中,X表示两种配置中的第一种配置下的待比较工作流节点的运行时间,Y表示在两种配置中的第二种配置下的待比较工作流节点的运行时间,F x()表示在第一种配置下待比较工作流节点的运行时间累积分布函数;X.l和X.r分别为X的取值范围的下界和上界,Y.l和Y.r分别为Y的取值范围的下界和上界;
若根据第一个公式计算得到的概率值P(X>Y)的最小值大于第一预设阈值,则第二种配置为两种配置中的最优配置;其中,第一预设阈值不低于0.5;若根据第二个公式计算得到的概率值P(X>Y)的最大值小于第二预设阈值,则第一种配置为两种配置中的最优配置;其中,第二预设阈值不高于0.5。
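基于取值范围上、下界的剪枝比较可以参考下面的示意性Python草图（并非本申请原文内容，仅为示例，延续上文的离散化假设；当两个界都无法判定时，再退回到完整的P(X>Y)计算）：

```python
import numpy as np

def compare_by_bounds(ts, px, py, thresh1=0.5, thresh2=0.5):
    """ts: 等间距时间网格；px、py: X与Y在该网格上的离散概率向量。
    返回“第二种配置”或“第一种配置”，仅凭上下界无法判定时返回None。"""
    cdf_x = np.cumsum(px)
    F_x = lambda t: float(cdf_x[np.searchsorted(ts, t, side="right") - 1])  # F_X(t)
    xl, xr = ts[px > 0][0], ts[px > 0][-1]  # X取值范围的下界、上界
    yl, yr = ts[py > 0][0], ts[py > 0][-1]  # Y取值范围的下界、上界
    if xl <= yr <= xr and 1.0 - F_x(yr) > thresh1:  # P(X>Y)的下界大于第一预设阈值
        return "第二种配置"
    if xl <= yl <= xr and 1.0 - F_x(yl) < thresh2:  # P(X>Y)的上界小于第二预设阈值
        return "第一种配置"
    return None
```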
可以预见，在实际中，两种配置下可能存在多个工作流节点上的配置不一样的情况，即待比较工作流节点不止一个，在这种情况下概率值P(X>Y)的计算更为复杂。一般可以采用ADD算法来对多个串行的待比较工作流节点的运行时间概率分布进行计算。例如，若在配置A和配置B下，图2和图3中的工作流节点10和11上分配的虚拟机的类型不同，则待比较工作流节点的运行时间概率分布按照以下的公式计算：
$f_Z(z)=\int_{-\infty}^{+\infty}f_X(x)\,f_Y(z-x)\,\mathrm{d}x$
其中,Z=X+Y,X是一个配置下(配置A或配置B),工作流节点10的运行时间,Y是该配置下工作流节点11的运行时间,f Y()是Y的运行时间概率分布函数,f X()是X的运行时间概率分布函数,f Z()是工作流节点10和11总的运行时间概率分布函数。
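离散化之后，上述ADD公式即对应两个概率向量的卷积，可参考下面的示意性Python草图（并非本申请原文内容，仅为示例；假设两个节点的运行时间相互独立且离散在同一等间距网格上，数值为虚构）：

```python
import numpy as np

def add_distribution(px, py):
    """Z = X + Y的离散概率分布，对应 f_Z(z) = ∫ f_X(x) f_Y(z - x) dx 的离散卷积。"""
    return np.convolve(px, py)

# 示例：工作流节点10与节点11串行执行时，二者总运行时间的分布
print(add_distribution(np.array([0.5, 0.5]), np.array([0.2, 0.8])))  # [0.1 0.5 0.4]
```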
在上述对f Z的计算中,涉及到积分,若是待比较工作流节点数量越多,涉及的积分计算就越多。为了降低这种情况的发生,减少消耗和计算复杂度,本实施例的一个示例还提出一种剪枝方法Task bundling对ADD算法进行优化:将分配到相同类型虚拟机的串行的工作流节点上的任务作为一个任务分配到同一个虚拟机,在同一个虚拟机上相继调度这些任务,这样做可以增加cpu的使用率,减少相互依赖的任务之间的数据传送。可选的,若待比较路径的数量为一条,则在基于按照第一预设比较方式从至少两个配置的两个配置中选择对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置前,还包括:若待比较路径上存在分配到相同类型的虚拟机的串行工作流节点,则将待比较路径的这些串行工作流节点分配到同一个虚拟机上进行处理。
下面以工作流W为例结合图2和图3对优化的ADD算法进行说明,假设待比较路径P 1”上相邻的工作流节点10和11上分配的虚拟机的类型在配置A下相同,在配置B下不同,则将配置A下的工作流节点10和11上的任务合并为同一个任务,分配到同一个虚拟机上运行。当配置A和配置B下待比较工作流节点为节点10和11,则配置A下的工作流节点10和11的运行时间概率分布是根据它们分配的同一个虚拟机的性能分布以及节点10和11上的两个任务分别的CPU的任务量、network任务量等等信息计算得到,而配置B下的工作流节点10和11的运行时间概率分布则采用上述复杂的ADD公式得到,明显相较于需要进行积分计算的ADD算法,优化后的ADD算法更为简单。
在实际中,当两种配置下待比较工作流节点的数量为多个时,可能存在同一位置的(并行)待比较工作流节点的数量有多个的情况,对于这种情况,对并行的多个待比较工作流节点可以按照MAX算法求运行时间概率分布。例如若对于上述的配置A下的工作流节点4和配置B下的工作流节点4,分配的虚拟机的类型不同;对于上述的配置A的工作流节点5和配置B下的工作流节点5,分配的虚拟机的类型不同,则采用MAX算法,求工作流节点4、5的运行时间概率分布,MAX算法的公式为:
$f_Z(z)=f_X(z)\,F_Y(z)+F_X(z)\,f_Y(z)$
其中,Z=max(X,Y),X为一个配置下工作流节点4的运行时间,Y为相同配置下工作流节点5的运行时间,f X()为工作流节点4的运行时间概率分布函数,F X()为工作流节点4的运行时间累积分布函数;f Y()为工作流节点5的运行时间概率分布函数;F Y()为工作流节点5的运行时间累积分布函数。
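相应地，离散化后的MAX运算可以参考下面的示意性Python草图（并非本申请原文内容，仅为示例；假设两个并行节点的运行时间相互独立且在同一时间网格上，数值为虚构）：

```python
import numpy as np

def max_distribution(px, py):
    """Z = max(X, Y)的离散概率分布，对应 f_Z(z) = f_X(z)F_Y(z) + F_X(z)f_Y(z)。"""
    cdf_y = np.cumsum(py)                                       # F_Y(t_i)
    cdf_x_strict = np.concatenate(([0.0], np.cumsum(px)[:-1]))  # P(X < t_i)
    return px * cdf_y + cdf_x_strict * py

# 示例：并行的工作流节点4与节点5的最大运行时间分布
pz = max_distribution(np.array([0.5, 0.5]), np.array([0.2, 0.8]))
print(pz, pz.sum())  # [0.1 0.9] 1.0
```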
在本实施例中若是两个任务做的是相同的数据处理等动作,且待处理的数据量的大小相同,则可以认为这两个任务是等效任务,可以分配到相同类型的虚拟机上并行处理。为了降低工作流运行时间,降低计算复杂度,减少消耗,本实施例的一个示例中,使用task clustering优化MAX算法:把分配到相同类 型虚拟机的两个并行任务分配到一个虚拟机上并行处理。可选的,本实施例中,若待比较路径的数量为一条,且待比较路径为合并路径,则在基于按照第一预设比较方式从至少两个配置的两个配置中选择对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置前,还包括:若合并路径上存在分配到相同类型的虚拟机的并行工作流节点,则将合并路径的这些并行工作流节点分配到同一个虚拟机上进行处理。
在上述关于工作流节点4和工作流节点5的示例中,若是在某个配置下(配置A或配置B)节点4和节点5是等效任务,分配的虚拟机的类型相同,可以将该工作流节点4和5分配到同一个虚拟机上运行,当工作流节点4和5在同一个虚拟机上运行,在该配置下就无需使用上述复杂的MAX算法计算后续部分比较时需要使用的运行时间概率分布,可以只使用工作流节点4或工作流节点5在该配置下的运行时间概率分布替换上述的f Z(z)的结果。即,若配置A下工作流节点4和5分配的是同一个虚拟机,配置B下工作流节点4和5分配的是同一个虚拟机,在对配置A和配置B的路径进行部分比较时,可以只将工作流节点4(或5)中的一个作为待比较工作流节点,将配置A下的工作流节点4(或5)的运行时间概率分布与配置B下的工作流节点4(或5)的运行时间概率分布进行比较,确定出配置A和B中的最优配置。
下面对待比较路径的数量为至少两条,且对路径集合存在至少两种配置的情况下,得到至少两种配置中的最优配置的方案进行说明。
在待比较路径为至少两条时,需要先对每种配置下的各条待比较路径分别进行运行时间概率分布的计算,基于相同配置下各待比较路径的运行时间概率分布确定各配置下路径集合的最大运行时间概率分布,然后再基于至少两个配置中每一个配置下的最大运行时间概率分布确定对路径集合的最优配置。
其中，对每种配置下的各条待比较路径分别进行运行时间概率分布的计算时，若是待比较路径不是合并路径，则采用ADD算法可以实现对待比较路径的运行时间概率分布的计算，例如，对于自由变量X和Y，变量Z=X+Y的概率分布为
$f_Z(z)=\int_{-\infty}^{+\infty}f_X(x)\,f_Y(z-x)\,\mathrm{d}x$
若将X理解为一个配置下(配置A或配置B),一条待比较路径的某个工作流节点的运行时间,Y理解为相同配置下,相同待比较路径的其它工作流节点的运行时间,f Y()为Y的运行时间概率分布函数,f X()为X的运行时间概率分布函数,则f Z()为这两个工作流节点总的运行时间概率分布函数。
基于上述的ADD算法可以算出一个配置下一整条待比较路径的运行时间概率分布。
为了降低ADD算法的难度,在待比较路径具有至少两条的情况下,同样可以采用剪枝方法Task bundling对ADD算法进行优化,可选的,若待比较路径的数量为至少两条,则在对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,还包括:若某条待比较路径上存在分配到相同类型的虚拟机的串行工作流节点,则将该条待比较路径的这些串行工作流节点分配到同一个虚拟机上进行处理。在优化之后,对于一个配置下的在相同虚拟机上运行的至少两个工作流节点的运行时间概率分布,可以根据它们分配的同一个虚拟机的性能分布以及这至少两个工作流节点总的CPU的任务量、network任务量等等信息计算得到。
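在上述基础上，一条（非合并）待比较路径的运行时间分布可以沿路径逐段做ADD（卷积）累积得到；经Task bundling优化后，分配到同一虚拟机的串行节点合并为一段，其分布直接按合并后的任务量计算。下面是一个示意性的Python草图（并非本申请原文内容，仅为示例，沿用上文的离散化假设，数值为虚构）：

```python
from functools import reduce
import numpy as np

def path_runtime_distribution(segment_dists):
    """segment_dists: 路径上各段运行时间的离散概率向量列表（捆绑后的串行节点算作一段）。
    返回: 整条路径运行时间的离散概率分布（逐段卷积，即逐段ADD）。"""
    return reduce(np.convolve, segment_dists)

# 示例：三段串行，其中前两个节点已捆绑为一段
print(path_runtime_distribution([np.array([0.3, 0.7]),
                                 np.array([1.0]),
                                 np.array([0.5, 0.5])]))
```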
若待比较路径是合并路径,则对合并路径的运行时间概率分布的计算涉及ADD算法和MAX算法,合并路径的一个位置上可能存在至少两个并列的工作流节点,所以为了通过ADD算法计算出合并路径的运行时间概率分布,需要先通过MAX算法确定具有并列工作流节点的位置的运行时间概率,例如在图3的合并路径中,第一位置上存在并列的0、1、2和3这四个工作流节点,需要先通过MAX算法确定出第一位置的最大运行时间概率分布。
对于相互独立变量X、Y,变量Z=max(X,Y),则Z的概率分布为:
$f_Z(z)=f_X(z)\,F_Y(z)+F_X(z)\,f_Y(z)$
其中,X为一个配置下,一条待比较路径的一个位置上的工作流节点(例如图3中的工作流节点0)的运行时间,f X()为该工作流节点(工作流节点0)的运行时间概率分布函数,F X()为该工作流节点(工作流节点0)的运行时间累积分布函数;Y为相同配置、相同待比较路径以及相同位置上另一个工作流节点(例如图3中的工作流节点1)的运行时间,f Y()为该工作流节点(工作流节点1)的运行时间概率分布函数;F Y()为该工作流节点(工作流节点1)的运行时间累积分布函数;f Z(z)为该位置上上述两个工作流节点的最大运行时间概率分布。通过上述MAX算法,可以对一个位置上并列的至少两个工作流节点求最大概率分布。
为了降低MAX算法的难度,在待比较路径具有至少两条的情况下,同样可以采用剪枝方法task clustering优化MAX算法:可选的,若待比较路径的数量为至少两条,且待比较路径中存在合并路径,则在对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,还包括:若一合并路径上存在分配到相同类型的虚拟机的并行工作流节点,则将该一合并路径的这些并行工作流节点分配到同一个虚拟机上进行处理。在优化MAX算法后,若是并列的至少两个工作流节点被分配到同一个虚拟机上执行,则可以选择并列的工作流节点中的任意一个工作流节点的运行时间概率分布作为该并列的至少两个工作流节点的最大运行时间概率分布。
本实施例中,在计算出每条待比较路径的运行时间概率分布之后,可以基于MAX算法计算出相同的配置下路径集合的最大运行时间概率分布,可选的,基于相同配置下各待比较路径的运行时间概率分布确定各配置下路径集合的最大运行时间概率分布包括:
基于如下的公式计算至少两个配置的每个配置下路径集合的最大运行时间概率分布:
$f_Z(z)=f_X(z)\,F_Y(z)+F_X(z)\,f_Y(z)$
其中,f Z(z)为相同配置下两条待比较路径的最大运行时间概率分布,z=max(X,Y),X和Y分别为两条待比较路径的运行时间,f X()为X对应的待比较路径的运行时间概率分布函数;F X()为X对应的待比较路径的运行时间累积分布函数;f Y()为Y对应的待比较路径的运行时间概率分布函数;F Y()为Y对应的待比较路径的运行时间累积分布函数。
基于至少两种配置的各配置下的最大运行时间概率分布确定对路径集合的最优配置包括:
基于按照第二预设比较方式从至少两个配置的两个配置中选择对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置;
其中,第二预设比较方式包括:按照公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
计算概率值P(X>Y),在公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
中,X表示两种配置中的第一种配置下的路径集合的运行时间,Y表示在两种配置中的第二种配置下的路径集合的运行时间,f X(x)表示在第一种配置下的路径集合的最大运行时间概率分布函数,f Y(y)表示在第二种配置下的路径集合的最大运行时间概率分布函数;若概率值P(X>Y)大于预设阈值,则第二种配置为两种配置中的最优配置;其中,预设阈值不低于0.5。
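把以上两步合起来，各配置下路径集合的最大运行时间分布可以由各待比较路径的分布逐条做MAX累积得到，再按第二预设比较方式在两种配置之间做选择。下面是一个示意性的Python草图（并非本申请原文内容，仅为示例；假设各路径的分布已对齐到同一时间网格并补零到等长，函数名为虚构）：

```python
from functools import reduce
import numpy as np

def max_distribution(px, py):
    """Z = max(X, Y)的离散概率分布（同前文草图）。"""
    cdf_y = np.cumsum(py)
    cdf_x_strict = np.concatenate(([0.0], np.cumsum(px)[:-1]))
    return px * cdf_y + cdf_x_strict * py

def makespan_distribution(path_dists):
    """某一配置下路径集合的最大运行时间概率分布。"""
    return reduce(max_distribution, path_dists)

def pick_config(dist_a, dist_b, threshold=0.5):
    """P(X>Y)大于阈值时第二种配置（B）为最优，否则为第一种配置（A）。"""
    cdf_b_strict = np.concatenate(([0.0], np.cumsum(dist_b)[:-1]))  # P(Y < t_i)
    p = float(np.dot(dist_a, cdf_b_strict))                         # P(X > Y)
    return "配置B" if p > threshold else "配置A"
```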
如图4所示,本实施例还示出了一种基于概率分布的工作流资源配置优化系统,通过该系统实现上述的基于概率分布的工作流资源配置优化方法,本实施例的优化系统包括:
获取模块41,用于获取包含工作流的所有可能路径的路径集合;
剪枝模块42,用于通过剪枝算法将路径集合中运行时间不满足预设条件的路径删除,得到路径优化集合;其中,路径优化集合中的路径的长度相同;
合并模块43，用于若路径优化集合中存在长度相同的路径，则对路径优化集合中各条长度相同的路径上位于同一位置的工作流节点进行合并，得到至少一条合并路径，将合并路径以及路径优化集合中未参与合并的路径作为待比较路径；若路径优化集合中不存在长度相同的路径，则将路径优化集合中的所有路径均作为待比较路径；
配置模块44,用于确定对路径集合的配置,配置包括对处理路径集合中各路径上的各工作流节点的虚拟机的类型的设置;
处理模块45,用于若待比较路径的数量为一条,且对路径集合存在至少两种配置,则基于按照第一预设比较方式确定两个配置中对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置;若待比较路径的数量为至少两条,且对路径集合存在至少两种配置,则对每种配置下的各条待比较路径分别进行运行时间概率分布的计算,基于相同配置下各待比较路径的运行时间概率分布确定各配置下路径集合的最大运行时间概率分布,基于至少两种配置的各配置下的最大运行时间概率分布,确定该至少两个配置中对路径集合的最优配置;其中,第一预设比较方式包括:确定两种配置下待比较路径上分配的配置不同的工作流节点作为待比较工作流节点,计算两种配置下待比较工作流节点的运行时间概率分布,基于两种配置下待比较工作流节点的运行时间概率分布,确定在两种配置中对路径集合的最优配置。
在一个示例中,剪枝模块42,用于对路径集合中的路径进行比较,若路径集合中某路径属于路径集合中至少一条其它路径的子集,则将某路径从路径集合中删除,得到路径优化集合。
可选的,合并模块43,用于对路径优化集合中的路径按照以下的规则进行合并得到至少一条合并路径:
若路径优化集合中在各条长度相同的路径的相同位置的工作流节点不同,将各条长度相同的路径在相同位置上的工作流节点并列作为各条长度相同的路径的合并路径上在相同位置的工作流节点;
若路径优化集合中在各条长度相同的路径的相同位置的工作流节点相同, 将各条长度相同的路径的相同位置上的相同的工作流节点作为各条长度相同的路径的合并路径上在相同位置的工作流节点。
进一步的,如图4所示,本实施例的基于概率分布的工作流资源配置优化系统还包括第一优化模块46和第二优化模块47,第一优化模块46,用于若待比较路径的数量为一条,且待比较路径为合并路径,则在基于按照第一预设比较方式确定两个配置中对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置前,在合并路径上存在分配到相同类型的虚拟机的并行工作流节点时,将合并路径的这些并行工作流节点分配到同一个虚拟机上进行处理;若待比较路径的数量为至少两条,且待比较路径中存在合并路径,则在对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,在一合并路径上存在分配到相同类型的虚拟机的并行工作流节点时,将该一合并路径的这些并行工作流节点分配到同一个虚拟机上进行处理。
第二优化模块47,用于若待比较路径的数量为一条,则在基于按照第一预设比较方式确定两个配置中对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置前,在待比较路径上存在分配到相同类型的虚拟机的串行工作流节点的情况下,将待比较路径的这些串行工作流节点分配到同一个虚拟机上进行处理;若待比较路径的数量为至少两条,则在对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,在某条待比较路径上存在分配到相同类型的虚拟机的串行工作流节点的情况下,将该条待比较路径的这些串行工作流节点分配到同一个虚拟机上进行处理。
其中,处理模块45可以通过以下的两种方式中的任意一种实现基于两种配置下待比较工作流节点的运行时间概率分布,确定在两种配置中对该路径集合的最优配置。
第一种:对两种配置按照以下方式进行比较得到在两种配置中对该路径集合的最优配置:
按照公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
计算得到概率值P(X>Y);在公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
中,X表示两种配置中的第一种配置下的待比较工作流节点的运行时间,Y表示在两种配置中的第二种配置下的待比较工作流节点的运行时间,f X(x)表示在第一种配置下的待比较工作流节点的运行时间概率分布函数,f Y(y)表示在第二种配置下的待比较工作流节点的运行时间概率分布函数;若概率值P(X>Y)大于预设阈值,则第二种配置为两种配置中的最优配置;其中,预设阈值不低于0.5。
第二种:对两种配置按照以下方式进行比较得到在两种配置中对该路径集合的最优配置:
基于以下的两个公式计算概率值P(X>Y)的范围,
If X.l≤Y.r≤X.r,P(X>Y)≥1-F X(Y.r)
If X.l≤Y.l≤X.r,P(X>Y)≤1-F X(Y.l)
其中,X表示两种配置中的第一种配置下的待比较工作流节点的运行时间,Y表示在两种配置中的第二种配置下的待比较工作流节点的运行时间,F x()表示在第一种配置下待比较工作流节点的运行时间累积分布函数;X.l和X.r分别为X的取值范围的下界和上界,Y.l和Y.r分别为Y的取值范围的下界和上界;若根据第一个公式计算得到的概率值P(X>Y)的最小值大于第一预设阈值,则第二种配置为两种配置中的最优配置;其中,第一预设阈值不低于0.5;若根据第二个公式计算得到的概率值P(X>Y)的最大值小于第二预设阈值,则第一种配置为两种配置中的最优配置;其中,第二预设阈值不高于0.5。
处理模块45,用于基于如下的公式计算至少两个配置的每个配置下路径集合的最大运行时间概率分布:
$f_Z(z)=f_X(z)\,F_Y(z)+F_X(z)\,f_Y(z)$
其中,f Z(z)为相同配置下两条待比较路径的最大运行时间概率分布,z=max(X,Y),X和Y分别为两条待比较路径的运行时间,f X()为X对应的待比较 路径的运行时间概率分布函数;F X()为X对应的待比较路径的运行时间累积分布函数;f Y()为Y对应的待比较路径的运行时间概率分布函数;F Y()为Y对应的待比较路径的运行时间累积分布函数。
以及该处理模块45用于通过如下的方式基于至少两种配置的各配置下的最大运行时间概率分布,确定对路径集合的最优配置:
基于按照第二预设比较方式从至少两个配置的两个配置中选择对路径集合的最优配置的原理,得到至少两种配置中对路径集合的最优配置;
其中,第二预设比较方式包括:按照公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
计算概率值P(X>Y),在公式
$P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
中,X表示两种配置中的第一种配置下的路径集合的运行时间,Y表示在两种配置中的第二种配置下的路径集合的运行时间,f X(x)表示在第一种配置下的路径集合的最大运行时间概率分布函数,f Y(y)表示在第二种配置下的路径集合的最大运行时间概率分布函数;若概率值P(X>Y)大于预设阈值,则第二种配置为两种配置中的最优配置;其中,预设阈值不低于0.5。
本实施例基于对工作流节点的运行时间概率分布的计算，并结合三个剪枝方法和部分比较的方法，实现了对工作流资源的较优配置。该优化过程使用的运行时间概率分布考虑了云平台资源性能的不稳定性，解决了现有技术的动态规划算法中存在的技术问题。本实施例通过第一个剪枝减少了后续的计算和部分比较涉及的路径的数量，实现了一次计算量和消耗的降低；通过第二个剪枝方法优化了ADD和MAX算法，分别大大减少了对串行工作流节点的运行时间之和的概率分布的计算以及对并行工作流节点的运行时间概率分布的计算，也降低了本实施例的消耗；在只有一条待比较路径时，通过只对多个配置下的部分路径进行运行时间概率分布的比较（即部分比较），有效降低了消耗和计算量，而在部分比较过程中采用了第三个剪枝算法对部分路径的运行时间概率分布的计算进行了简化，又实现了一次消耗和计算复杂度的降低。本实施例的方法通过三个剪枝和部分比较极大地降低了消耗，降低了计算复杂度和计算量，降低了工作流资源配置优化方案的推广难度，提升了其实用性。
在本申请所提供的实施例中,应该理解到,所揭露的系统和方法,可以通过其它的方式实现。例如,以上所描述的系统实施例仅仅是示意性的,例如,模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。
作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
需要说明的是,对于前述的各方法实施例,为了简便描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定都是本发明所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。
以上为对本发明所提供的一种基于概率分布的工作流资源配置优化方法和系统的描述,对于本领域的技术人员,依据本发明实施例的思想,在具体实施方式及应用范围上均会有改变之处,综上,本说明书内容不应理解为对本发明的限制。

Claims (10)

  1. 一种基于概率分布的工作流资源配置优化方法,其特征在于,包括:
    获取包含工作流的所有可能路径的路径集合;
    通过剪枝算法将所述路径集合中运行时间不满足预设条件的路径删除,得到路径优化集合;
    若所述路径优化集合中存在长度相同的路径,则对所述路径优化集合中各条长度相同的路径上位于同一位置的工作流节点进行合并,得到至少一条合并路径,将所述合并路径以及所述路径优化集合中未参与合并的路径作为待比较路径;若所述路径优化集合中不存在长度相同的路径,则将所述路径优化集合中的所有路径均作为待比较路径;
    确定对所述路径集合的配置;所述配置包括对处理所述路径集合中各路径上各工作流节点的虚拟机的类型的设置;
    若所述待比较路径的数量为一条,且对所述路径集合存在至少两种配置,则基于按照第一预设比较方式确定两种配置中对所述路径集合的最优配置的原理,得到所述至少两种配置中对所述路径集合的最优配置;其中,所述第一预设比较方式包括:确定两种配置下所述待比较路径上分配的配置不同的工作流节点作为待比较工作流节点,计算所述两种配置下所述待比较工作流节点的运行时间概率分布,基于所述两种配置下所述待比较工作流节点的运行时间概率分布确定在所述两种配置中对所述路径集合的最优配置;
    若所述待比较路径的数量为至少两条,且对所述路径集合存在至少两种配置,则对每种配置下的各条待比较路径分别进行运行时间概率分布的计算;基于相同配置下各待比较路径的所述运行时间概率分布,确定各配置下所述路径集合的最大运行时间概率分布;基于所述至少两种配置的各配置下的所述最大运行时间概率分布,确定所述至少两种配置中对所述路径集合的最优配置。
  2. 如权利要求1所述的基于概率分布的工作流资源配置优化方法，其特征在于，所述通过剪枝算法将所述路径集合中运行时间不满足预设条件的路径删除，得到路径优化集合包括：
    对所述路径集合中的路径进行比较,若所述路径集合中某路径属于所述路径集合中至少一条其它路径的子集,则将所述某路径从所述路径集合中删除,得到路径优化集合。
  3. 如权利要求1所述的基于概率分布的工作流资源配置优化方法,其特征在于,所述对所述路径优化集合中各条长度相同的路径上位于同一位置的工作流节点进行合并,得到至少一条合并路径包括:
    对所述路径优化集合中的路径按照以下的规则进行合并得到至少一条合并路径:
    若所述路径优化集合中在各条长度相同的路径的相同位置的工作流节点不同,将所述各条长度相同的路径在所述相同位置上的工作流节点并列作为所述各条长度相同的路径的合并路径上在所述相同位置的工作流节点;
    若所述路径优化集合中在各条长度相同的路径的相同位置的工作流节点相同,将所述各条长度相同的路径的所述相同位置上的相同的工作流节点作为所述各条长度相同的路径的合并路径上在所述相同位置的工作流节点。
  4. 如权利要求1所述的基于概率分布的工作流资源配置优化方法,其特征在于,若所述待比较路径的数量为一条,且所述待比较路径为合并路径,则在所述基于按照第一预设比较方式确定两种配置中对所述路径集合的最优配置的原理,得到所述至少两种配置中对所述路径集合的最优配置前,还包括:若所述合并路径上存在分配到相同类型的虚拟机的并行工作流节点,则将所述合并路径的所述并行工作流节点分配到同一个虚拟机上进行处理;
    若所述待比较路径的数量为至少两条,且所述待比较路径中存在合并路径,则在所述对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,还包括:若一合并路径上存在分配到相同类型的虚拟机的并行工作流节点,则 将所述一合并路径的所述并行工作流节点分配到同一个虚拟机上进行处理。
  5. 如权利要求1所述的基于概率分布的工作流资源配置优化方法,其特征在于,若所述待比较路径的数量为一条,则在所述基于按照第一预设比较方式确定两种配置中对所述路径集合的最优配置的原理,得到所述至少两种配置中对所述路径集合的最优配置前,还包括:若所述待比较路径上存在分配到相同类型的虚拟机的串行工作流节点,则将所述待比较路径的所述串行工作流节点分配到同一个虚拟机上进行处理;
    若所述待比较路径的数量为至少两条,则在所述对每种配置下的各条待比较路径分别进行运行时间概率分布的计算前,还包括:若某条待比较路径上存在分配到相同类型的虚拟机的串行工作流节点,则将该条待比较路径的所述串行工作流节点分配到同一个虚拟机上进行处理。
  6. 如权利要求1-5任一项所述的基于概率分布的工作流资源配置优化方法,其特征在于,所述基于所述两种配置下所述待比较工作流节点的运行时间概率分布确定在所述两种配置中对所述路径集合的最优配置包括:
    对所述两种配置按照以下方式进行比较得到在所述两种配置中对所述路径集合的最优配置:
    按照公式
    $P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
    计算得到概率值P(X>Y);在所述公式
    $P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
    中,X表示所述两种配置中的第一种配置下的所述待比较工作流节点的运行时间,Y表示在所述两种配置中的第二种配置下的所述待比较工作流节点的运行时间,f X(x)表示在所述第一种配置下的所述待比较工作流节点的运行时间概率分布函数,f Y(y)表示在所述第二种配置下的所述待比较工作流节点的运行时间概率分布函数;
    若所述概率值P(X>Y)大于预设阈值,则所述第二种配置为所述两种配置中的最优配置;其中,所述预设阈值不低于0.5。
  7. 如权利要求1-5任一项所述的基于概率分布的工作流资源配置优化方法, 其特征在于,所述基于所述两种配置下所述待比较工作流节点的运行时间概率分布确定在所述两种配置中对所述路径集合的最优配置包括:
    对所述两种配置按照以下方式进行比较得到在所述两种配置中对所述路径集合的最优配置:
    基于以下的两个公式计算概率值P(X>Y)的范围,
    If X.l≤Y.r≤X.r,P(X>Y)≥1-F X(Y.r)
    If X.l≤Y.l≤X.r,P(X>Y)≤1-F X(Y.l)
    其中,所述X表示所述两种配置中的第一种配置下的所述待比较工作流节点的运行时间,所述Y表示在所述两种配置中的第二种配置下的所述待比较工作流节点的运行时间,所述F x()表示在所述第一种配置下所述待比较工作流节点的运行时间累积分布函数;所述X.l和X.r分别为所述X的取值范围的下界和上界,所述Y.l和Y.r分别为所述Y的取值范围的下界和上界;
    若根据第一个公式计算得到的所述概率值P(X>Y)的最小值大于第一预设阈值,则所述第二种配置为所述两种配置中的最优配置;其中,所述第一预设阈值不低于0.5;若根据第二个公式计算得到的所述概率值P(X>Y)的最大值小于第二预设阈值,则所述第一种配置为所述两种配置中的最优配置;其中,所述第二预设阈值不高于0.5。
  8. 如权利要求1-5任一项所述的基于概率分布的工作流资源配置优化方法,其特征在于,所述基于相同配置下各待比较路径的所述运行时间概率分布,确定各配置下所述路径集合的最大运行时间概率分布包括:
    基于如下的公式计算所述至少两个配置的每个配置下所述路径集合的最大运行时间概率分布:
    $f_Z(z)=f_X(z)\,F_Y(z)+F_X(z)\,f_Y(z)$
    其中,所述f Z(z)为相同配置下两条待比较路径的最大运行时间概率分布,z=max(X,Y),所述X和Y分别为所述两条待比较路径的运行时间,f X()为所述X对应的待比较路径的运行时间概率分布函数;F X()为所述X对应的待 比较路径的运行时间累积分布函数;f Y()为所述Y对应的待比较路径的运行时间概率分布函数;F Y()为所述Y对应的待比较路径的运行时间累积分布函数。
  9. 如权利要求1-5任一项所述的基于概率分布的工作流资源配置优化方法,其特征在于,所述基于所述至少两种配置的各配置下的所述最大运行时间概率分布,确定所述至少两种配置中对所述路径集合的最优配置包括:
    基于按照第二预设比较方式确定两个配置中对所述路径集合的最优配置的原理,得到所述至少两种配置中对所述路径集合的最优配置;
    其中,所述第二预设比较方式包括:按照公式
    $P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
    计算概率值P(X>Y),在所述公式
    $P(X>Y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{x}f_X(x)\,f_Y(y)\,\mathrm{d}y\,\mathrm{d}x$
    中,所述X表示所述两种配置中的第一种配置下的所述路径集合的运行时间,所述Y表示在所述两种配置中的第二种配置下的所述路径集合的运行时间,所述f X(x)表示在所述第一种配置下的所述路径集合的最大运行时间概率分布函数,所述f Y(y)表示在所述第二种配置下的所述路径集合的最大运行时间概率分布函数;若所述概率值P(X>Y)大于预设阈值,则所述第二种配置为所述两种配置中的最优配置;其中,所述预设阈值不低于0.5。
  10. 一种基于概率分布的工作流资源配置优化系统,其特征在于,包括:
    获取模块,用于获取包含工作流的所有可能路径的路径集合;
    剪枝模块,用于通过剪枝算法将所述路径集合中运行时间不满足预设条件的路径删除,得到路径优化集合;
    合并模块,用于若所述路径优化集合中存在长度相同的路径,则对所述路径优化集合中各条长度相同的路径上位于同一位置的工作流节点进行合并,得到至少一条合并路径,将所述合并路径以及所述路径优化集合中未参与合并的路径作为待比较路径;若所述路径优化集合中不存在长度相同的路径,则将所 述路径优化集合中的所有路径均作为待比较路径;
    配置模块,用于确定对所述路径集合的配置;所述配置包括对处理所述路径集合中各路径上的各工作流节点的虚拟机的类型的设置;
    处理模块,用于若所述待比较路径的数量为一条,且对所述路径集合存在至少两种配置,则基于按照第一预设比较方式确定两个配置中对所述路径集合的最优配置的原理,得到所述至少两种配置中对所述路径集合的最优配置;其中,所述第一预设比较方式包括:确定两种配置下所述待比较路径上分配的配置不同的工作流节点作为待比较工作流节点,计算所述两种配置下所述待比较工作流节点的运行时间概率分布,基于所述两种配置下所述待比较工作流节点的运行时间概率分布确定在所述两种配置中对所述路径集合的最优配置;以及用于若所述待比较路径的数量为至少两条,且对所述路径集合存在至少两种配置,则对每种配置下的各条待比较路径分别进行运行时间概率分布的计算;基于相同配置下各待比较路径的所述运行时间概率分布,确定各配置下所述路径集合的最大运行时间概率分布;基于所述至少两种配置的各配置下的所述最大运行时间概率分布,确定所述至少两种配置中对所述路径集合的最优配置。