
CN111882234B - Scientific workflow task management method and device

Info

Publication number: CN111882234B
Application number: CN202010765531.5A
Authority: CN
Inventors: 马宗学; 李传义; 顾易
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2020-08-03
Filing date: 2020-08-03
Publication date: 2022-04-12
Anticipated expiration: 2040-08-03
Also published as: CN111882234A

Abstract

The invention provides a scientific workflow task management method and device. The method comprises: determining a directed acyclic graph of a scientific workflow, wherein, for each task in the workflow, the graph shows the association relationship between the task and each other task used for calculating it, the storage cost of the task, and the calculation cost corresponding to the task and each such other task; setting a plurality of particles, wherein the value of each particle comprises the value of a decision variable of every task, and each decision-variable value corresponds to either storage or calculation; calculating the utility function of each particle according to the graph and the current latest value of the particle, and determining the minimum of the calculated values; and judging whether an optimal value exists among the minimum values determined so far, the optimal value being a minimum that is not greater than any of those minimum values; if so, managing each task according to the value of the particle used to calculate the optimal value; otherwise, performing one iterative update on the current latest value of each particle by using a particle swarm algorithm and calculating the utility function of each particle again. The scheme makes it easier to minimize the cost of scientific workflow processing.

Description

Scientific workflow task management method and device

Technical Field

The invention relates to the technical field of computers, in particular to a scientific workflow task management method and a scientific workflow task management device.

Background

Scientific workflow (SWF) is the application of workflow technology to the field of scientific computing. A scientific workflow allows a system to be constructed quickly while ensuring the quality of the calculation. Deploying a scientific workflow on a cloud computing platform helps to make the computing process efficient, secure and fast, to ensure the accuracy of the results, to maximize the utilization of cloud computing resources, to minimize operation and maintenance risk and cost, and to ensure consistent, efficient task delivery and automatic resource deployment.

Given the calculation and storage constraints of workflows on a cloud platform, on a cloud computing platform formed by a server cluster the processing of each task in a scientific workflow requires a task management system that controls whether the task is stored or calculated, so as to improve execution efficiency. An intelligent algorithm is needed to process the submitted tasks reasonably, so that processing is as fast as possible and resource utilization is maximized.

At present, most conventional task allocation management systems adopt a static task allocation method, in which the handling of each task is configured in advance, for example by deciding at random whether each task is calculated or stored. Externally submitted tasks are then transferred to a suitable computer, and all tasks are executed to obtain the final result.

However, a static task processing algorithm does not consider the difference between the calculation cost and the storage cost of different tasks, and does not clearly reflect the constraint relationships between tasks, so it is difficult to minimize the cost of scientific workflow processing.

Disclosure of Invention

The invention provides a scientific workflow task management method and a device, which can more easily realize the lowest cost of scientific workflow processing.

In order to achieve the purpose, the invention is realized by the following technical scheme:

In a first aspect, the present invention provides a method for managing tasks of a scientific workflow, comprising:

determining a directed acyclic graph of a scientific workflow, wherein for any task in the scientific workflow, the directed acyclic graph shows: the association relationship between the task and each other task in the scientific workflow that is used for calculating the task, the storage cost corresponding to the task, and the calculation cost corresponding to the task and each such other task;

setting at least two particles, wherein the value of any particle comprises the value of a decision variable of each task, and the value of the decision variable corresponds to storage or calculation;

respectively calculating a first value of a utility function of each particle according to the association relationship, the storage cost and the calculation cost in the directed acyclic graph and the current latest value of each particle, and determining the minimum value of all the first values calculated at the current time;

determining a current minimum value change trend from the determined minimum values in time order, and judging whether an optimal value exists in the minimum value change trend, wherein the optimal value is a minimum of the minimum value change trend and is not greater than any minimum value in the minimum value change trend;

if the optimal value does not exist, performing one iterative update on the current latest value of each particle by using a particle swarm algorithm, and executing again the step of respectively calculating the first value of the utility function of each particle;

and if the optimal value exists, managing each task according to the value of the particle used for calculating the optimal value.

Furthermore, each vertex in the directed acyclic graph corresponds to each task, a value corresponding to any vertex is storage cost corresponding to the task corresponding to the vertex, when one task is used for calculating another task, the vertex corresponding to the task points to the vertex corresponding to the another task, and a value corresponding to a corresponding pointing line is calculation cost corresponding to the task and the another task;

after the determining the directed acyclic graph of the scientific workflow, the method further comprises: setting an adjacency list with a chained storage structure, a calculation cost matrix and a storage cost array according to the directed acyclic graph;

wherein the adjacency list comprises: a singly linked list for each vertex in the directed acyclic graph, the list comprising the vertex and each vertex it points to;

the calculation cost matrix comprises: the value corresponding to each directed line in the directed acyclic graph, together with the vertices at its two ends and its direction;

the storage cost array comprises: the value corresponding to each vertex in the directed acyclic graph;

the calculating according to the association relationship, the storage cost and the calculation cost in the directed acyclic graph comprises: calculating according to the adjacency list, the calculation cost matrix and the storage cost array.

Further, the performing one iterative update on the current latest value of each particle by using a particle swarm algorithm comprises: performing one iterative update on the current latest value of each particle by using formula I;

formula I comprises:

wherein t denotes the state before the current iterative update and t+1 the state after it; ω is the inertia weight; c₁ is the cognitive coefficient, which regulates the flight step length toward p_i; c₂ is the social coefficient, which regulates the flight step length toward p_g; X_i = (x_i1, x_i2, …, x_im) is the position of particle i in the iteration, where x_im is the value of the decision variable of the m-th task in particle i; p_i = (p_i1, p_i2, …, p_im) is the individual extremum of particle i; p_g = (p_g1, p_g2, …, p_gm) is the global extremum; f₁(X_i(t)) is an operation on particle X_i(t); f₂(X_i(t), p_i(t)) is the operation by which X_i(t) learns from p_i(t); and f₃(X_i(t), p_g(t)) is the operation by which X_i(t) learns from p_g(t).

Further, c₁ = c₂ = 2.

Further, the value of ω lies in the interval [0.4, 1.4] and decreases gradually as the number of iterations increases; in the first N iterations the value of ω is not less than 1.2, where N is a preset value.
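To make the roles of ω, c₁ and c₂ concrete, the following Python sketch shows one way such an update could be arranged. It is not formula I: it swaps in the well-known binary particle swarm update of Kennedy and Eberhart (a real-valued velocity passed through a sigmoid to obtain the probability of a 1 bit). The piecewise-linear inertia schedule, the velocity clamp `v_max`, and the names `inertia_weight` and `bpso_step` are all assumptions made for illustration; only the interval [0.4, 1.4], the floor of 1.2 during the first N iterations, and c₁ = c₂ = 2 come from the text.

```python
import math
import random


def inertia_weight(iteration, max_iter, n_hold, w_max=1.4, w_hold=1.2, w_min=0.4):
    """Hypothetical schedule: omega falls linearly from w_max to w_hold over the
    first n_hold iterations, then linearly from w_hold to w_min."""
    if iteration < n_hold:
        return w_max - (w_max - w_hold) * iteration / n_hold
    span = max(1, max_iter - n_hold)
    frac = min(1.0, (iteration - n_hold) / span)
    return w_hold - (w_hold - w_min) * frac


def bpso_step(x, v, p_best, g_best, w, c1=2.0, c2=2.0, v_max=4.0, rng=random):
    """One binary-PSO update (swapped-in technique, not the patent's formula I).

    x, p_best, g_best are lists of 0/1 decision variables; v is a list of floats.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        vel = (w * v[d]
               + c1 * rng.random() * (p_best[d] - x[d])   # pull toward individual extremum
               + c2 * rng.random() * (g_best[d] - x[d]))  # pull toward global extremum
        vel = max(-v_max, min(v_max, vel))                # assumed clamp, keeps sigmoid unsaturated
        prob_one = 1.0 / (1.0 + math.exp(-vel))           # sigmoid transfer function
        new_x.append(1 if rng.random() < prob_one else 0)
        new_v.append(vel)
    return new_x, new_v
```

A large ω early on keeps velocities large and the search exploratory, while the later decay lets particles settle near the best positions found so far; the clamp is included because ω > 1 would otherwise let velocities grow without bound.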

Further, after the determining the minimum value of all the first values calculated at the current time, the method further comprises: judging whether the minimum value has remained unchanged over the latest M consecutive determinations, where M is a preset value, and if so, replacing some of the at least two particles with the same number of new particles.

Further, M is 5, and the number of new particles is 30% of the number of the at least two particles.
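The stagnation check described above can be sketched in Python as follows. The text only says that "part of" the particles are replaced, so choosing the replaced particles uniformly at random, generating the new particles as random 0/1 vectors, and the name `refresh_if_stagnant` are assumptions of this sketch.

```python
import random


def refresh_if_stagnant(particles, min_history, m=5, fraction=0.3, rng=random):
    """Replace a fraction of the swarm when the last m per-iteration minima are identical.

    particles   : list of 0/1 lists, one per particle
    min_history : per-iteration minimum utility values, in time order
    Returns (possibly refreshed particles, whether a refresh happened).
    """
    if len(min_history) < m or len(set(min_history[-m:])) != 1:
        return particles, False
    n_new = max(1, round(fraction * len(particles)))       # e.g. 3 of 10 particles
    dims = len(particles[0])
    replaced = rng.sample(range(len(particles)), n_new)    # assumed: random choice of victims
    refreshed = [list(p) for p in particles]
    for idx in replaced:
        refreshed[idx] = [rng.randint(0, 1) for _ in range(dims)]
    return refreshed, True
```

Keeping the swarm size constant while injecting fresh particles is a common way to restore diversity once the swarm has collapsed onto one candidate solution.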

Further, said respectively calculating a first value of the utility function of each particle comprises: taking each particle in turn as a current particle, and executing the following steps:

A1: taking each task in turn as a first task, and determining, according to the value of the current particle, whether the value of the decision variable of the first task corresponds to storage; if so, executing A2; otherwise, executing A3;

A2: determining that the storage cost corresponding to the first task is to be added, and executing A5;

A3: determining that the calculation costs corresponding to the first task and each second task are to be added, wherein the second tasks are the tasks in the scientific workflow that are used for calculating the first task, and executing A4 and A5;

A4: taking each second task in turn as a third task, and determining, according to the value of the current particle, whether the value of the decision variable of the third task corresponds to storage; if so, executing A5; otherwise, taking the third task as a first task and executing A3;

A5: determining the sum of all calculation costs and storage costs determined to be added as the first value of the utility function of the current particle.
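Steps A1 to A5 can be read as a recursive cost accumulation. The Python sketch below is one possible reading, not the patent's implementation. The names `regeneration_cost` and `utility`, the 0-based task indices, and the example graph are assumptions: the topology follows the FIG. 2 description (a1 feeds a2 and a3, which feed a4), and only a4's storage cost of 4 and its total incoming calculation cost of 3 come from the text; every other cost value is hypothetical.

```python
def regeneration_cost(task, particle, preds, calc_cost):
    """A3/A4: cost of computing `task` from its predecessors, recursing into
    any predecessor that is itself not stored."""
    total = 0
    for p in preds[task]:
        total += calc_cost[(p, task)]            # A3: edge cost predecessor -> task
        if particle[p] == 0:                     # A4: predecessor not stored, regenerate it
            total += regeneration_cost(p, particle, preds, calc_cost)
    return total


def utility(particle, preds, calc_cost, storage_cost):
    """A1/A2/A5: sum storage cost for stored tasks and regeneration cost otherwise."""
    total = 0
    for task, stored in enumerate(particle):
        if stored == 1:
            total += storage_cost[task]          # A2
        else:
            total += regeneration_cost(task, particle, preds, calc_cost)
    return total                                 # A5


# Hypothetical instance shaped like FIG. 2; indices 0..3 stand for a1..a4.
PREDS = {0: [], 1: [0], 2: [0], 3: [1, 2]}
CALC_COST = {(0, 1): 2, (0, 2): 1, (1, 3): 1, (2, 3): 2}   # edges into a4 sum to 3
STORAGE_COST = [3, 2, 2, 4]                                  # a4's storage cost is 4

all_stored = utility([1, 1, 1, 1], PREDS, CALC_COST, STORAGE_COST)
a4_computed = utility([1, 1, 1, 0], PREDS, CALC_COST, STORAGE_COST)
```

In this reading a task with no predecessors contributes nothing when it is not stored, so a practical model would presumably either force such tasks to be stored or assign them an input regeneration cost. Because the directed graph is acyclic, the recursion always terminates.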

In a second aspect, the present invention provides a scientific workflow task management apparatus for executing the scientific workflow task management method according to any one of the first aspects, the apparatus comprising:

the directed acyclic graph determining unit is used for determining a directed acyclic graph of a scientific workflow, wherein for any task in the scientific workflow, the directed acyclic graph shows: the association relationship between the task and each other task in the scientific workflow that is used for calculating the task, the storage cost corresponding to the task, and the calculation cost corresponding to the task and each such other task;

the particle setting unit is used for setting at least two particles, the value of any particle comprises the value of a decision variable of each task, and the value of the decision variable corresponds to storage or calculation;

a utility function calculation unit, configured to calculate a first value of a utility function of each particle according to the association relationship, the storage cost, and the calculation cost in the directed acyclic graph, and according to a current latest value of each particle, and determine a minimum value of all the first values calculated at the current time;

the optimal value judging unit is used for determining the current minimum value change trend according to the time sequence according to the determined minimum values and judging whether the minimum value change trend has an optimal value, wherein the optimal value is the minimum value in the minimum value change trend, the optimal value is not greater than each minimum value in the minimum value change trend, if yes, the task management unit is triggered, and if not, the iteration updating unit is triggered;

the iterative updating unit is used for, after being triggered by the optimal value judging unit, performing one iterative update on the current latest value of each particle by using a particle swarm algorithm, and then triggering the utility function calculating unit;

and the task management unit is used for, after being triggered by the optimal value judging unit, managing each task according to the value of the particle used for calculating the optimal value.

In a third aspect, the present invention provides a memory controller comprising: at least one memory and at least one processor;

the at least one memory to store a machine readable program;

the at least one processor is configured to invoke the machine-readable program to perform the method of any of the above first aspects.

In a fourth aspect, the present invention provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of the first aspects above.

The invention provides a scientific workflow task management method and device. The method comprises: determining a directed acyclic graph of a scientific workflow, wherein, for each task in the workflow, the graph shows the association relationship between the task and each other task used for calculating it, the storage cost of the task, and the calculation cost corresponding to the task and each such other task; setting a plurality of particles, wherein the value of each particle comprises the value of a decision variable of every task, and each decision-variable value corresponds to either storage or calculation; calculating the utility function of each particle according to the graph and the current latest value of the particle, and determining the minimum of the calculated values; and judging whether an optimal value exists among the minimum values determined so far, the optimal value being a minimum that is not greater than any of those minimum values; if so, managing each task according to the value of the particle used to calculate the optimal value; otherwise, performing one iterative update on the current latest value of each particle by using a particle swarm algorithm and calculating the utility function of each particle again. The invention makes it easier to minimize the cost of scientific workflow processing.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a scientific workflow task management method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a directed acyclic graph according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of another directed acyclic graph provided in accordance with an embodiment of the present invention;

FIG. 4 is a diagram illustrating an adjacency list according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of another directed acyclic graph provided in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of a distribution of initial particle populations provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of a distribution of iteratively completed particle populations provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of a scientific workflow task management apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.

As shown in fig. 1, an embodiment of the present invention provides a scientific workflow task management method, which may include the following steps:

Step 101: determining a directed acyclic graph of a scientific workflow, wherein for any task in the scientific workflow, the directed acyclic graph shows: the association relationship between the task and each other task in the scientific workflow that is used for calculating the task, the storage cost corresponding to the task, and the calculation cost corresponding to the task and each such other task.

Step 102: setting at least two particles, wherein the value of any particle comprises the value of a decision variable of each task, and the value of the decision variable corresponds to storage or calculation.

Step 103: respectively calculating a first value of a utility function of each particle according to the association relationship, the storage cost and the calculation cost in the directed acyclic graph and the current latest value of each particle, and determining the minimum value of all the first values calculated at the current time.

Step 104: determining the current minimum value change trend from the determined minimum values in time order, and judging whether an optimal value exists in the minimum value change trend, wherein the optimal value is a minimum of the minimum value change trend and is not greater than any minimum value in the minimum value change trend; if so, executing step 106; otherwise, executing step 105.

Step 105: performing one iterative update on the current latest value of each particle by using a particle swarm algorithm, and executing step 103 again.

Step 106: managing each task according to the value of the particle used for calculating the optimal value.

The scientific workflow task management method provided by the embodiment of the invention comprises the following steps: determining a directed acyclic graph of the scientific workflow, wherein, for each task in the workflow, the graph shows the association relationship between the task and each other task used for calculating it, the storage cost of the task, and the calculation cost corresponding to the task and each such other task; setting a plurality of particles, wherein the value of each particle comprises the value of a decision variable of every task, and each decision-variable value corresponds to either storage or calculation; calculating the utility function of each particle according to the graph and the current latest value of the particle, and determining the minimum of the calculated values; and judging whether an optimal value exists among the minimum values determined so far, the optimal value being a minimum that is not greater than any of those minimum values; if so, managing each task according to the value of the particle used to calculate the optimal value; otherwise, performing one iterative update on the current latest value of each particle by using a particle swarm algorithm and calculating the utility function of each particle again. An existing static scheduling policy merely calculates or stores tasks at random without taking the constraint relationships between tasks into account; it therefore cannot adjust dynamically to the requirement of cost minimization, lacks adaptability, and ignores the difference between the calculation cost and the storage cost of different tasks. In contrast, the embodiment of the invention takes these factors into account and can therefore more easily minimize the cost of scientific workflow processing. In addition, the embodiment of the invention can optimize task processing and reduce its consumption, thereby saving storage space on the cloud server.

In the embodiment of the invention, after a task has been processed on the cloud computing platform, two options need to be considered for its result, namely not storing it and storing it:

(1) if the result is not stored (calculation is chosen), no storage cost is incurred, but the cost of regenerating any subsequent task must include the calculation cost of the current task;

(2) if storage is chosen, no calculation cost is incurred; the only cost is the server storage space occupied by the task result.

According to the workflow task calculation-storage problem model, using a public cloud in the cloud computing mode incurs two types of expense, storage and calculation, and the aim of the storage decision is to minimize the total cost of the process while satisfying both the calculation constraint and the storage constraint.

In one embodiment of the invention, the scientific workflow planning scheme in the cloud computing environment can be summarized into the following parts:

1. Decision variables

Suppose T tasks (N₁, N₂, …, N_T) are waiting to be processed by the cloud computing platform. According to the description of the planning problem, a decision variable x(t) is defined for each task t, where 1 ≤ t ≤ T, as in formula (1):

$$x(t)=\begin{cases}1, & \text{the } t\text{-th task corresponds to storage}\\ 0, & \text{the } t\text{-th task corresponds to calculation}\end{cases}\tag{1}$$

That is, the decision variable of a task takes the value 1 when the task corresponds to storage and the value 0 when the task corresponds to calculation.

The cloud computing task processing allocation aims at optimizing the overall benefits of the whole formation, and the cost required by computing and storing is a main index for evaluating the efficiency.

2. Calculation cost function

The calculation cost is denoted by a, where a(t) represents the calculation cost of the t-th task, namely the sum of the calculation costs from each preceding task associated with the t-th task, as shown in the following formula (2):

In formula (2), i corresponds to the i-th task in the scientific workflow, which is used to calculate the t-th task.

As shown in FIG. 2, the dots represent tasks, the task name is given below each task, the storage cost is given in parentheses after the task name, and the number on each directed line segment is the calculation cost from that task to the next task. For example, if a4 is stored, only its storage cost of 4 is incurred; if it is not stored, the cost is the sum of the calculation costs from each of its associated preceding tasks to a4, which is 3.

3. Storage cost function

The storage cost is denoted by b, where b(t) represents the storage cost of the t-th task.

4. Task processing

As can be seen from the above model analysis, the task processing of the cloud computing platform includes two sub-target models that constrain each other: if a task is stored, its calculation cost does not need to be considered, and likewise, if a task is calculated, its storage cost does not need to be considered. This constraint can be expressed through the decision variable. According to the decision variables, the calculation cost function and the storage cost function, the performance index function of task processing on the cloud computing platform is the following formula (3):

Formula (3) contains two terms. The first term, F, is the additional associated calculation cost. The second term contains two levels of summation, wherein the first-level summation runs over each task related to the t-th task, and the second-level summation runs over each task in the scientific workflow.

For the calculation of the first term F, refer for example to FIG. 3: a5 is not stored, and a5 can be calculated from a1, a2 and a3, with a corresponding calculation cost of 2 + 2 + 1 = 5, which is counted in the second term. However, since a1 is not stored, its result cannot be used directly and a1 must itself be calculated, that is, a1 is calculated from a0 at a calculation cost of 1, and this cost is counted in the first term.

Based on the above, when a task is stored, its storage cost is added to the total; when it is not stored, its calculation cost is added to the total instead, this calculation cost being the sum of the calculation costs from each associated node to the node. The specific implementation procedure may include the following contents:

Refer to FIG. 2, FIG. 3 and FIG. 5, which respectively show the directed acyclic graphs of 3 different scientific workflows. Taking FIG. 2 as an example, the corresponding scientific workflow has 4 tasks a1, a2, a3 and a4, and the vertices corresponding to the tasks are shown in FIG. 2. Calculating a2 requires a1, calculating a3 requires a1, and calculating a4 requires a2 and a3; these association relationships among the tasks are shown by the corresponding directed lines in FIG. 2. For example, since calculating a2 requires a1, a1 → a2. The value in "()" in FIG. 2 is the storage cost corresponding to the task, that is, the cost of storing the task result once the task has been calculated. The value on each "→" in FIG. 2 is the calculation cost corresponding to the task at the start of the directed line and the task it points to, that is, the calculation cost required when the start task is used to calculate the pointed-to task.

After the directed acyclic graph of the scientific workflow is determined, the values of the decision variables of each task are determined; these values indicate which tasks need to be stored and which do not, and the tasks that are not stored must be calculated. Therefore, based on the association relationship, the storage cost and the calculation cost shown in the directed acyclic graph, and on the values of the decision variables of each task, the cost required for processing the scientific workflow can be obtained by evaluating the utility function.

In order to minimize the processing cost of the scientific workflow, the value of the decision variable of each task should be an optimized value. However, because the number of tasks in a scientific workflow is usually large, and the association relationships or constraint relationships between the tasks are usually complex, it is usually impossible to decide directly whether a task is better calculated or better stored merely by comparing the calculation cost and the storage cost corresponding to each task. For this reason, the embodiment of the invention adopts the particle swarm algorithm and iterates continuously to find the optimal value, so that whether each task is stored can be determined based on the optimal value and the cost of scientific workflow processing is minimized.

As shown in step 102, a particle swarm may be set, and the initial value of each particle in the swarm may be preset manually or generated randomly; this initial value is equivalent to the value after the 0th iteration. The value of a particle consists of the values of the decision variables of each task in the scientific workflow. For example, the value may be 1 when a task corresponds to storage and 0 when it corresponds to calculation.

In the loop, iteratively updating the value obtained after the i-th iterative update of a particle yields the value after the (i+1)-th iterative update, and iteration stops once the optimal value can be obtained from the value after some iterative update. A maximum number of iterations may also be preset, in which case the number of iterative updates executed should not exceed it. As a special case, if no optimal value exists after the preset number has been reached, a relative optimal value in the latest minimum value change trend may be taken, for example a value that is not greater than any other minimum value.

And after each iteration update, calculating the value of the utility function according to the obtained particle value. Since the number of particles is plural, the obtained value of the utility function is plural, and the minimum value among the plural values can be taken. In this way, the updated minimum value for each iteration can be obtained.

Based on the execution time sequence of the iterative update and the minimum value obtained after each iterative update, the minimum value change trend after each iterative update can be obtained. Whether the optimal value exists can be known according to the change trend of the minimum value, and the optimal value is usually a minimum value and is not larger than any other minimum value so as to avoid the optimal value being locally optimal. After the optimal value is obtained, the scientific workflow can be managed based on the optimal value, for example, the tasks corresponding to the calculation are calculated, and the tasks corresponding to the storage are stored.
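The loop over steps 103 to 106, including the recorded minimum value change trend and the preset iteration cap, can be sketched in Python as below. The text does not give a precise test for when an optimal value "exists", so this sketch assumes a simple rule: stop when the best minimum has not improved for `patience` consecutive iterations, and otherwise fall back to the best value seen once `max_iter` is reached. The utility and update functions are toy stand-ins with hypothetical costs, included only so the loop can run; `run_search`, `toy_utility` and `make_toy_update` are illustrative names.

```python
import random


def run_search(particles, utility, update, max_iter=50, patience=10):
    """Iterate evaluate -> record minimum -> update, returning the best particle found."""
    trend = []                                   # per-iteration minimum, in time order
    best_value, best_particle = float("inf"), None
    stale = 0                                    # iterations since the best value improved
    for _ in range(max_iter):
        values = [utility(p) for p in particles]
        current_min = min(values)
        trend.append(current_min)
        if current_min < best_value:
            best_value = current_min
            best_particle = list(particles[values.index(current_min)])
            stale = 0
        else:
            stale += 1
        if stale >= patience:                    # assumed test for "optimal value exists"
            break
        particles = [update(p, best_particle) for p in particles]
    return best_particle, best_value, trend


# Toy stand-ins (hypothetical costs; each task costs STORE[t] if stored, CALC[t] otherwise).
STORE = [3, 2, 2, 4, 1, 5, 2]
CALC = [1, 3, 1, 3, 2, 2, 4]


def toy_utility(particle):
    return sum(s if bit else c for bit, s, c in zip(particle, STORE, CALC))


def make_toy_update(rng):
    def update(particle, leader):
        out = []
        for own, lead in zip(particle, leader):
            bit = lead if rng.random() < 0.5 else own   # drift toward the best particle
            if rng.random() < 0.1:                      # occasional random flip
                bit = 1 - bit
            out.append(bit)
        return out
    return update


rng = random.Random(0)
swarm = [[rng.randint(0, 1) for _ in range(7)] for _ in range(5)]
best, value, trend = run_search(swarm, toy_utility, make_toy_update(rng))
```

The returned `best` particle is then the assignment used to manage the tasks: entries equal to 1 are stored and entries equal to 0 are calculated.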

Taking the directed acyclic graph of the scientific workflow shown in fig. 3 as an example, there are 7 tasks a0-a6 in the scientific workflow, and according to the value of the particle used in calculating the optimal value, the situation of whether each task is stored or not as shown in fig. 3 can be obtained, and then each task is managed accordingly based on the situation.

In summary, based on the task data evolution problem of the cloud computing platform and the characteristics of the computing and storage constraints in the scientific workflow, the embodiment of the invention provides a cloud platform workflow task result persistence strategy based on the particle swarm algorithm, thereby realizing the application of the particle swarm algorithm in workflow deployment. The embodiment of the invention specifically aims at the problem of the combination optimization of the discrete variable space, utilizes the storage structure and the characteristics of the directed acyclic graph to construct a utility function (Fitness function) which reflects the satisfaction degree of the obtained task processing and accords with the economic profit of the user, and utilizes a mutual constraint mode among tasks to construct a method of scientific workflow so as to achieve the purposes of reasonably utilizing resources and improving the utilization rate of the resources.

In an embodiment of the present invention, an implementation manner of setting a particle group may be as follows:

The values in the particle swarm are only 0 and 1, with 0 representing no storage and 1 representing storage. A random function may be applied to generate an initial particle swarm with values of 0 and 1. For the scientific workflow corresponding to FIG. 3, the number of nodes is 7, so the dimension of each particle is 7, and the number of initial particles may be defined as 5. The procedure for initializing the particle swarm may be as follows:
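A minimal Python sketch of such an initialization is given here for illustration. The function name `init_swarm` and the optional `seed` parameter are assumptions; the sizes (5 particles of dimension 7) follow the FIG. 3 example.

```python
import random


def init_swarm(num_particles=5, num_tasks=7, seed=None):
    """Return num_particles random 0/1 vectors, one decision variable per task.

    1 means the task result is stored, 0 means it is not stored (calculated).
    """
    rng = random.Random(seed)   # seed is optional, used only for reproducibility
    return [[rng.randint(0, 1) for _ in range(num_tasks)]
            for _ in range(num_particles)]


swarm = init_swarm(seed=42)
```

Each row of `swarm` is one particle, and each column corresponds to one task in the workflow.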

In an embodiment of the present invention, each vertex in the directed acyclic graph corresponds to one task, and the value corresponding to a vertex is the storage cost of the task corresponding to that vertex. When one task is used for calculating another task, the vertex corresponding to the former points to the vertex corresponding to the latter, and the value on the corresponding directed line is the calculation cost corresponding to the two tasks.

In the embodiment of the present invention, after the directed acyclic graph is obtained, in order to facilitate the use of the information, such as the incidence relation, the storage cost, and the calculation cost, shown in the directed acyclic graph for the calculation of the utility function, the directed acyclic graph may be processed to correspondingly obtain an adjacency list reflecting the incidence relation information, a calculation cost matrix reflecting the calculation cost information, and a storage cost array reflecting the storage cost information, and then the utility function of the particle is calculated based on the information in the adjacency list, the calculation cost matrix, and the storage cost array.

In the embodiment of the invention, the scientific workflow is represented by a directed acyclic graph: each vertex in the graph represents one task, the number on a directed edge represents the calculation cost from one task to the next, and each task has an attribute indicating whether its result is stored.

An adjacency list is a chained storage structure for a graph. In the adjacency list, a singly linked list is established for each vertex in the graph, and the nodes in the i-th singly linked list represent the edges attached to vertex vi. Each node consists of three fields: the position of the vertex the arc points to, a pointer to the next arc, and a pointer to the information associated with the arc. Each linked list has a head node, and the head nodes are either linked in a chain or stored in a sequential structure.
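As a rough illustration of the three structures derived from the graph, the sketch below builds an adjacency list, a calculation cost matrix, and a storage cost array from an edge list. It is only a sketch under stated assumptions: a Python dict of lists stands in for the singly linked lists, the function name `build_structures` is hypothetical, and the 4-task workflow is made up for the example (it is not the graph of fig. 3).

```python
import math

def build_structures(num_tasks, edges, storage_costs):
    """Build adjacency list, calculation cost matrix and storage cost array.

    edges: iterable of (src, dst, calc_cost) for a directed edge src -> dst.
    """
    adjacency = {v: [] for v in range(num_tasks)}
    # math.inf marks "no direct edge", matching the infinity entries of a cost matrix.
    cost = [[math.inf] * num_tasks for _ in range(num_tasks)]
    for src, dst, calc_cost in edges:
        adjacency[src].append(dst)
        cost[src][dst] = calc_cost
    return adjacency, cost, list(storage_costs)

# Hypothetical workflow: task 0 feeds tasks 1 and 2, which both feed task 3.
edges = [(0, 1, 1), (0, 2, 3), (1, 3, 2), (2, 3, 1)]
adjacency, cost, storage = build_structures(4, edges, [1, 2, 1, 3])
print(adjacency)   # successors of each vertex
print(cost[0])     # calculation costs of the edges leaving task 0
```

A dict of lists keeps successor lookup simple, while the matrix gives constant-time access to the cost of any edge.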

For example, the directed acyclic graph may be as shown in fig. 2, fig. 3 (the storage cost is not shown in fig. 3, and fig. 3 additionally shows whether each task is stored, to aid understanding of other parts of the embodiment), and fig. 5 (the storage cost is not shown in fig. 5). It can be seen that the vertices in the directed acyclic graph correspond to the tasks in the scientific workflow, the value attached to each vertex is the storage cost of the corresponding task, the edges between vertices and their directions match the association relationships between the tasks, and the value on each directed edge is the calculation cost between the tasks at its two ends.

For example, a scientific workflow includes 7 tasks a0-a6, which is a directed acyclic graph as shown in FIG. 3. From the directed acyclic graph shown in fig. 3, an adjacency list shown in fig. 4, a calculation cost matrix shown in the following table 1, and a storage cost array shown in the following table 2 can be obtained.

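TABLE 1

Calculation cost matrix over tasks a0 to a6 (for each row, the non-empty cells are listed in column order):

Row a0: ∞, 1, 3, 2, ∞
Row a1: ∞, 1, 2, ∞
Row a2: ∞, 1, ∞
Row a3: ∞, 1, ∞
Row a4: ∞, 1
Row a5: ∞, 3
Row a6: ∞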

TABLE 2

Task	a0	a1	a2	a3	a4	a5	a6
Storage cost	1	2	1	3	2	5	1

Referring to fig. 3 and 4, the adjacency list includes one singly linked list for each vertex in the directed acyclic graph, shown as v1-v7. The singly linked list of a vertex starts with that vertex and contains each vertex it points to.

Referring to fig. 3 and table 1, the computation cost matrix includes values corresponding to each direction line, vertices at two ends, and directions in the directed acyclic graph.

Referring to fig. 3 and table 2, the storage cost array includes the value corresponding to each vertex in the directed acyclic graph.

According to the characteristics of the calculation-storage problem model of scientific workflow tasks, and under the two constraints of calculation and storage, a discrete particle swarm algorithm encoded with the natural numbers 0 and 1 can be designed, and the correspondence between the particles and the actual problem is established. The new position of a particle can be the result of the interaction among its velocity, its individual extremum, and the global extremum. The position and velocity update formulas of the particle swarm algorithm are redefined according to the practical characteristics of cloud computing and storage constraints.

Therefore, in an embodiment of the present invention, the performing, by using a particle swarm algorithm, one iteration update on the current latest value of each particle respectively includes: respectively carrying out one-time iteration updating on the current latest value of each particle by using the following formula (4);

In detail, in formula (4), F₁(X_i(t)) reflects the influence of the particle's own velocity on its change in position.

In detail, the position update formula consists of three parts. Let W_i and M_i be temporary variables.

(1)

This is the inertia part of the particle, representing the particle's consideration of its own flight velocity. The velocity of the particle is expressed as follows: a random number r in the interval [0, 1] is generated by rand(); if r < ω, the particle undergoes a displacement operation.

(2)

This part represents the particle adjusting its position according to the individual extremum p_i(t). The operation F₂(M_i(t), p_i(t)) is as follows: a segment is extracted from M_i(t) and placed before (or after) p_i(t), and the preceding (or following) data of p_i(t) is then deleted. The operation is simple but very effective: it yields a new individual that retains its own characteristics while learning from the local optimal position.

(3)

This part represents the particle adjusting its position according to the global extremum p_g(t). The operation F₃(W_i(t), p_g(t)) is as follows: a segment is extracted from W_i(t) and placed before (or after) p_g(t), and the preceding (or following) data of p_g(t) is then deleted. The operation is simple but very effective: it yields a new individual that retains its own characteristics while learning from the global optimal position.

In the population iteration process, X_i(t), p_i(t), and p_g(t) are continuously updated, and the global optimal solution is finally output.
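The three-part update described above can be sketched as follows. This is one possible reading, not the patent's formula (4), which is not reproduced in this text. The assumptions are: the displacement operation flips one randomly chosen bit; the segment operation keeps a leading segment of the first argument and takes the rest from the extremum (a one-point splice, so the length is preserved); the coefficients c₁ and c₂ are left out, so both splices are always applied. The names `displace`, `splice`, and `update_particle` are hypothetical.

```python
import random

def displace(x, rng):
    """Assumed F1: flip one randomly chosen bit of the particle."""
    y = list(x)
    k = rng.randrange(len(y))
    y[k] = 1 - y[k]
    return y

def splice(source, guide, rng):
    """Assumed F2/F3: a leading segment of `source` replaces the leading
    data of `guide`; the result has the same length as both inputs."""
    cut = rng.randint(1, len(source) - 1)   # requires at least 2 dimensions
    return list(source[:cut]) + list(guide[cut:])

def update_particle(x, p_i, p_g, omega, rng):
    # Part (1): inertia, displacement applied when r < omega.
    m = displace(x, rng) if rng.random() < omega else list(x)
    # Part (2): learn from the individual extremum p_i.
    w = splice(m, p_i, rng)
    # Part (3): learn from the global extremum p_g.
    return splice(w, p_g, rng)

rng = random.Random(1)
new_x = update_particle([0, 1, 0, 1, 1, 0, 0], [1] * 7, [0] * 7, 0.9, rng)
print(new_x)
```

Because every operation returns a 0/1 vector of the original length, the updated particle is always a valid storage decision for the same set of tasks.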

In the embodiment of the invention, a feedback mechanism is introduced into the scientific workflow task processing method based on particle swarm optimization for optimization, and parameters in the algorithm are predicted, fed back and corrected, so that the self-balance of the algorithm can be realized, and the cloud computing platform can quickly and efficiently find the optimal task processing scheme.

According to the above formula (4), the cognitive coefficient c₁ and the social coefficient c₂ can be set empirically to c₁ = c₂ = 2. Thus, in one embodiment of the present invention, preferably c₁ = c₂ = 2.

According to the above formula (4), when the inertia coefficient ω is in [0.4, 1.2], the DPSO algorithm (Discrete Particle Swarm Optimization algorithm) converges faster, and when ω > 1.2, it is liable to fall into a local extremum. Therefore, the inertia coefficient can be varied between 0.4 and 1.4: ω first takes a larger value so that the DPSO algorithm can search a larger area, and as the search deepens, ω is gradually reduced and a fine search begins.

Therefore, in an embodiment of the present invention, preferably, ω takes a value in [0.4, 1.4], the value of ω gradually decreases as the number of iterations increases, the value of ω is not less than 1.2 in the first N iterations, and N is a preset value. For example, N may be 0.05, 0.1, or 0.2 times the preset maximum number of iterations.
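One schedule that satisfies these conditions is sketched below. The piecewise-linear shape is an assumption (the text only requires a gradual decrease within [0.4, 1.4] and ω ≥ 1.2 during the first N iterations), and the function name `inertia` and the parameter `warmup_fraction` are hypothetical.

```python
def inertia(t, max_iter, warmup_fraction=0.1,
            w_start=1.4, w_warm=1.2, w_end=0.4):
    """Inertia coefficient for 0-based iteration t out of max_iter."""
    # N, the preset number of initial iterations with omega >= 1.2.
    n = max(1, int(round(warmup_fraction * max_iter)))
    if t < n:
        # First N iterations: decrease from 1.4 towards 1.2.
        return w_start - (w_start - w_warm) * t / n
    # Remaining iterations: decrease from 1.2 down to 0.4.
    rest = max(1, max_iter - 1 - n)
    return w_warm - (w_warm - w_end) * min(1.0, (t - n) / rest)

for t in (0, 5, 10, 50, 99):
    print(t, round(inertia(t, 100), 3))
```

With `warmup_fraction=0.1` and 100 iterations, N is 10, which corresponds to the "0.1 times the maximum number of iterations" option mentioned in the text.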

In an embodiment of the present invention, after determining the minimum value of all the first values calculated at the current time, the method further includes: judging whether the minimum values determined in the latest M consecutive iterations are unchanged, where M is a preset value, and if so, replacing some of the at least two particles with the same number of new particles.

In detail, no change can be expressed as equal numerical values, or as numerical differences not greater than a preset difference threshold.

In one embodiment of the present invention, preferably, M is 5, and the number of new particles is 30% of the number of the at least two particles.
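The stagnation check and the partial replacement can be sketched as follows. The helper names `is_stagnant` and `replace_fraction` are hypothetical, and choosing the replaced particles at random is an assumption, since the text does not say which particles are replaced.

```python
import random

def is_stagnant(minima, m=5, tol=0.0):
    """True if the last m recorded minimum values differ by at most tol.

    tol=0.0 means "equal values"; a positive tol corresponds to the
    preset difference threshold mentioned in the text.
    """
    if len(minima) < m:
        return False
    recent = minima[-m:]
    return max(recent) - min(recent) <= tol

def replace_fraction(swarm, fraction=0.3, rng=None):
    """Replace a fraction of the particles in place with new random 0/1
    particles of the same length; return how many were replaced."""
    if rng is None:
        rng = random.Random()
    count = int(len(swarm) * fraction)
    for idx in rng.sample(range(len(swarm)), count):
        swarm[idx] = [rng.randint(0, 1) for _ in swarm[idx]]
    return count

swarm = [[0] * 7 for _ in range(10)]
history = [9, 8, 8, 8, 8, 8]
if is_stagnant(history, m=5):
    replaced = replace_fraction(swarm, 0.3, random.Random(0))
    print("replaced", replaced, "particles")
```

The swarm size stays constant, because each removed particle is overwritten by exactly one new particle.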

Based on the above, in an embodiment of the present invention, the conventional particle swarm optimization can be optimized through the above two measures, so as to avoid the algorithm from falling into local optimization.

Specifically, in the first measure, a strategy of dynamically changing ω is adopted: ω first takes a larger value of 1.4 so that the DPSO algorithm can search a larger area, and as the search deepens, ω gradually decreases and a fine search begins, which prevents the algorithm from falling into a local optimum. In the second measure, when the optimal solution has not changed for 5 iterations, the particle swarm is disturbed by replacing 30% of the particles with newly generated particles, so that the swarm can escape the local optimum.

In summary, the embodiment of the present invention provides an implementation of an optimized particle swarm algorithm, improves, expands and perfects the particle swarm algorithm, and solves the limitation of the traditional particle swarm algorithm in solving the discrete problem, so that the particle swarm algorithm is more suitable for the scientific workflow of the mutual constraint and connection of tasks.

In addition, on a cloud computing platform, designing an optimized task processing algorithm under an existing static scheduling strategy is an NP (Non-deterministic Polynomial) complete problem, and the particle swarm optimization algorithm provided by the embodiment of the invention can effectively solve this NP problem.

In one embodiment of the present invention, the calculating the first value of the utility function of each particle separately includes: taking each particle as a current particle, and executing the following steps:

For example, referring to fig. 3, when calculating each task in the scientific workflow, assume that a5 is currently being calculated, i.e., the first task in step A1 is a5. Since a5 is not stored, in step A3, a1, a2 and a3 are all second tasks, a5 can be calculated from a1, a2 and a3, the corresponding calculation cost is 2+2+1=5, and step A4 is executed. In step A4, since a1 is not stored and its task processing result cannot be used directly, a1 still needs to be calculated, i.e., a1 can be calculated from a0 at a calculation cost of 1, so a1 is taken as the first task and step A3 is executed again. When step A3 is executed again, the first task is a1 and the second task is a0.

In the embodiment of the present invention, the cost involved in step A2 and step A3 performed directly after step A1 is accounted for in the second term of formula (3), and the cost involved in step A3 performed after step A4 is accounted for in the first term of formula (3).

In the embodiment of the invention, if the decision for a task is storage, consider the case where a task N is executed multiple times: the task only needs to be calculated and stored on its first execution, and the stored result is used directly on every later execution, so the calculation cost of the first execution can be ignored. Therefore, if the decision for a task is storage, the corresponding storage cost is added, and the corresponding calculation cost does not need to be added.

In addition, since a task processing result is reusable only if it is stored, when a task that is not stored is needed for calculating n tasks, the task is executed n times correspondingly, that is, the calculation cost for each of the n uses is included when the utility function is calculated.
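The cost rules above can be sketched as a utility function. Formula (3) is not reproduced in this text, so the following is only one plausible reading based on the a5 example: a stored task contributes its storage cost, and a task that is not stored contributes the calculation cost of its incoming edges plus, recursively, the cost of regenerating every predecessor that is not stored. The names `regeneration_cost` and `utility`, and the 4-task data, are hypothetical.

```python
def regeneration_cost(task, particle, preds, memo=None):
    """Cost of producing `task` once from its predecessors.

    preds[task] is a list of (predecessor, calc_cost) pairs. A predecessor
    with particle[p] == 0 is not stored, so it is regenerated first and its
    own regeneration cost is added recursively.
    """
    if memo is None:
        memo = {}
    if task not in memo:
        total = 0
        for p, calc_cost in preds[task]:
            total += calc_cost
            if particle[p] == 0:
                total += regeneration_cost(p, particle, preds, memo)
        memo[task] = total
    return memo[task]

def utility(particle, preds, storage):
    """Total cost of one particle (one store/compute decision per task)."""
    memo = {}
    total = 0
    for task, stored in enumerate(particle):
        if stored:
            total += storage[task]   # stored: storage cost only
        else:
            total += regeneration_cost(task, particle, preds, memo)
    return total

# Hypothetical workflow: 0 -> 1 (cost 1), 0 -> 2 (3), 1 -> 3 (2), 2 -> 3 (1).
preds = {0: [], 1: [(0, 1)], 2: [(0, 3)], 3: [(1, 2), (2, 1)]}
storage = [1, 2, 1, 3]
print(utility([1, 1, 1, 1], preds, storage))   # 1 + 2 + 1 + 3 = 7
print(utility([1, 0, 0, 0], preds, storage))   # 1 + 1 + 3 + (2+1 + 1+3) = 12
```

In the second call, tasks 1 and 2 are not stored, so their regeneration costs are counted once for themselves and again inside the cost of task 3, which mirrors the statement that an unstored task is executed once per use.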

In summary, aiming at the characteristics of calculation and storage constraints in a cloud platform workflow, the embodiment of the present invention provides a resource allocation and price adjustment strategy based on a particle swarm optimization algorithm. The embodiment of the invention represents the scientific workflow by a directed acyclic graph, constructs a utility function reflecting the minimum cost according to the characteristics of the tasks and the correlation between them, and realizes a cost-minimizing strategy for the cloud platform under the two constraints of calculation and storage, so that resources can be used reasonably and resource utilization can be improved.

The scientific workflow task processing method based on the particle swarm optimization algorithm provided by the embodiment of the invention at least has the following characteristics: (1) the particle swarm optimization algorithm can effectively solve the NP problem; (2) the processing of the tasks is divided into two types of storage or calculation, the cost of the two methods is compared and then the two methods are processed, so that the cost generated in the task processing process is the lowest, and the resources are saved; (3) the correlation between tasks can be continuously maintained without worrying about errors in calculation results due to the constraint relationship between tasks.

Based on the above, the task processing method may also be verified through test data, and the related verification process may include the following:

Referring to fig. 5, the calculation cost between adjacent nodes is already given, and the storage cost of each task can be randomly generated by using a random function. The implementation code for randomly generating the storage cost may include the following:

In this part, the random function is rand() % 50 + 10; the specific implementation is as described above.
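The original C listing is not reproduced in this text. As a hedged illustration, an equivalent of `rand() % 50 + 10` in Python, which yields integer costs in the range 10 to 59, might look like this (the function name `random_storage_costs` and the seed are assumptions, and the generated numbers will differ from those of a C runtime):

```python
import random

def random_storage_costs(num_tasks, rng=None):
    """Integer storage costs in [10, 59], mirroring rand() % 50 + 10."""
    if rng is None:
        rng = random.Random()
    return [rng.randrange(50) + 10 for _ in range(num_tasks)]

# Ten tasks, as in the verification example.
costs = random_storage_costs(10, random.Random(2020))
print(" ".join(str(c) for c in costs))
```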

In the C language compiling environment, the generated random numbers are as follows:

13 17 13 11 15 10 13 47 41 53

These random numbers are the randomly generated storage costs of the tasks, and the storage cost of each task is listed in Table 3 below.

TABLE 3

Task number	1	2	3	4	5	6	7	8	9	10
Storage cost	13	17	13	11	15	10	13	47	41	53

The computation storage input interface may be as follows:

Please enter the number of vertices and the number of edges of the graph:

10 vertices and 15 edges

Enter the serial numbers of two adjacent vertices and the corresponding weight:

Enter the storage costs of the 10 tasks:

13.000000 17.000000 13.000000 11.000000 15.000000 10.000000 13.000000 47.000000 41.000000 53.000000

The above calculation-storage input interface shows the number of vertices, the number of edges, the serial numbers and weights of adjacent nodes, and the storage cost of each node.

The optimal solution criterion should be set to 208 and the number of iterations should not be more than 102.

Based on the above, the above steps 102 to 106 are performed, and the verification result is that after a plurality of iterations not exceeding the preset maximum iteration number, the obtained optimal value is equal to the set optimal solution.
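The overall loop, stopping either when a preset optimal-solution criterion is reached or when the maximum number of iterations is exhausted, can be sketched as follows. Everything here is illustrative: the fig. 5 data is not reproduced in this text, so a toy utility (the Hamming distance to a made-up vector `hidden`) stands in for the real cost function, the update rule is the assumed bit-flip-and-splice reading of the three-part update, ω decays linearly from 1.4 to 0.4, and the names `step` and `run` are hypothetical.

```python
import random

def step(x, p_i, p_g, omega, rng):
    """One assumed position update: optional bit flip, then splices."""
    m = list(x)
    if rng.random() < omega:
        k = rng.randrange(len(m))
        m[k] = 1 - m[k]
    cut = rng.randint(1, len(m) - 1)
    w = m[:cut] + list(p_i[cut:])      # learn from the individual extremum
    cut = rng.randint(1, len(w) - 1)
    return w[:cut] + list(p_g[cut:])   # learn from the global extremum

def run(utility, num_tasks, num_particles=5, max_iter=102, target=None, seed=0):
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(num_tasks)]
             for _ in range(num_particles)]
    best_own = [list(p) for p in swarm]            # p_i of each particle
    best_all = list(min(swarm, key=utility))       # p_g
    iterations = 0
    for t in range(max_iter):
        if target is not None and utility(best_all) <= target:
            break                                  # criterion reached
        omega = 1.4 - 1.0 * t / max(1, max_iter - 1)
        for i, x in enumerate(swarm):
            swarm[i] = step(x, best_own[i], best_all, omega, rng)
            if utility(swarm[i]) < utility(best_own[i]):
                best_own[i] = list(swarm[i])
            if utility(swarm[i]) < utility(best_all):
                best_all = list(swarm[i])
        iterations += 1
    return best_all, utility(best_all), iterations

hidden = [1, 0, 1, 1, 0, 0, 1]                     # made-up optimum of the toy utility
toy = lambda p: sum(a != b for a, b in zip(p, hidden))
best, value, iters = run(toy, num_tasks=7, target=0)
print(best, value, iters)
```

In the verification described in the text, the target would be the preset optimal solution (208) and the utility would be computed from the graph of fig. 5; the toy utility only makes the sketch runnable on its own.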

Specifically, the initial particle swarm distribution of the binary algorithm is shown in fig. 6, and the particle swarm distribution after the binary algorithm is iterated for multiple times is shown in fig. 7. As can be seen from fig. 6 and 7, the binary particle swarm optimization has the characteristics of the particle swarm optimization, the solution will advance to the local optimal solution and the global optimal solution, and after a plurality of iterations, a plurality of particles will exhibit regular characteristics. In addition, the particle swarm optimization algorithm provided by the embodiment of the invention enables the particle precision to be high and the convergence to be fast in the workflow processing.

Therefore, the effectiveness and feasibility of the particle swarm optimization algorithm for realizing the scientific workflow task processing in the cloud computing environment are verified through the simulation experiment.

In summary, the embodiment of the present invention researches the implementation problem of the scientific workflow planning method under the cloud computing platform, and provides an improved particle swarm optimization, so as to provide a cloud platform workflow task result persistence strategy based on the particle swarm optimization, and the method has at least the following advantages:

(1) the improved particle swarm optimization can quickly and stably find the optimal distribution scheme, effectively improves the processing speed of the cloud computing platform under the constraint conditions of computing and storage, and reduces the processing cost.

(2) On the basis of cloud computing, the advantages of a cloud computing platform for deploying scientific workflows are analyzed as follows: the scientific workflow can improve the efficiency and the safety of cloud computing and save the operation and storage cost; cloud computing also provides an inexpensive infrastructure for deployment of scientific workflows.

(3) The application of the particle swarm algorithm in workflow deployment is realized aiming at the characteristics of calculation and storage constraints in scientific workflows, and the realization of the optimized particle swarm algorithm is provided. The particle swarm optimization is improved, expanded and perfected, the limitation of the traditional particle swarm optimization in solving the discrete problem is solved, and the particle swarm optimization is more suitable for the scientific workflow of the mutual constraint connection of tasks.

(4) Two measures for improving the performance of the particle swarm algorithm are realized. First, the value of the inertia weight is changed dynamically during the iterative process, so that the optimal solution can be sought rapidly and intelligently. Second, if the solution still has not changed after twenty iterations, 30% of the particles are replaced by randomly generated particles, so that the swarm can escape premature convergence.

(5) The effectiveness of the obtained algorithm is verified by using a simulation experiment, namely the particle swarm optimization algorithm has good convergence and rapidity when solving the discrete problem, is not easy to fall into local optimum when being applied to solving the workflow problem, and can find the optimum solution by continuously iterating and recurrently solving the suboptimal solution for multiple times.

As shown in fig. 8, an embodiment of the present invention provides a scientific workflow task management apparatus for performing any one of the above-mentioned scientific workflow task management methods, where the apparatus may include:

a directed acyclic graph determining unit 801, configured to determine a directed acyclic graph of a scientific workflow, where for any task in the scientific workflow, the directed acyclic graph shows: the task and each other task in the scientific workflow are used for calculating the incidence relation between the task and each other task, the storage cost corresponding to the task, and each calculation cost corresponding to the task and each other task respectively;

a particle setting unit 802, configured to set at least two particles, where a value of any particle includes a value of a decision variable of each task, and the value of the decision variable corresponds to storage or calculation;

a utility function calculating unit 803, configured to calculate a first value of a utility function of each particle according to the association relationship, the storage cost, and the calculation cost in the directed acyclic graph, and according to the current latest value of each particle, and determine a minimum value of all the first values calculated at the current time;

an optimal value determining unit 804, configured to determine, according to each of the currently determined minimum values, a current minimum value change trend according to a time sequence, and determine whether an optimal value exists in the minimum value change trends, where the optimal value is a minimum value in the minimum value change trends, and the optimal value is not greater than each minimum value in the minimum value change trends, if yes, trigger the task management unit 806, and otherwise trigger the iterative updating unit 805;

the iterative updating unit 805 is configured to perform iterative updating on the current latest value of each particle by using a particle swarm algorithm, and trigger the utility function calculating unit 803, when triggered by the optimal value determining unit;

the task management unit 806 is configured to be triggered by the optimal value determining unit, and manage each task according to a value of a particle used when the optimal value is calculated.

Because the information interaction, execution process, and other contents between the units in the device are based on the same concept as the method embodiment of the present invention, specific contents may refer to the description in the method embodiment of the present invention, and are not described herein again.

Embodiments of the present invention also provide a computer-readable medium storing instructions for causing a computer to perform a scientific workflow task management method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.

In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.

Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.

Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.

Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.

In summary, the embodiments of the present invention have at least the following advantages:

1. The existing static scheduling strategy only calculates or stores tasks at random, so it cannot adjust dynamically according to the requirement of cost minimization, lacks adaptability, and does not take the differing calculation and storage costs into account. In contrast, the embodiment of the invention can realize optimized, low-consumption task processing, thereby saving storage space and reducing the storage space occupied on the cloud server.

2. Based on the task data evolution problem of the cloud computing platform and the characteristics of computing and storage constraints in scientific workflows, the embodiment of the invention provides a cloud platform workflow task result persistence strategy based on a particle swarm algorithm, and the application of the particle swarm algorithm in workflow deployment is realized. The embodiment of the invention specifically addresses the combinatorial optimization problem over a discrete variable space. It uses the storage structure and characteristics of the directed acyclic graph to construct a utility function that reflects the degree of satisfaction with task processing and is consistent with the economic benefit of the user, and it uses the mutual constraints among tasks to construct the scientific workflow, so as to use resources reasonably and improve resource utilization.

3. In the embodiment of the invention, a feedback mechanism is introduced into the scientific workflow task processing method based on particle swarm optimization for optimization, and parameters in the algorithm are predicted, fed back and corrected, so that the self-balance of the algorithm can be realized, and the cloud computing platform can quickly and efficiently find the optimal task processing scheme.

4. The embodiment of the invention provides the realization of the optimized particle swarm algorithm, improves, expands and perfects the particle swarm algorithm, solves the limitation of the traditional particle swarm algorithm in solving the discrete problem, and enables the particle swarm algorithm to be more suitable for the scientific workflow of mutual constraint and connection of tasks.

5. Aiming at the characteristics of calculation and storage constraints in the cloud platform workflow, the embodiment of the invention provides a resource allocation and price adjustment strategy based on a particle swarm optimization algorithm. The embodiment of the invention represents the scientific workflow by a directed acyclic graph, constructs a utility function reflecting the minimum cost according to the characteristics of the tasks and the correlation between them, and realizes a cost-minimizing strategy for the cloud platform under the two constraints of calculation and storage, so that resources can be used reasonably and resource utilization can be improved.

It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.

In the above embodiments, the hardware unit may be implemented mechanically or electrically. For example, a hardware element may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware elements may also comprise programmable logic or circuitry, such as a general purpose processor or other programmable processor, that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the means in the various embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims

1. The scientific workflow task management method is characterized by comprising the following steps:

2. The method of claim 1,

each vertex in the directed acyclic graph corresponds to each task, a value corresponding to any vertex is storage cost corresponding to the task corresponding to the vertex, when one task is used for calculating another task, the vertex corresponding to the task points to the vertex corresponding to the other task, and a value corresponding to a corresponding pointing line is calculation cost corresponding to the task and the other task;

3. The method of claim 1,

performing one iteration update on the current latest value of each particle by using a particle swarm algorithm, wherein the iteration update comprises the following steps: respectively carrying out one-time iterative updating on the current latest value of each particle by using a formula I;

the first formula comprises:

4. The method of claim 3,

c₁ = c₂ = 2;

and/or,

ω takes a value in [0.4, 1.4], the value of ω gradually decreases as the number of iterations increases, the value of ω is not less than 1.2 in the first N iterations, and N is a preset value.

5. The method of claim 1,

after the determining the minimum value of all the first values calculated at the current time, further comprising: and judging whether the minimum value determined for the latest continuous M times is unchanged, wherein M is a preset numerical value, and if yes, replacing part of the at least two particles with new particles with the same amount.

6. The method of claim 5,

M is 5, and the number of new particles is 30% of the number of the at least two particles.

7. The method according to any one of claims 1 to 6,

said separately calculating a first value of a utility function for each of said particles, comprising: taking each particle as a current particle, and executing the following steps:

8. Scientific workflow task management apparatus for executing the scientific workflow task management method according to any one of claims 1 to 7, the apparatus comprising:

9. A storage controller, comprising: at least one memory and at least one processor;

the at least one memory to store a machine readable program;

the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1 to 7.

10. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 7.