CN108427602B - Distributed computing task cooperative scheduling method and device

Info

Publication number: CN108427602B
Application number: CN201710078384.2A
Authority: CN
Inventors: 朱力鹏; 胡斌; 饶玮; 黄太贵; 李端超; 王松; 靳丹; 马志程
Original assignee: State Grid Corp of China SGCC; State Grid Gansu Electric Power Co Ltd; State Grid Anhui Electric Power Co Ltd; Global Energy Interconnection Research Institute
Current assignee: State Grid Corp of China SGCC; State Grid Gansu Electric Power Co Ltd; State Grid Anhui Electric Power Co Ltd; Global Energy Interconnection Research Institute
Priority date: 2017-02-14
Filing date: 2017-02-14
Publication date: 2021-10-29
Anticipated expiration: 2037-02-14
Also published as: CN108427602A

Abstract

The invention relates to a distributed computing task cooperative scheduling method and device. The method comprises: determining the expected completion time of each task on each resource, and establishing an expected completion time matrix; determining the number of tasks to be processed by each resource using a gene expression programming algorithm; determining an evaluation value for each task according to its urgency and importance, and sorting the tasks by evaluation value from largest to smallest to obtain a task sequence; and allocating the tasks in the task sequence to the resources in order using a Min-Min algorithm, according to the expected completion time matrix and the number of tasks to be processed by each resource. The technical scheme provided by the invention offers an effective task scheduling method that makes full use of the available resources to complete submitted tasks in the shortest time.

Description

Distributed computing task cooperative scheduling method and device

Technical Field

The invention relates to the field of distributed computing software, in particular to a distributed computing task cooperative scheduling method and device.

Background

Cooperative scheduling is an important technique for resource allocation in a distributed computing environment. It allocates multiple tasks submitted by users to multiple resources for simultaneous processing, so as to meet specific performance requirements. Effective processing of user tasks reflects the current trend toward converged technology development and intelligent applications. In a distributed system, task scheduling, the balance of computing capacity across resources, and the efficiency of computing nodes are the main indicators for judging the quality of an algorithm. How to share resources and solve problems cooperatively among multiple dynamically changing virtual organizations is currently a major problem in cooperative task scheduling. Cooperative scheduling can improve the performance of tasks in a distributed computing environment, and it is widely applied in fields such as virtual reality, virtual instruments, and large-scale scientific computing.

At present, research on task scheduling strategies at home and abroad falls mainly into two types: application-level task scheduling and job-level task scheduling. Application-level task scheduling evolved from the task graph (DAG) scheduling problem in traditional resource environments; it improves application performance by abstracting compute-intensive applications into coarse-grained constrained task graphs and mapping these graphs onto network computing resources using economic models and mathematical programming strategies. Because of the high latency and low bandwidth of current network environments, research in this area has been limited to parameter-sweep and loosely coupled iterative applications. For the cooperative scheduling problem, job-level approaches proposed in China include heuristic task scheduling based on fuzzy clustering and cooperative scheduling based on the particle swarm algorithm; these study performance optimization when the objects of study run cooperatively, and extend high-performance multi-resource scheduling research in network computing environments. Cooperative scheduling of tasks across multiple resources in a network environment is an NP-hard problem, so an optimal schedule is difficult to obtain in polynomial time. The main factors influencing task execution time include the diversity of tasks and the differences among resources.

Disclosure of Invention

The invention provides a distributed computing task cooperative scheduling method and a distributed computing task cooperative scheduling device, and aims to research an effective task scheduling method and fully utilize effective resources to complete submitted tasks in the shortest time.

The purpose of the invention is realized by adopting the following technical scheme:

the improvement of a method for collaborative scheduling of distributed computing tasks, comprising:

determining expected completion time of each task on each resource, and establishing an expected completion time matrix;

determining the number of tasks to be processed of each resource by using a gene expression programming algorithm;

determining the evaluation value of each task according to the urgency and the importance of each task, and sequencing the evaluation values of each task from large to small to obtain a task sequence;

and sequentially distributing the tasks in the task sequence to each resource by utilizing a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed of each resource.

Preferably, the determining the expected completion time of each task on each resource and establishing an expected completion time matrix includes:

recording the number of tasks as n and the number of resources as m, and constructing an n × m expected completion time matrix E_n×m according to the following formula:

E_n×m＝(e_ij), i＝1, …, n; j＝1, …, m

In the above formula, e_ij is the expected completion time of the i-th task on the j-th resource.
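
As a minimal sketch of how such a matrix might be populated, the Python below assumes that each task has a known workload and each resource a known processing speed, so that e_ij = workload_i / speed_j. This estimation rule and the name `build_ect_matrix` are illustrative assumptions; the text does not say how e_ij is estimated. Rows index tasks and columns index resources, following the definition of e_ij.

```python
from typing import List


def build_ect_matrix(workloads: List[float], speeds: List[float]) -> List[List[float]]:
    """Return E where E[i][j] is the expected completion time of task i on resource j.

    Assumption (not from the source): e_ij = workload_i / speed_j.
    """
    if any(s <= 0 for s in speeds):
        raise ValueError("resource speeds must be positive")
    return [[w / s for s in speeds] for w in workloads]


# Three tasks, two resources: E has 3 rows (tasks) and 2 columns (resources).
E = build_ect_matrix([10.0, 20.0, 30.0], [1.0, 2.0])
```

Any other estimator (historical run times, a benchmark model) could fill the same matrix; only the row and column convention matters for the later steps.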

Preferably, the determining the number of tasks required to be processed by each resource by using the gene expression programming algorithm includes:

a. initializing a population, wherein the population consists of the m resources and the number of tasks to be processed by each resource, the head length of a chromosome in the population is p, the tail length is d＝p(l-1)+1, and l is the maximum number of operands;

b. selecting the optimal individual in the population according to the fitness function values of the individuals, retaining the optimal individual, and applying gene crossover, gene mutation and recombination to the individuals of the current sub-population to obtain a new population, wherein the fitness function of an individual is determined according to the following formula:

in the above formula, M is the selection range of the number of tasks for a resource, C_(i,j) is the fitness return value of task i for resource j, T_j is the target value of the number of tasks selected by resource j, and f_i is the fitness value of scheduling task i onto resource j;

c. if the generation number t satisfies t > T, outputting the new population, decoding the new population, and obtaining the number of tasks to be processed by each resource at which the objective function value is minimal; if the generation number t does not satisfy t > T, setting t＝t+1 and returning to step b, wherein the objective function is as follows:

in the above formula, h_j is the number of tasks to be processed by resource j, e_ij is the expected completion time of the i-th task on the j-th resource, n is the total number of tasks, and m is the total number of resources.
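
The control flow of steps a to c can be sketched as follows. This is only a skeleton under stated assumptions: the names `tail_length` and `evolve`, and the `fitness` and `vary` callables, are hypothetical. The genetic operators and the chromosome decoding of an actual gene expression programming implementation are not specified here and are supplied by the caller. The sketch shows the tail-length rule d = p(l-1)+1, the retention of the best individual, and the termination test on the generation counter.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Ind = TypeVar("Ind")


def tail_length(p: int, l: int) -> int:
    """Tail length d = p(l-1)+1 for head length p and maximum operand count l."""
    return p * (l - 1) + 1


def evolve(population: List[Ind],
           fitness: Callable[[Ind], float],
           vary: Callable[[Sequence[Ind], random.Random], Ind],
           max_generations: int,
           seed: int = 0) -> Ind:
    """Elitist generational loop: keep the best individual, rebuild the rest.

    `vary` is a hypothetical hook standing in for crossover, mutation and
    recombination; it returns one new individual built from the population.
    """
    rng = random.Random(seed)
    t = 1
    while t <= max_generations:              # the loop ends once t > T
        best = max(population, key=fitness)
        children = [vary(population, rng) for _ in range(len(population) - 1)]
        population = [best] + children       # the best individual is always retained
        t += 1
    return max(population, key=fitness)


# Toy usage (integers instead of chromosomes): search for x close to 3.
def mutate(pop: Sequence[int], rng: random.Random) -> int:
    return rng.choice(pop) + rng.choice([-1, 0, 1])


best = evolve([0, 10, -5, 7], lambda x: -(x - 3) ** 2, mutate, max_generations=50)
```

Because the best individual is carried over unchanged, the best fitness in the population never decreases from one generation to the next.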

Preferably, the determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the evaluation values of each task from large to small to obtain the task sequence includes:

the evaluation value G_i(t) of task i at the current time t is determined according to the following formula:

G_i(t)＝p₁U_i(t)+p₂I_i

In the above formula, U_i(t) is the urgency of task i at the current time t, I_i is the importance of task i, p₁ is the urgency weight, p₂ is the importance weight, and p₁+p₂＝1.

Further, the urgency U_i(t) of task i at the current time t is determined as follows:

U_i(t)＝t_i1/(t_i2+t_i3-t)

In the above formula, t_i1 is the estimated completion time of the task, t_i2 is the allowed completion time of the task, and t_i3 is the allowed timeout of the task.
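
The evaluation value and the urgency formula can be combined into a small ranking routine. The sketch below is illustrative only: the function names are hypothetical, the importance I_i is taken as an input supplied by the caller, and raising an error when t_i2 + t_i3 - t is not positive is an assumption, since the text does not say how such tasks are handled.

```python
from typing import List, Tuple


def urgency(t_est: float, t_allowed: float, t_timeout: float, now: float) -> float:
    """U_i(t) = t_i1 / (t_i2 + t_i3 - t)."""
    remaining = t_allowed + t_timeout - now
    if remaining <= 0:
        # Assumption: the formula is only applied while time remains.
        raise ValueError("no time remains before the allowed deadline")
    return t_est / remaining


def evaluation(u: float, importance: float, p1: float) -> float:
    """G_i(t) = p1*U_i(t) + p2*I_i with p2 = 1 - p1."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie in [0, 1]")
    return p1 * u + (1.0 - p1) * importance


def rank_tasks(tasks: List[Tuple[float, float, float, float]],
               now: float, p1: float) -> List[int]:
    """Return task indices sorted by evaluation value, largest first.

    Each task is a tuple (t_i1, t_i2, t_i3, I_i).
    """
    scores = [evaluation(urgency(a, b, c, now), imp, p1) for a, b, c, imp in tasks]
    return sorted(range(len(tasks)), key=lambda i: scores[i], reverse=True)


tasks = [(2.0, 10.0, 2.0, 0.5), (6.0, 8.0, 0.0, 0.2), (1.0, 20.0, 5.0, 0.9)]
order = rank_tasks(tasks, now=4.0, p1=0.5)
```

In this example the second task has the least remaining time relative to its estimated duration, so it is ranked first even though its importance is the lowest.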

Further, the importance I_i of task i is determined as follows:

I_i＝m₁H_i+m₂N_i

In the above formula, H_i is the relationship importance of task i, N_i is the time importance of task i, m₁ is the relationship importance weight, m₂ is the time importance weight, and m₁+m₂＝1;

wherein the relationship importance H_i of task i is determined according to the following formula:

In the above formula, M_ik is the dependency relation between task i and task k: if M_ik＝0, task i is independent of task k; if M_ik＝1, the execution of task i and the execution of task k are interdependent; n is the total number of tasks;

the time importance N_i of task i is determined as follows:

In the above formula, t_i1 is the estimated completion time of task i, and t_k1 is the estimated completion time of task k.
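
The weighted combination I_i = m₁H_i + m₂N_i can be illustrated as below. The formulas for H_i and N_i are not reproduced in this text, so the two helper functions use assumed normalized forms chosen only to match the symbols defined above: H_i as the share of tasks related to task i, and N_i as task i's share of the total estimated completion time. Both forms, and all function names, are hypothetical placeholders.

```python
from typing import List


def relation_importance(M: List[List[int]]) -> List[float]:
    """ASSUMED form: H_i = (sum over k of M_ik) / n."""
    n = len(M)
    return [sum(row) / n for row in M]


def time_importance(t_est: List[float]) -> List[float]:
    """ASSUMED form: N_i = t_i1 / (sum over k of t_k1)."""
    total = sum(t_est)
    if total <= 0:
        raise ValueError("estimated completion times must have a positive sum")
    return [t / total for t in t_est]


def importance(M: List[List[int]], t_est: List[float], m1: float) -> List[float]:
    """I_i = m1*H_i + m2*N_i with m2 = 1 - m1 (this combination is from the text)."""
    H = relation_importance(M)
    N = time_importance(t_est)
    return [m1 * h + (1.0 - m1) * nv for h, nv in zip(H, N)]


# Task 0 is related to tasks 1 and 2; tasks 1 and 2 are related only to task 0.
M = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
imp = importance(M, [2.0, 6.0, 2.0], m1=0.5)
```

With equal weights, the long-running task 1 ends up slightly more important than the well-connected task 0, which shows how the two dimensions trade off against each other.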

Preferably, the sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed by the resources includes:

a. deleting the allocated tasks from the task sequence, and deleting from the resource set any resource whose number of allocated tasks has reached the number of tasks it needs to process;

b. selecting the top-ranked task in the task sequence, allocating it to the resource in the resource set with the minimum expected completion time for that task, and returning to step a until the task sequence is empty.
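
Steps a and b can be sketched as a short loop over the ranked tasks. The code is a hedged illustration: the name `allocate`, the tie-breaking rule (lowest resource index) and the check that the quotas cover all tasks are assumptions added so that the example is well defined.

```python
from typing import List


def allocate(E: List[List[float]], quotas: List[int], order: List[int]) -> List[int]:
    """Assign each task, in ranked order, to the available resource with minimal e_ij.

    E[i][j]   : expected completion time of task i on resource j
    quotas[j] : number of tasks resource j is to process
    order     : task indices sorted by evaluation value, largest first
    Returns a list where entry i is the resource index chosen for task i.
    """
    if sum(quotas) < len(order):
        raise ValueError("quotas do not cover all tasks")  # assumption: reject early
    remaining = list(quotas)
    available = {j for j, q in enumerate(remaining) if q > 0}
    assignment = [-1] * len(E)
    for i in order:                                        # top-ranked task first
        j = min(available, key=lambda r: (E[i][r], r))     # ties go to the lower index
        assignment[i] = j
        remaining[j] -= 1
        if remaining[j] == 0:                              # quota met: drop the resource
            available.remove(j)
    return assignment


E = [[4.0, 2.0],
     [3.0, 1.0],
     [5.0, 6.0]]
result = allocate(E, quotas=[2, 1], order=[1, 0, 2])
```

Here task 1 is ranked first and takes the faster resource 1, whose quota of one task is then exhausted, so tasks 0 and 2 both go to resource 0.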

In an apparatus for collaborative scheduling of distributed computing tasks, the improvement comprising:

the first determining module is used for determining the expected completion time of each task on each resource and establishing an expected completion time matrix;

the second determining module is used for determining the number of tasks to be processed of each resource by using a gene expression programming algorithm;

the evaluation module is used for determining the evaluation value of each task according to the urgency and the importance of each task and sequencing the evaluation values of each task from large to small to obtain a task sequence;

and the distribution module is used for sequentially distributing the tasks in the task sequence to each resource by utilizing a Min-Min algorithm according to the expected completion time matrix and the number of the tasks needing to be processed by each resource.

Preferably, the first determining module includes:

Preferably, the second determining module includes:

Preferably, the evaluation module includes:

G_i(t)＝p₁U_i(t)+p₂I_i

Preferably, the distribution module includes:

The invention has the beneficial effects that:

the technical scheme provided by the invention takes the total task processing time as its objective. It sorts the tasks submitted by users according to their importance and urgency and, based on the currently available resources, allocates the tasks with high urgency and importance to the grid resource with the shortest processing time, so that these tasks are processed first and on high-quality resources. The scheme meets user requirements as far as possible within a limited number of selection cycles, has good scalability and flexibility, makes full use of the currently available resources, can improve the quality of service, and addresses the cooperative optimization of big-data-oriented distributed computing tasks in a resource environment.

Drawings

FIG. 1 is a flow chart of a method for collaborative scheduling of distributed computing tasks in accordance with the present invention;

FIG. 2 is a schematic structural diagram of a cooperative scheduling apparatus for distributed computing tasks according to the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The cooperative scheduling method of the distributed computing task provided by the invention needs to consider two problems:

(1) How to sort the tasks submitted by users according to their importance and urgency, so that tasks with high urgency and importance are allocated to the best resources for processing, data transmission and task execution proceed concurrently, and the time that task execution spends waiting for data is reduced.

(2) Given the differences among available resources, how to assign tasks to resources in proportion to resource attributes, so that the number of tasks given to each resource is reasonable with respect to its computing power, storage space, and so on.

The cooperative scheduling method for distributed computing tasks provided by the invention sorts the tasks by their evaluation function values, preferentially processes the tasks with high evaluation function values, and uses a gene expression programming algorithm to determine the number of tasks processed by each resource. As shown in FIG. 1, the method comprises the following steps:

101. determining expected completion time of each task on each resource, and establishing an expected completion time matrix;

102. determining the number of tasks to be processed of each resource by using a gene expression programming algorithm;

103. determining the evaluation value of each task according to the urgency and the importance of each task, and sequencing the evaluation values of each task from large to small to obtain a task sequence;

104. and sequentially distributing the tasks in the task sequence to each resource by utilizing a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed of each resource.

Specifically, the step 101 includes:

the number of tasks is recorded as n and the number of resources as m, and an n × m expected completion time matrix E_n×m is constructed as follows:

E_n×m＝(e_ij), i＝1, …, n; j＝1, …, m

After the expected completion time matrix is established, the number of tasks processed by each resource is determined. Therefore, in step 102, a gene expression programming algorithm is used to approximate the optimal value of the objective function, thereby obtaining the number of tasks to be processed by each resource, which specifically comprises the following steps:

wherein the head length p is 12 in this application;

this application uses a roulette-wheel algorithm to select the next generation of individuals: individuals are retained or eliminated according to their fitness values, and individuals with higher fitness values have a higher probability of being selected.
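
Roulette-wheel selection draws each individual with a probability proportional to its fitness. A minimal sketch follows, using the standard library's `random.choices` for the weighted draw; the function name `roulette_select` and the requirement that fitness values be non-negative are assumptions made for this illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_select(population: Sequence[T], fitness: Sequence[float],
                    k: int, rng: random.Random) -> List[T]:
    """Draw k individuals (with replacement), each with probability fitness / total."""
    if len(population) != len(fitness):
        raise ValueError("population and fitness must have the same length")
    if any(f < 0 for f in fitness) or sum(fitness) <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    return rng.choices(population, weights=fitness, k=k)


rng = random.Random(42)
picked = roulette_select(["a", "b", "c"], [1.0, 0.0, 9.0], k=1000, rng=rng)
```

An individual with zero fitness is never drawn, and an individual holding nine tenths of the total fitness is drawn far more often than the others.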

After determining the number of tasks to be processed by each resource, the tasks need to be sorted according to the evaluation value of each task, so step 103 includes:

G_i(t)＝p₁U_i(t)+p₂I_i

Because a task is more urgent the longer its estimated completion time is and the less time remains before its deadline, the urgency U_i(t) of task i at the current time t is determined by the following equation:

U_i(t)＝t_i1/(t_i2+t_i3-t)

The importance of a task is composed of two dimensions, relationship and time, and the relationship importance reflects the influence of the execution time of a single task on the whole system. If a task is related to other tasks, that is, if the execution of task T_j depends on T_i, the relationship importance of task T_i is considered higher; if a task requires a longer execution time, the task is considered to have a higher time importance for the whole system. Therefore, the importance I_i of task i is determined as follows:

I_i＝m₁H_i+m₂N_i

the time importance N_i of task i is determined as follows:

Finally, in step 104, sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed by the resources, including:

The present invention also provides a cooperative scheduling apparatus for distributed computing tasks, as shown in fig. 2, the apparatus includes:

The first determining module includes:

The second determining module includes:

The evaluation module comprises:

G_i(t)＝p₁U_i(t)+p₂I_i

The distribution module includes:

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A method for collaborative scheduling of distributed computing tasks, the method comprising:

allocating tasks in the task sequence to each resource in sequence by utilizing a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed of each resource;

wherein determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the tasks by evaluation value from large to small to obtain the task sequence, comprises:

G_i(t)＝p₁U_i(t)+p₂I_i

In the above formula, U_i(t) is the urgency of task i at the current time t, I_i is the importance of task i, p₁ is the urgency weight, p₂ is the importance weight, and p₁+p₂＝1；

the urgency U_i(t) of task i at the current time t is determined according to the following formula:

U_i(t)＝t_i1/(t_i2+t_i3-t)

In the above formula, t_i1 is the estimated completion time of the task, t_i2 is the allowed completion time of the task, and t_i3 is the allowed timeout of the task;

the importance I_i of task i is determined as follows:

I_i＝m₁H_i+m₂N_i

the time importance N_i of task i is determined as follows:

In the above formula, t_i1 is the estimated completion time of task i, and t_k1 is the estimated completion time of task k;

wherein determining the number of tasks to be processed by each resource using the gene expression programming algorithm comprises:

c. if the generation number t satisfies t > T, outputting the new population, decoding the new population, and obtaining the number of tasks to be processed by each resource at which the objective function value is minimal; if the generation number t does not satisfy t > T, setting t＝t+1 and returning to step b, wherein the objective function is as follows:

2. The method of claim 1, wherein determining expected completion times for tasks on resources and building an expected completion time matrix comprises:

recording the number of tasks as n and the number of resources as m, and constructing an n × m expected completion time matrix E_n×m according to the following formula:

3. The method of claim 1, wherein said sequentially assigning tasks in the task sequence to each resource using a Min-Min algorithm based on the expected completion time matrix and the number of tasks required to be processed by each resource comprises:

4. An apparatus for implementing a method of co-scheduling of distributed computing tasks according to any of claims 1-3, the apparatus comprising:

the allocation module is used for sequentially allocating the tasks in the task sequence to the resources by utilizing a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed of the resources;

the second determining module includes:

5. The apparatus of claim 4, wherein the first determining module comprises:

6. The apparatus of claim 4, wherein the evaluation module comprises:

G_i(t)＝p₁U_i(t)+p₂I_i

7. The apparatus of claim 4, wherein the assignment module comprises: