Disclosure of Invention
The invention provides a cooperative scheduling method and apparatus for distributed computing tasks, with the aim of providing an effective task scheduling method that makes full use of available resources to complete submitted tasks in the shortest time.
The purpose of the invention is realized by adopting the following technical scheme:
In a method for collaborative scheduling of distributed computing tasks, the improvement comprising:
determining expected completion time of each task on each resource, and establishing an expected completion time matrix;
determining the number of tasks to be processed by each resource by using a gene expression programming algorithm;
determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the tasks by evaluation value in descending order to obtain a task sequence;
and sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed by each resource.
Preferably, the determining the expected completion time of each task on each resource and establishing an expected completion time matrix includes:
recording the number of tasks as n and the number of resources as m, and constructing an m × n expected completion time matrix $E_{m \times n}$ according to the following formula:
In the above formula, $e_{ij}$ is the expected completion time of the ith task on the jth resource.
Preferably, the determining the number of tasks required to be processed by each resource by using the gene expression programming algorithm includes:
a. initializing a population, wherein the population consists of the m resources and the number of tasks to be processed by each resource, the head length of a chromosome in the population is p, the tail length is $d = p(l-1)+1$, and l is the maximum number of operands;
b. selecting the optimal individual in the population according to the fitness function values of the individuals, retaining the optimal individual, and performing gene crossover, gene mutation and recombination on the individuals of the current sub-population to obtain a new population, wherein the fitness function of an individual is determined according to the following formula:
in the above formula, M is the selection range of the number of tasks for a resource, $C_{(i,j)}$ is the fitness return value of task i on resource j, $T_j$ is the target value of the number of tasks selected by resource j, and $f_i$ is the fitness value of scheduling task i onto resource j;
c. if the generation number t satisfies $t > T$, outputting the new population and decoding it to obtain the number of tasks to be processed by each resource when the objective function value is minimum; if t does not satisfy $t > T$, setting $t = t + 1$ and returning to step b, wherein the objective function is as follows:
in the above formula, $h_j$ is the number of tasks to be processed by resource j, $e_{ij}$ is the expected completion time of the ith task on the jth resource, n is the total number of tasks, and m is the total number of resources.
Preferably, the determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the evaluation values of each task from large to small to obtain the task sequence includes:
the evaluation value $G_i(t)$ of task i at the current time t is determined according to the following formula:
$G_i(t) = p_1 U_i(t) + p_2 I_i$
In the above formula, $U_i(t)$ is the urgency of task i at the current time t, $I_i$ is the importance of task i, $p_1$ is the urgency weight, $p_2$ is the importance weight, and $p_1 + p_2 = 1$.
Further, the urgency $U_i(t)$ of task i at the current time t is determined as follows:
$U_i(t) = t_{i1} / (t_{i2} + t_{i3} - t)$
In the above formula, $t_{i1}$ is the estimated completion time of task i, $t_{i2}$ is the allowed completion time of the task, and $t_{i3}$ is the allowed timeout of the task.
Further, the importance $I_i$ of task i is determined as follows:
$I_i = m_1 H_i + m_2 N_i$
In the above formula, $H_i$ is the relationship importance of task i, $N_i$ is the time importance of task i, $m_1$ is the relationship importance weight, $m_2$ is the time importance weight, and $m_1 + m_2 = 1$;
Wherein the relationship importance $H_i$ of task i is determined according to the following formula:
In the above formula, $M_{ik}$ is the relationship dependency between task i and task k; if $M_{ik} = 0$, task i is independent of task k, and if $M_{ik} = 1$, the execution of task i and the execution of task k are interdependent; n is the total number of tasks;
the time importance $N_i$ of task i is determined as follows:
In the above formula, $t_{i1}$ is the estimated completion time of task i, and $t_{k1}$ is the estimated completion time of task k.
Preferably, the sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of the tasks to be processed by the resources includes:
a. deleting the allocated tasks from the task sequence, and deleting from the resource set each resource whose number of allocated tasks has reached the number of tasks that the resource needs to process;
b. selecting the top-ranked task in the task sequence, allocating the task to the resource in the resource set with the minimum expected completion time for the task, and returning to step a until the task sequence is empty.
In an apparatus for collaborative scheduling of distributed computing tasks, the improvement comprising:
the first determining module is used for determining the expected completion time of each task on each resource and establishing an expected completion time matrix;
the second determining module is used for determining the number of tasks to be processed by each resource by using a gene expression programming algorithm;
the evaluation module is used for determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the tasks by evaluation value in descending order to obtain a task sequence;
and the distribution module is used for sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed by each resource.
Preferably, the first determining module includes:
recording the number of tasks as n and the number of resources as m, and constructing an m × n expected completion time matrix $E_{m \times n}$ according to the following formula:
In the above formula, $e_{ij}$ is the expected completion time of the ith task on the jth resource.
Preferably, the second determining module includes:
a. initializing a population, wherein the population consists of the m resources and the number of tasks to be processed by each resource, the head length of a chromosome in the population is p, the tail length is $d = p(l-1)+1$, and l is the maximum number of operands;
b. selecting the optimal individual in the population according to the fitness function values of the individuals, retaining the optimal individual, and performing gene crossover, gene mutation and recombination on the individuals of the current sub-population to obtain a new population, wherein the fitness function of an individual is determined according to the following formula:
in the above formula, M is the selection range of the number of tasks for a resource, $C_{(i,j)}$ is the fitness return value of task i on resource j, $T_j$ is the target value of the number of tasks selected by resource j, and $f_i$ is the fitness value of scheduling task i onto resource j;
c. if the generation number t satisfies $t > T$, outputting the new population and decoding it to obtain the number of tasks to be processed by each resource when the objective function value is minimum; if t does not satisfy $t > T$, setting $t = t + 1$ and returning to step b, wherein the objective function is as follows:
in the above formula, $h_j$ is the number of tasks to be processed by resource j, $e_{ij}$ is the expected completion time of the ith task on the jth resource, n is the total number of tasks, and m is the total number of resources.
Preferably, the evaluation module includes:
the evaluation value $G_i(t)$ of task i at the current time t is determined according to the following formula:
$G_i(t) = p_1 U_i(t) + p_2 I_i$
In the above formula, $U_i(t)$ is the urgency of task i at the current time t, $I_i$ is the importance of task i, $p_1$ is the urgency weight, $p_2$ is the importance weight, and $p_1 + p_2 = 1$.
Preferably, the distribution module includes:
a. deleting the allocated tasks from the task sequence, and deleting from the resource set each resource whose number of allocated tasks has reached the number of tasks that the resource needs to process;
b. selecting the top-ranked task in the task sequence, allocating the task to the resource in the resource set with the minimum expected completion time for the task, and returning to step a until the task sequence is empty.
The invention has the beneficial effects that:
the technical scheme provided by the invention takes the total task processing time as its objective. Tasks submitted by users are sorted according to their importance and urgency, and, based on the currently available resources, tasks with high urgency and importance are processed first and are allocated to the high-quality grid resources with the shortest processing time. User requirements are met as far as possible within a limited number of selection rounds. The scheme has good scalability and flexibility, makes full use of the currently available resources, can improve quality of service, and addresses the problem of cooperative optimization of big-data-oriented distributed computing tasks in a resource environment.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The cooperative scheduling method of the distributed computing task provided by the invention needs to consider two problems:
(1) how to sort the tasks submitted by users according to their importance and urgency, so that tasks with high urgency and importance are allocated to the optimal resources for processing, and data transmission and task execution proceed synchronously, reducing the time that task execution waits for data.
(2) given the differences among available resources, how to allocate tasks to resources in proportion to resource attributes, so that the number of tasks assigned to each resource is set reasonably according to its computing power, storage space and the like.
The cooperative scheduling method for distributed computing tasks provided by the invention sorts the tasks by their evaluation function values, preferentially processes the tasks with high evaluation function values, and uses a gene expression programming algorithm to determine the number of tasks processed by each resource. As shown in fig. 1, the method comprises the following steps:
101. determining expected completion time of each task on each resource, and establishing an expected completion time matrix;
102. determining the number of tasks to be processed by each resource by using a gene expression programming algorithm;
103. determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the tasks by evaluation value in descending order to obtain a task sequence;
104. and sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed by each resource.
Specifically, the step 101 includes:
the number of tasks is recorded as n and the number of resources as m, and an m × n expected completion time matrix $E_{m \times n}$ is constructed as follows:
In the above formula, $e_{ij}$ is the expected completion time of the ith task on the jth resource.
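As a minimal sketch of step 101, the matrix can be filled from per-task workloads and per-resource speeds. The workload/speed model and the names `task_lengths`, `resource_speeds` and `build_expected_time_matrix` are assumptions made for illustration; the text only requires that entry $e_{ij}$ hold the expected completion time of task i on resource j. The sketch indexes rows by task and columns by resource, following the definition of $e_{ij}$.

```python
from typing import List


def build_expected_time_matrix(task_lengths: List[float],
                               resource_speeds: List[float]) -> List[List[float]]:
    """Return E where E[i][j] is the expected completion time of task i on resource j.

    Assumed model (not taken from the source): time = workload / speed.
    """
    if any(speed <= 0 for speed in resource_speeds):
        raise ValueError("resource speeds must be positive")
    return [[length / speed for speed in resource_speeds]
            for length in task_lengths]


if __name__ == "__main__":
    # Three tasks, two resources; the second resource is twice as fast.
    for row in build_expected_time_matrix([10.0, 20.0, 30.0], [1.0, 2.0]):
        print(row)
```

In practice the entries could equally come from measurements or historical run times; only the resulting matrix matters to the later steps.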
After the expected completion time matrix is established, the number of tasks processed by each resource must be determined. Therefore, in step 102, a gene expression programming algorithm is used to approximate the optimal value of the objective function, so as to obtain the number of tasks processed by each resource, which specifically comprises the following steps:
a. initializing a population, wherein the population consists of the m resources and the number of tasks to be processed by each resource, the head length of a chromosome in the population is p, the tail length is $d = p(l-1)+1$, and l is the maximum number of operands;
wherein p = 12 in this application;
b. selecting the optimal individual in the population according to the fitness function values of the individuals, retaining the optimal individual, and performing gene crossover, gene mutation and recombination on the individuals of the current sub-population to obtain a new population, wherein the fitness function of an individual is determined according to the following formula:
in the above formula, M is the selection range of the number of tasks for a resource, $C_{(i,j)}$ is the fitness return value of task i on resource j, $T_j$ is the target value of the number of tasks selected by resource j, and $f_i$ is the fitness value of scheduling task i onto resource j;
This application uses a roulette-wheel algorithm to select the individuals of the next generation: individuals are selected on a survival-of-the-fittest basis according to their fitness values, and an individual with a higher fitness value has a higher probability of being selected.
c. if the generation number t satisfies $t > T$, outputting the new population and decoding it to obtain the number of tasks to be processed by each resource when the objective function value is minimum; if t does not satisfy $t > T$, setting $t = t + 1$ and returning to step b, wherein the objective function is as follows:
in the above formula, $h_j$ is the number of tasks to be processed by resource j, $e_{ij}$ is the expected completion time of the ith task on the jth resource, n is the total number of tasks, and m is the total number of resources.
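The selection part of step b (retain the best individual, then draw the rest by roulette wheel) and the tail-length rule of step a can be sketched as follows. This is an illustration only: the function names are hypothetical, fitness values are assumed to be non-negative, and crossover, mutation, recombination and the fitness and objective formulas themselves are deliberately left out.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tail_length(head_length: int, max_operands: int) -> int:
    """Tail length d = p(l - 1) + 1 for head length p and maximum operand count l."""
    return head_length * (max_operands - 1) + 1


def roulette_select(population: Sequence[T], fitness: Sequence[float],
                    rng: random.Random) -> T:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        # Assumption: fall back to a uniform choice when no fitness is positive.
        return rng.choice(list(population))
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


def next_generation(population: Sequence[T], fitness: Sequence[float],
                    rng: random.Random) -> List[T]:
    """Keep the best individual, then fill the remaining slots by roulette selection."""
    best_index = max(range(len(population)), key=lambda k: fitness[k])
    survivors = [population[best_index]]
    while len(survivors) < len(population):
        survivors.append(roulette_select(population, fitness, rng))
    return survivors
```

With the head length p = 12 stated above and, as an assumed example, a maximum of l = 2 operands, `tail_length(12, 2)` gives a tail length of 13.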
After determining the number of tasks to be processed by each resource, the tasks need to be sorted according to the evaluation value of each task, so step 103 includes:
the evaluation value $G_i(t)$ of task i at the current time t is determined according to the following formula:
$G_i(t) = p_1 U_i(t) + p_2 I_i$
In the above formula, $U_i(t)$ is the urgency of task i at the current time t, $I_i$ is the importance of task i, $p_1$ is the urgency weight, $p_2$ is the importance weight, and $p_1 + p_2 = 1$.
Since a task is more urgent the longer its estimated completion time and the less time remains before its deadline, the urgency $U_i(t)$ of task i at the current time t is determined by the following formula:
$U_i(t) = t_{i1} / (t_{i2} + t_{i3} - t)$
In the above formula, $t_{i1}$ is the estimated completion time of task i, $t_{i2}$ is the allowed completion time of the task, and $t_{i3}$ is the allowed timeout of the task.
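A worked instance of the urgency formula, as a small sketch. The function and parameter names are hypothetical, and raising an error once the allowed completion time plus timeout has elapsed is an assumption; the text does not say how that case is handled.

```python
def urgency(estimated_time: float, allowed_time: float,
            allowed_timeout: float, now: float) -> float:
    """U_i(t) = t_i1 / (t_i2 + t_i3 - t)."""
    remaining = allowed_time + allowed_timeout - now
    if remaining <= 0:
        # Assumption: the formula is treated as undefined past the deadline.
        raise ValueError("allowed completion time plus timeout has elapsed")
    return estimated_time / remaining


if __name__ == "__main__":
    # t_i1 = 4, t_i2 = 10, t_i3 = 2
    print(urgency(4.0, 10.0, 2.0, now=4.0))   # 4 / 8 = 0.5
    print(urgency(4.0, 10.0, 2.0, now=10.0))  # 4 / 2 = 2.0
```

The same task becomes four times as urgent between t = 4 and t = 10, because the denominator shrinks from 8 to 2 while the estimated completion time stays fixed.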
The importance of a task comprises two dimensions, relationship and time, and reflects the influence of executing a single task on the whole system. If a task is related to other tasks, that is, if the execution of task $T_j$ depends on $T_i$, task $T_i$ is considered to have higher relationship importance; if a task requires a longer execution time, the task is considered to have higher time importance for the whole system. Therefore, the importance $I_i$ of task i is determined as follows:
$I_i = m_1 H_i + m_2 N_i$
In the above formula, $H_i$ is the relationship importance of task i, $N_i$ is the time importance of task i, $m_1$ is the relationship importance weight, $m_2$ is the time importance weight, and $m_1 + m_2 = 1$;
Wherein the relationship importance $H_i$ of task i is determined according to the following formula:
In the above formula, $M_{ik}$ is the relationship dependency between task i and task k; if $M_{ik} = 0$, task i is independent of task k, and if $M_{ik} = 1$, the execution of task i and the execution of task k are interdependent; n is the total number of tasks;
the time importance $N_i$ of task i is determined as follows:
In the above formula, $t_{i1}$ is the estimated completion time of task i, and $t_{k1}$ is the estimated completion time of task k.
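Step 103 as a whole can be sketched as below: combine $H_i$ and $N_i$ into $I_i$, combine $U_i(t)$ and $I_i$ into $G_i(t)$, and sort in descending order. The `Task` class, the function names and the weight values used in the test data are hypothetical. Because the formulas for $H_i$ and $N_i$ are not reproduced in this text, the sketch takes both as precomputed inputs rather than guessing them.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    urgency: float              # U_i(t), computed beforehand
    relation_importance: float  # H_i, computed beforehand
    time_importance: float      # N_i, computed beforehand


def importance(task: Task, m1: float, m2: float) -> float:
    """I_i = m1 * H_i + m2 * N_i, with m1 + m2 = 1."""
    if not math.isclose(m1 + m2, 1.0):
        raise ValueError("m1 + m2 must equal 1")
    return m1 * task.relation_importance + m2 * task.time_importance


def evaluation(task: Task, p1: float, p2: float, m1: float, m2: float) -> float:
    """G_i(t) = p1 * U_i(t) + p2 * I_i, with p1 + p2 = 1."""
    if not math.isclose(p1 + p2, 1.0):
        raise ValueError("p1 + p2 must equal 1")
    return p1 * task.urgency + p2 * importance(task, m1, m2)


def rank_tasks(tasks: List[Task], p1: float, p2: float,
               m1: float, m2: float) -> List[Task]:
    """Return the tasks sorted by evaluation value, largest first."""
    return sorted(tasks, key=lambda t: evaluation(t, p1, p2, m1, m2), reverse=True)
```

Choosing equal weights is only one option; raising $p_1$ moves deadline-driven tasks forward in the sequence, while raising $p_2$ favours tasks that many others depend on or that run for a long time.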
Finally, in step 104, the tasks in the task sequence are sequentially allocated to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed by each resource, including:
a. deleting the allocated tasks from the task sequence, and deleting from the resource set each resource whose number of allocated tasks has reached the number of tasks that the resource needs to process;
b. selecting the top-ranked task in the task sequence, allocating the task to the resource in the resource set with the minimum expected completion time for the task, and returning to step a until the task sequence is empty.
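The loop formed by steps a and b can be sketched as follows. The sketch follows the two steps literally: tasks are taken in ranked order, each goes to the still-available resource with the smallest expected completion time, and a resource leaves the resource set once it has received its quota $h_j$. The function name, the index-based representation and the tie-breaking by lowest resource index are assumptions.

```python
from typing import Dict, List


def allocate(task_order: List[int], E: List[List[float]],
             quotas: List[int]) -> Dict[int, int]:
    """Map each task index to a resource index.

    task_order: task indices sorted by evaluation value, largest first.
    E[i][j]:    expected completion time of task i on resource j.
    quotas[j]:  number of tasks resource j is to process (h_j).
    """
    if sum(quotas) < len(task_order):
        raise ValueError("quotas do not cover all tasks")
    remaining = list(quotas)
    available = {j for j, q in enumerate(remaining) if q > 0}
    assignment: Dict[int, int] = {}
    for i in task_order:  # step b: take the top-ranked unassigned task
        j = min(available, key=lambda r: (E[i][r], r))
        assignment[i] = j
        remaining[j] -= 1
        if remaining[j] == 0:  # step a: resource has reached its quota
            available.discard(j)
    return assignment


if __name__ == "__main__":
    E = [[4.0, 2.0], [3.0, 1.0], [5.0, 6.0]]
    print(allocate([1, 0, 2], E, quotas=[2, 1]))  # {1: 1, 0: 0, 2: 0}
```

In the example, task 1 is ranked first and takes the faster resource 1, which then leaves the resource set because its quota is 1; tasks 0 and 2 therefore both go to resource 0.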
The present invention also provides a cooperative scheduling apparatus for distributed computing tasks, as shown in fig. 2, the apparatus includes:
the first determining module is used for determining the expected completion time of each task on each resource and establishing an expected completion time matrix;
the second determining module is used for determining the number of tasks to be processed by each resource by using a gene expression programming algorithm;
the evaluation module is used for determining the evaluation value of each task according to the urgency and the importance of each task, and sorting the tasks by evaluation value in descending order to obtain a task sequence;
and the distribution module is used for sequentially allocating the tasks in the task sequence to the resources by using a Min-Min algorithm according to the expected completion time matrix and the number of tasks to be processed by each resource.
The first determining module includes:
recording the number of tasks as n and the number of resources as m, and constructing an m × n expected completion time matrix $E_{m \times n}$ according to the following formula:
In the above formula, $e_{ij}$ is the expected completion time of the ith task on the jth resource.
The second determining module includes:
a. initializing a population, wherein the population consists of the m resources and the number of tasks to be processed by each resource, the head length of a chromosome in the population is p, the tail length is $d = p(l-1)+1$, and l is the maximum number of operands;
b. selecting the optimal individual in the population according to the fitness function values of the individuals, retaining the optimal individual, and performing gene crossover, gene mutation and recombination on the individuals of the current sub-population to obtain a new population, wherein the fitness function of an individual is determined according to the following formula:
in the above formula, M is the selection range of the number of tasks for a resource, $C_{(i,j)}$ is the fitness return value of task i on resource j, $T_j$ is the target value of the number of tasks selected by resource j, and $f_i$ is the fitness value of scheduling task i onto resource j;
c. if the generation number t satisfies $t > T$, outputting the new population and decoding it to obtain the number of tasks to be processed by each resource when the objective function value is minimum; if t does not satisfy $t > T$, setting $t = t + 1$ and returning to step b, wherein the objective function is as follows:
in the above formula, $h_j$ is the number of tasks to be processed by resource j, $e_{ij}$ is the expected completion time of the ith task on the jth resource, n is the total number of tasks, and m is the total number of resources.
The evaluation module comprises:
the evaluation value $G_i(t)$ of task i at the current time t is determined according to the following formula:
$G_i(t) = p_1 U_i(t) + p_2 I_i$
In the above formula, $U_i(t)$ is the urgency of task i at the current time t, $I_i$ is the importance of task i, $p_1$ is the urgency weight, $p_2$ is the importance weight, and $p_1 + p_2 = 1$.
The distribution module includes:
a. deleting the allocated tasks from the task sequence, and deleting from the resource set each resource whose number of allocated tasks has reached the number of tasks that the resource needs to process;
b. selecting the top-ranked task in the task sequence, allocating the task to the resource in the resource set with the minimum expected completion time for the task, and returning to step a until the task sequence is empty.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.