CN102184124A - Task scheduling method and system - Google Patents

Task scheduling method and system

Info

Publication number
CN102184124A
CN102184124A · CN201110121393A · CN102184124B
Authority
CN
China
Prior art keywords
task
computing node
input data
allocated
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110121393
Other languages
Chinese (zh)
Other versions
CN102184124B (en)
Inventor
张霄宏
冯圣中
樊建平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN 201110121393 priority Critical patent/CN102184124B/en
Publication of CN102184124A publication Critical patent/CN102184124A/en
Application granted granted Critical
Publication of CN102184124B publication Critical patent/CN102184124B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a task scheduling method comprising the following steps: a compute node requests task allocation; the system judges whether the requesting compute node stores the input data of a task to be allocated; if so, the task corresponding to that input data is dispatched to the compute node; if not, a scheduling probability is calculated from the input-data distribution and the traversal order of the tasks to be allocated, and tasks are dispatched to the compute node in descending order of scheduling probability. In the disclosed method and system, a task whose input data is stored on the requesting compute node is allocated to that node for execution. When no such task can be allocated, the scheduling probability is calculated from the input-data distribution and traversal order of the tasks to be allocated, and the task with the highest scheduling probability is allocated preferentially. This reduces the number of tasks that incur remote-data-access delay and thereby reduces network load.

Description

Task scheduling method and system
[technical field]
The present invention relates to distributed computing, and in particular to a task scheduling method and system.
[background technology]
With the rapid development of the Internet, the input data required by tasks in traditional data-intensive applications has grown very large, making the cost of transmitting that data considerable. A common remedy is to schedule a task onto the data node that already stores its input data, improving system performance. However, the processing capacity of each node is limited, and different tasks compete for computing resources, so some tasks are inevitably dispatched to nodes that do not hold their input data; these tasks must copy the corresponding input data to the executing node through remote I/O operations.
A large number of input-data copies made by remote I/O operations increases the network load. The traditional optimization is prefetching: prefetching can mask remote-data-access latency by fetching the input data to the executing node before it is needed. But if the executing node has insufficient disk space, the input data is copied to the node nearest the executing node instead, causing a second transmission of the data. Prefetching therefore fails to solve the problem of growing network load.
[summary of the invention]
Accordingly, it is necessary to provide a task scheduling method that reduces network load.
In addition, it is also necessary to provide a task scheduling system that reduces network load.
A task scheduling method comprises the following steps:
a compute node requests task allocation;
judging whether the compute node requesting task allocation stores input data; if so, dispatching the task to be allocated that corresponds to said input data to said compute node; if not, calculating a scheduling probability according to the input-data distribution and the traversal order of the tasks to be allocated;
dispatching the tasks to be allocated to said compute node in descending order of scheduling probability.
Preferably, before the step of calculating the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated, the method further comprises:
predicting, according to task progress, the compute node that will next request task allocation;
obtaining the tasks whose input data is stored on the compute node that will next request task allocation;
reserving those tasks for the compute node that will next request task allocation.
Preferably, the step of predicting, according to task progress, the compute node that will next request task allocation comprises:
setting a reference input data size;
for each task whose input data size is not equal to the reference input data size, calculating the progress of a virtual task from the task's own progress and the reference input data size;
substituting the progress of the virtual task for the task's progress;
arranging all tasks by progress and taking the compute node corresponding to the task with the greatest progress as the compute node that will next request task allocation.
Preferably, after the step of arranging all tasks by progress and taking the compute node corresponding to the task with the greatest progress as the compute node that will next request task allocation, the method further comprises:
starting from the task with the greatest progress, extracting a preset number of tasks in order and taking their corresponding compute nodes as the candidate compute nodes that will next request task allocation.
Preferably, the step of calculating the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated comprises:
calculating the scheduling probability according to the level assigned to each task to be allocated with respect to the requesting compute node, the task traversal order, and whether the task's input data is stored on the compute node that will next request task allocation;
and the step of dispatching the tasks to be allocated to said compute node in descending order of scheduling probability comprises:
within the task levels, which are divided by the distance between the requesting compute node and the storage location of the input data, judging in near-to-far order whether a task to be allocated exists at each level; if so, dispatching the corresponding tasks to the requesting compute node in descending order of scheduling probability; if not, moving to the next task level and repeating the judgment.
A task scheduling system comprises at least:
compute nodes, for processing tasks;
a control device, for judging whether the compute node requesting task allocation stores input data, notifying a dispatching device if so and a processing device if not;
the dispatching device, for dispatching the task to be allocated that corresponds to said input data to said compute node, and for dispatching tasks to be allocated to said compute node in descending order of scheduling probability;
the processing device, for calculating the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated.
Preferably, the system further comprises:
a prediction device, for predicting, according to task progress, the compute node that will next request task allocation;
the control device is further used to obtain the tasks whose input data is stored on the compute node that will next request task allocation;
the dispatching device is further used to reserve those tasks for the compute node that will next request task allocation.
Preferably, the prediction device comprises:
a virtual-task-progress calculation unit, for setting a reference input data size, calculating, for each task whose input data size is not equal to the reference input data size, the progress of a virtual task from the task's own progress and the reference input data size, and substituting the virtual task's progress for the task's progress;
an extraction unit, for arranging all tasks by progress and taking the compute node corresponding to the task with the greatest progress as the compute node that will next request task allocation.
Preferably, the extraction unit is further used to extract, starting from the task with the greatest progress, a preset number of tasks in order and to take their corresponding compute nodes as the candidate compute nodes that will next request task allocation.
Preferably, the processing device calculates the scheduling probability according to the level assigned to each task to be allocated with respect to the requesting compute node, the task traversal order, and whether the task's input data is stored on the compute node that will next request task allocation;
the dispatching device is further used to judge, within the task levels divided by the distance between the requesting compute node and the storage location of the input data, in near-to-far order, whether a task to be allocated exists at each level; if so, it dispatches the corresponding tasks to the requesting compute node in descending order of scheduling probability; if not, it moves to the next task level and repeats the judgment.
In the above task scheduling method and system, tasks whose input data is stored on the requesting compute node are allocated to that node for execution. When no such task can be allocated, the scheduling probability is calculated from the input-data distribution and traversal order of the tasks to be allocated, and the task with the greatest scheduling probability is allocated preferentially, reducing the number of tasks that incur remote-data-access delay and thereby reducing network load.
In addition, the above method and system predict the compute node that will next request task allocation and reserve for it the tasks whose input data it stores, further reducing network load and remote-data-access latency and improving system performance.
[description of drawings]
Fig. 1 is a flowchart of a task scheduling method in one embodiment;
Fig. 2 is a flowchart of the task scheduling method in another embodiment;
Fig. 3 is a flowchart of predicting, according to task progress, the compute node that will next request task allocation in Fig. 1;
Fig. 4 is a structural diagram of a task scheduling system in one embodiment;
Fig. 5 is a structural diagram of the task scheduling system in another embodiment;
Fig. 6 is a structural diagram of the prediction device in Fig. 5.
[embodiment]
Fig. 1 shows the flow of a task scheduling method in one embodiment, comprising the following steps:
Step S101: a compute node requests task allocation. In this embodiment, when a compute node finishes an allocated task, it requests a new task provided that resources such as the CPU (Central Processing Unit) and hard disk meet the requirements. For example, a compute node may initiate a task allocation request after finishing a task, provided the free hard-disk space satisfies the demands of executing the new task.
Step S103: judge whether the compute node requesting task allocation stores input data; if so, proceed to step S105; if not, proceed to step S107. In this embodiment, the input data is the data a compute node needs to execute a task. In actual task scheduling and execution, the input data of each task may be stored on any compute node; when a compute node executes a task, the corresponding input data may be stored locally or on another compute node. If the input data is stored on another compute node, the executing node must obtain it through remote I/O operations.
To reduce network load and the number of tasks performing remote I/O during processing, tasks whose input data is stored on the requesting compute node are scheduled preferentially, so the node needs no input-data transmission when executing them, avoiding the remote-data-access delay caused by remote I/O operations.
Step S105: dispatch the task to be allocated that corresponds to the input data to the compute node. In this embodiment, when the compute node requesting task allocation stores the input data of a task to be allocated, that task is scheduled to the node, which can then execute it without any input-data transmission.
Step S107: calculate the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated. In this embodiment, to schedule each task to the compute node nearest the one storing its input data, a scheduling probability is calculated from the storage locations of the tasks' input data; the higher the scheduling probability, the closer the storage location of the input data is to the compute node that will execute the task.
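As an illustration only (not part of the claimed method), the decision of steps S103 to S107 can be sketched in Python; the names `node_inputs`, `pending_tasks`, and `probability` are hypothetical:

```python
def dispatch(node_inputs, pending_tasks, probability):
    """Sketch of steps S103/S105/S107: prefer a data-local task,
    otherwise fall back to the scheduling probability.

    node_inputs: set of input-data ids stored on the requesting node
    pending_tasks: list of (task_id, input_id) pairs awaiting allocation
    probability: dict mapping task_id to its scheduling probability
    """
    # S103/S105: a task whose input data is already local needs no
    # remote I/O, so dispatch it directly.
    for task_id, input_id in pending_tasks:
        if input_id in node_inputs:
            return task_id
    # S107: otherwise dispatch the task with the highest scheduling
    # probability first.
    return max(pending_tasks, key=lambda t: probability[t[0]])[0]
```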
In one specific embodiment, the step of calculating the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated is: calculating the scheduling probability according to the level assigned to each task with respect to the requesting compute node, the task traversal order, and whether the task's input data is stored on the compute node that will next request task allocation. In this embodiment, to schedule each task to execute as close as possible to the node storing its input data, tasks are assigned levels according to the distance between the storage location of their input data and the requesting compute node. In a preferred embodiment, tasks whose input data is stored on the requesting compute node form the first level; tasks whose input data is stored on other compute nodes connected to the same switch as the requesting node form the second level; and tasks whose input data is stored on compute nodes connected to different switches form the third level. The input data of second- and third-level tasks is not stored on the requesting node, so allocating such tasks to it requires data transmission during execution, increasing network load. If a task can instead be reserved for the node that stores its input data, that node needs no data transmission while executing it. The scheduling probability of each task is therefore calculated: when a compute node requests a task, tasks whose input data is stored on the compute node that will next request task allocation are given a lower probability, and the task with the highest scheduling probability is allocated first, reducing the tasks that cause remote-data-access delay and increased network load. The scheduling probability is calculated by the following formula:
P_R = P_L − P_k + P_T

where P_R is the scheduling probability of the task, P_L is a probability factor determined by the task level, P_k is a probability factor determined by the k-th compute node (the compute node that will next request a task), and P_T is a probability factor determined by the task traversal order.

P_L = 1 − (i − 1) / l

where i is the task level and l is the total number of task levels.

If the task's input data is stored on the k-th compute node, P_k takes a positive value (the published text gives its expression only as a figure); if not, P_k = 0.

P_T = j / (n × l × k)

where j is the task traversal order and n is the total number of tasks (or the total number of subtasks within a task).

Substituting the calculated P_L, P_k and P_T into the formula for the scheduling probability yields the scheduling probability of each task.
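For illustration, the formula can be written out in Python as below. Because the exact expression for P_k is not reproduced in this text, the value used when the input data sits on the k-th node is left as an assumed parameter `p_k_value`; all names here are hypothetical:

```python
def scheduling_probability(i, l, j, n, k, on_node_k, p_k_value=0.5):
    """P_R = P_L - P_k + P_T, per the formulas above.

    i: task level (1 = data local to the requester), l: number of levels
    j: task traversal order, n: total number of tasks (or subtasks)
    k: index of the compute node predicted to request a task next
    on_node_k: True if this task's input data is stored on node k
    p_k_value: assumed magnitude of P_k (exact expression not given)
    """
    p_l = 1 - (i - 1) / l                  # level factor: nearer data, larger P_L
    p_k = p_k_value if on_node_k else 0.0  # lowers P_R to reserve the task
    p_t = j / (n * l * k)                  # traversal-order factor
    return p_l - p_k + p_t
```

With these factors, a first-level task outranks a third-level one, and a task whose data sits on the predicted next requester drops below its peers, matching the behavior described above.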
In another embodiment, as shown in Fig. 2, the above task scheduling method further comprises the following steps before the step of calculating the scheduling probability from the storage locations of the input data of the tasks to be allocated:
Step S201: predict, according to task progress, the compute node that will next request task allocation. In this embodiment, task progress is represented by the ratio of the amount of input data already processed on a compute node to the total amount of input data. For example, in a homogeneous environment where every compute node has the same processing capacity and all tasks have the same input data size, the task with the greatest progress will finish first, and the node processing it will be the first to request a new task; that is, it is the compute node that will next request task allocation.
In one specific embodiment, as shown in Fig. 3, the step of predicting, according to task progress, the compute node that will next request task allocation comprises:
Step S211: set a reference input data size. In this embodiment, every compute node that is executing a task has a task progress associated with it. In practice, however, the input data of each task is divided into parts of a certain length, each part being the input data of one subtask; if a task's input data size is not an integer multiple of the standard partition length, some tasks will have input data sizes different from the reference input data size. Of two tasks with the same progress but different input data sizes, the one with less input data has less data left to process and will finish first, so the node executing it will request a task earlier.
These differences in input data size make it difficult to predict the next requesting compute node accurately from task progress alone. Therefore, the standard partition length of the input data is taken as the reference input data size; for example, the reference input data size may be the file-block size of the distributed file system in the computing environment.
Step S231: for each task whose input data size is not equal to the reference input data size, calculate the progress of a virtual task from the task's own progress and the reference input data size, and substitute the virtual task's progress for the task's progress. In this embodiment, to predict the next requesting compute node from task progress, each such task is mapped to a virtual task whose input data size equals the reference input data size, and the virtual task's progress replaces the task's progress. The following model can be used to calculate the progress of the virtual task:

f(x, y) = 1 − (1 − x) × y / α,  if (1 − x) × y / α < 1;  0, otherwise

where x and y denote the task's progress and input data size respectively, α denotes the reference input data size, and f(x, y) denotes the progress of the virtual task.
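This model (as reconstructed here from the published formula, under the stated reading that the remaining work (1 − x) × y is measured against the reference size) can be checked numerically:

```python
def virtual_progress(x, y, alpha):
    """Progress f(x, y) of the virtual task of reference size alpha.

    x: real task progress in [0, 1]; y: the task's input data size;
    alpha: reference input data size (e.g. the DFS file-block size).
    If more than one reference block of work remains, the virtual
    progress is 0.
    """
    remaining = (1 - x) * y / alpha
    return 1 - remaining if remaining < 1 else 0.0
```

A half-finished task four blocks long thus maps to virtual progress 0, while a half-finished one-block task keeps progress 0.5, so smaller tasks correctly rank as closer to completion.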
Step S251: arrange all tasks by progress and take the compute node corresponding to the task with the greatest progress as the compute node that will next request task allocation. In this embodiment, the tasks' progresses are sorted, and the node executing the task with the greatest progress is taken as the next requesting compute node. For example, the progresses can be stored in a list in descending order, in which case the compute node corresponding to the first element of the list is the predicted next requesting node.
In addition, after the step of arranging tasks by progress and selecting the node of the greatest-progress task, the above task scheduling method may further extract, starting from the task with the greatest progress, a preset number of tasks in order and take their compute nodes as candidates for the next requesting node. Because of uncertainty and dynamics in the system, the prediction may not match reality exactly. Suppose the error between the predicted request order and the actual order is k; then, if the k tasks with the greatest progress are extracted in order, the compute node that will actually request the next task is certain to be among those k compute nodes. The error k can be measured experimentally and adjusted flexibly according to actual needs.
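A sketch of step S251 together with the error window k described above (the function and argument names are assumed):

```python
def predict_next_requesters(progress_by_node, window):
    """Rank compute nodes by the (virtual) progress of their tasks and
    return the `window` most advanced ones.

    With a measured prediction error of k, a window of k nodes is
    certain to contain the node that actually requests the next task.
    """
    ranked = sorted(progress_by_node, key=progress_by_node.get, reverse=True)
    return ranked[:window]
```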
Step S203: obtain the tasks whose input data is stored on the compute node that will next request task allocation.
Step S205: reserve the tasks whose input data is stored on the compute node that will next request task allocation for that compute node.
Step S109: dispatch the tasks to be allocated to the compute node in descending order of scheduling probability. In this embodiment, dispatching in descending order of scheduling probability means: within the task levels, divided by the distance between the requesting compute node and the storage location of the input data, judge in near-to-far order whether a task to be allocated exists at each level; if so, dispatch the corresponding tasks to the requesting node in descending order of scheduling probability; if not, move to the next level and repeat the judgment.
Because the task levels are divided by the distance between a task's input data and the requesting compute node, the level nearest the requesting node is examined first; if it contains no task to be allocated, the remaining levels are examined in order of increasing distance. For example, with three levels, the judgment starts at the first level and proceeds level by level.
If a level contains tasks to be allocated, they are allocated to the requesting compute node in descending order of scheduling probability. If a level contains no task to be allocated, the next level is examined for tasks to be allocated.
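The near-to-far level traversal of step S109 can be sketched as follows (a simplification: the grouping of tasks by level is assumed to be precomputed, and the names are hypothetical):

```python
def order_by_level(tasks_by_level, probability, num_levels):
    """Walk the task levels from nearest (1) to farthest and return the
    tasks of the first non-empty level, highest probability first.

    tasks_by_level: dict mapping level -> list of task ids
    probability: dict mapping task id -> scheduling probability
    """
    for level in range(1, num_levels + 1):  # near-to-far order
        candidates = tasks_by_level.get(level, [])
        if candidates:
            # dispatch within the level in descending probability order
            return sorted(candidates, key=probability.get, reverse=True)
    return []  # no task to allocate at any level
```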
Under the effect of the scheduling probability, the requesting compute node is preferentially allocated high-probability tasks. Tasks whose input data is stored on the compute node that will next request task allocation therefore have a low scheduling probability and are not allocated to the current requesting node; instead they are reserved for the next requesting node, avoiding the remote-data-access delay and network load those tasks would otherwise cause during execution.
Fig. 4 shows the structure of the task scheduling system in one embodiment, comprising at least compute nodes 10, a control device 30, a dispatching device 50, and a processing device 70.
The compute nodes 10 are used to process tasks. In this embodiment, when a compute node 10 finishes an allocated task, it requests a new task provided that resources such as the CPU (Central Processing Unit) and hard disk meet the requirements. For example, a compute node 10 may initiate a task allocation request after finishing a task, provided the free hard-disk space satisfies the demands of executing the new task.
The control device 30 judges whether the compute node 10 requesting task allocation stores input data, notifying the dispatching device 50 if so and the processing device 70 if not. In this embodiment, the input data is the data a compute node 10 needs to execute a task. In actual task scheduling and execution, the input data of each task may be stored on any compute node 10; when a compute node 10 executes a task, the corresponding input data may be stored locally or on another compute node 10. If the control device 30 finds the input data stored on another compute node 10, the executing node must obtain it through remote I/O operations.
The dispatching device 50 dispatches the task to be allocated that corresponds to the input data to the compute node, and dispatches tasks to be allocated to the compute node in descending order of scheduling probability. In this embodiment, to reduce network load and the number of tasks performing remote I/O during processing, the dispatching device 50 preferentially schedules tasks whose input data is stored on the requesting compute node 10, so the node needs no input-data transmission when executing them, avoiding the remote-data-access delay and network load caused by transmitting the tasks' input data.
The processing device 70 calculates the scheduling probability according to the input-data distribution and traversal order of the tasks to be allocated. In this embodiment, to schedule each task to the compute node 10 nearest its input data, the processing device 70 calculates a scheduling probability from the input-data distribution and task traversal order; the higher the scheduling probability, the closer the storage location of the input data is to the compute node 10 currently requesting task allocation.
In one specific embodiment, the processing device 70 calculates the scheduling probability according to the level assigned to each task to be allocated with respect to the requesting compute node, the task traversal order, and whether the task's input data is stored on the compute node that will next request task allocation. In this embodiment, tasks whose input data is stored on the requesting compute node 10 form the first level; tasks whose input data is stored on other compute nodes connected to the same switch as the requesting node form the second level; and tasks whose input data is stored on compute nodes 10 connected to different switches form the third level. Tasks whose input data is stored on the compute node that is about to request task allocation are given a lower probability. When a compute node requests a task, allocating the highest-probability task first allows tasks to be reserved for execution on the compute nodes that store their input data. The scheduling probability is calculated by the following formula:
P R=P L-P k+P T
Wherein, P RBe the scheduling probability of task, P LBe the probability factor by the decision of task grade, P kThe probability factor of k computing node decision of the next request task of serving as reasons, P TBe probability factor by task traversal order.
P L = 1 - i - 1 l
Wherein, i is the task grade, and l is the task total number of grades.
If k is about to ask the computing node of task to store the input data of task, then
Figure BDA0000060584890000102
If not, P then k=0.
P T = j n &times; l &times; k
Wherein, j is a task traversal order, and n is the task sum that comprises in the operation or the subtask sum in the task.
Substituting the computed P_L, P_k and P_T into the scheduling probability formula yields the scheduling probability of each task.
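As an illustration, the probability formulas above can be sketched in Python. This is a hypothetical helper, not part of the patent; since the expression for P_k is only given as a figure in the source, it is passed in as a parameter:

```python
def scheduling_probability(i, l, j, n, k, p_k):
    """Scheduling probability P_R = P_L - P_k + P_T.

    i: task grade (1 = nearest grade), l: total number of grades,
    j: task traversal order, n: total number of tasks in the job,
    k: index of the compute node predicted to request a task next,
    p_k: probability factor for the k-th next-requesting node
         (0 when that node does not store the task's input data;
         otherwise given by the figure in the original text).
    """
    p_l = 1 - (i - 1) / l   # grade factor: nearer grades score higher
    p_t = j / (n * l * k)   # traversal-order factor
    return p_l - p_k + p_t  # subtracting p_k reserves data-local tasks
```

A task in the first grade with p_k = 0 thus outscores a task in a farther grade with the same traversal order, while a positive p_k lowers the score of tasks reserved for the next requester.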
In another embodiment, as shown in Figure 5, the above task scheduling system further comprises a prediction device 90, which is configured to predict, according to task progress, the compute node 10 that will request task allocation next. In this embodiment, the progress of the task on a compute node is represented by the ratio of the amount of input data already processed to the total amount of input data. For example, in a homogeneous environment the processing power of every compute node is identical; provided all tasks have the same input data scale, the task with the fastest progress will finish earliest, and the compute node processing that task will be the earliest to request a new task, i.e. it is the compute node that will request task allocation next.
The control device 30 is further configured to obtain the tasks whose input data is stored on the compute node that will request task allocation next.
The dispatching device 50 is further configured to reserve the tasks whose input data is stored on the compute node that will request task allocation next for that compute node 10.
In a specific embodiment, as shown in Figure 6, the prediction device 90 comprises an imaginary task progress computing unit 901 and an extraction unit 903.
The imaginary task progress computing unit 901 is configured to set a reference input data scale and, for each task whose input data scale is not equal to the reference input data scale, to calculate the progress of an imaginary task from the task's actual progress and the reference input data scale, replacing the task progress with the progress of the imaginary task. In this embodiment, every compute node that is executing a task has a task progress associated with it. In practice, however, the input data scales of the tasks are not all equal to the reference input data scale. Of two tasks with identical progress but different input data scales, the task with the smaller input data scale has less input data left to process and will finish first, so the compute node executing it will request task allocation earlier.
Because the input data scales of different tasks differ, it is difficult to accurately predict the compute node that will request task allocation next from task progress alone. For this reason, the criterion used to partition the input data is taken as the reference input data scale; for example, the reference input data scale can be the file block size of the distributed file system.
For each task whose input data scale is not equal to the reference input data scale, in order to predict the next compute node to request task allocation from task progress, the imaginary task progress computing unit 901 maps each such task onto an imaginary task whose input data scale equals the reference input data scale, and replaces the task progress with the progress of the imaginary task. The progress of the imaginary task can be calculated with the following model:
f(x, y) = 1 + ((x - 1) × y)/α,  if ((1 - x) × y)/α < 1;  f(x, y) = 0, otherwise
where x and y denote the task progress and the input data scale respectively, α denotes the reference input data scale, and f(x, y) denotes the imaginary progress.
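The model can be sketched as follows (a hypothetical helper for illustration; the function name is not from the patent):

```python
def imaginary_progress(x, y, alpha):
    """Progress of the imaginary task of reference size alpha.

    x: actual task progress (fraction processed), y: input data scale,
    alpha: reference input data scale. The remaining data (1 - x) * y
    is measured against alpha; a task with more than alpha left maps
    to progress 0.
    """
    if (1 - x) * y / alpha < 1:
        return 1 + (x - 1) * y / alpha
    return 0.0
```

For example, a task half done (x = 0.5) over 64 units of data with a reference scale of 128 has 32 units left and maps to an imaginary progress of 0.75, ranking it ahead of a half-done 256-unit task, which still maps to 0.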
The extraction unit 903 is configured to arrange all tasks by task progress and to take the compute node corresponding to the task with the greatest progress as the compute node 10 that will request task allocation next. In this embodiment, the extraction unit 903 sorts the tasks by their progress, obtains the task with the greatest progress, and takes the compute node 10 of that task as the compute node 10 that will request task allocation next. For example, the task progresses can be stored in a list in descending order; the compute node 10 corresponding to the task with the greatest progress in the list is then the predicted compute node 10 that will request task allocation next.
In addition, the extraction unit 903 is further configured to extract, from the arranged tasks, a preset number of tasks starting from the task with the greatest progress, and to take their corresponding compute nodes as candidate compute nodes for the next task allocation request. In this embodiment, because of uncertainty and dynamics in the system, the predicted situation may not exactly match reality. Assuming the error between the predicted and the actual request order of the compute nodes 10 is k, the compute node 10 that actually requests task allocation next is necessarily among the compute nodes 10 of the first k tasks extracted from the arranged tasks, starting with the task of greatest progress. The error k can be measured experimentally and adjusted flexibly according to actual needs.
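The extraction step can be sketched as follows (hypothetical names; the bound k is the measured prediction error described above):

```python
def candidate_next_requesters(progress_by_node, k):
    """Rank compute nodes by (imaginary) task progress, descending.

    With a prediction error of k, the node that actually requests
    the next task is among the first k nodes of the ranking.
    """
    ranked = sorted(progress_by_node, key=progress_by_node.get, reverse=True)
    return ranked[:k]
```

With k = 1 this degenerates to the single-node prediction of the previous paragraph; larger k trades prediction precision for robustness.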
The dispatching device 50 is further configured to judge, in order from near to far, whether each of the task grades (divided according to the distance between the storage location of the input data and the compute node 10 requesting task allocation) contains tasks to be allocated belonging to that grade. If so, the corresponding tasks to be allocated are scheduled to the requesting compute node 10 in descending order of scheduling probability; if not, the next task grade is selected and the judgement of whether it contains tasks to be allocated is repeated.
The dispatching device 50 judges whether each task grade contains tasks. Because the task grades are divided according to the distance between the storage location of the task input data and the compute node 10 requesting task allocation, the grade nearest to the requesting compute node 10 is examined first; if it contains no tasks of that grade, the remaining grades are examined in order from near to far. For example, with three grades, the judgement of whether a grade contains tasks to be allocated begins with the first grade and proceeds grade by grade in the same way.
If a task grade contains tasks to be allocated, the dispatching device 50 allocates them to the requesting compute node in descending order of scheduling probability. If a task grade contains no tasks to be allocated, the dispatching device 50 moves on to the next task grade and judges whether it contains tasks to be allocated.
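The grade-by-grade selection can be sketched as follows (a hypothetical structure: each grade maps to a list of (scheduling probability, task id) pairs):

```python
def dispatch_task(tasks_by_grade, num_grades):
    """Walk the grades from nearest (1) to farthest; within the first
    non-empty grade, dispatch the task with the highest scheduling
    probability and remove it from the pending set."""
    for grade in range(1, num_grades + 1):
        pending = tasks_by_grade.get(grade, [])
        if pending:
            best = max(pending)  # highest probability first
            pending.remove(best)
            return best[1]       # the task id
    return None                  # nothing left to allocate
```

Repeated calls drain each grade in probability order before falling through to the next, farther grade.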
Under the effect of the scheduling probability, the requesting compute node is preferentially allocated tasks with a high scheduling probability. A task whose input data is stored on the compute node that will request task allocation next therefore has a low scheduling probability, is not allocated to the currently requesting compute node, and remains reserved for the next requesting compute node, thereby reducing data transfer and network load.
In the above task scheduling method and system, tasks whose input data is stored on the compute node requesting task allocation are allocated to that compute node for execution. When no such task can be allocated, the scheduling probability is calculated from the input data distribution of the tasks to be allocated and the task traversal order, and the task with the greatest scheduling probability is preferentially allocated to the requesting compute node, reducing the number of tasks that incur remote data access latency and thereby reducing network load.
In the above task scheduling method and system, the compute node that will request task allocation next is predicted, and tasks whose input data is stored on that compute node are reserved for it, which reduces network load, reduces remote data access latency and improves system performance.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, all of which fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A task scheduling method, comprising the following steps:
a compute node requests task allocation;
judging whether the compute node requesting task allocation stores input data; if so, scheduling the task to be allocated that corresponds to said input data to said compute node; if not, calculating a scheduling probability according to the input data distribution of the tasks to be allocated and a task traversal order;
scheduling the tasks to be allocated to said compute node in descending order of said scheduling probability.
2. The task scheduling method according to claim 1, characterized in that, before the step of calculating the scheduling probability according to the input data distribution of the tasks to be allocated and the task traversal order, the method further comprises:
predicting, according to task progress, the compute node that will request task allocation next;
obtaining the tasks whose input data is stored on the compute node that will request task allocation next;
reserving the tasks whose input data is stored on the compute node that will request task allocation next for said compute node.
3. The task scheduling method according to claim 2, characterized in that the step of predicting, according to task progress, the compute node that will request task allocation next comprises:
setting a reference input data scale;
for each task whose input data scale is not equal to the reference input data scale, calculating the progress of an imaginary task according to the corresponding task progress and the reference input data scale, and replacing said task progress with the progress of the imaginary task;
arranging all tasks by task progress, and taking the compute node corresponding to the task with the greatest task progress as the compute node that will request task allocation next.
4. The task scheduling method according to claim 3, characterized in that, after the step of arranging all tasks by task progress and taking the compute node corresponding to the task with the greatest task progress as the compute node that will request task allocation next, the method further comprises:
extracting, from the arranged tasks, a preset number of tasks starting from the task with the greatest task progress, and taking their corresponding compute nodes as candidate compute nodes for the next task allocation request.
5. The task scheduling method according to claim 1, characterized in that the step of calculating the scheduling probability according to the input data distribution of the tasks to be allocated and the task traversal order comprises:
calculating the scheduling probability according to the grade set for each task to be allocated relative to the compute node requesting task allocation, the task traversal order, and whether the input data of said task to be allocated is stored on the compute node that will request task allocation next;
and the step of scheduling the tasks to be allocated to said compute node in descending order of said scheduling probability comprises:
judging, in order from near to far, whether each of the task grades (divided according to the distance between the storage location of the input data and the compute node requesting task allocation) contains tasks to be allocated belonging to said task grade; if so, scheduling the corresponding tasks to be allocated to the requesting compute node in descending order of scheduling probability; if not, selecting the next task grade and repeating the judgement of whether said next task grade contains tasks to be allocated.
6. A task scheduling system, characterized by comprising at least:
a compute node, configured to process tasks;
a control device, configured to judge whether the compute node requesting task allocation stores input data, and if so, to notify a dispatching device, and if not, to notify a processing device;
said dispatching device, configured to schedule the task to be allocated that corresponds to said input data to said compute node, and to schedule the tasks to be allocated to said compute node in descending order of said scheduling probability;
said processing device, configured to calculate the scheduling probability according to the input data distribution of the tasks to be allocated and the task traversal order.
7. The task scheduling system according to claim 6, characterized by further comprising:
a prediction device, configured to predict, according to task progress, the compute node that will request task allocation next;
said control device being further configured to obtain the tasks whose input data is stored on the compute node that will request task allocation next;
said dispatching device being further configured to reserve the tasks whose input data is stored on the compute node that will request task allocation next for said compute node.
8. The task scheduling system according to claim 7, characterized in that said prediction device comprises:
an imaginary task progress computing unit, configured to set a reference input data scale and, for each task whose input data scale is not equal to the reference input data scale, to calculate the progress of an imaginary task according to the corresponding task progress and the reference input data scale, replacing said task progress with the progress of the imaginary task;
an extraction unit, configured to arrange all tasks by task progress and to take the compute node corresponding to the task with the greatest task progress as the compute node that will request task allocation next.
9. The task scheduling system according to claim 8, characterized in that said extraction unit is further configured to extract, from the arranged tasks, a preset number of tasks starting from the task with the greatest task progress, and to take their corresponding compute nodes as candidate compute nodes for the next task allocation request.
10. The task scheduling system according to claim 6, characterized in that said processing device is configured to calculate the scheduling probability according to the grade set for each task to be allocated relative to the compute node requesting task allocation, the task traversal order, and whether the input data of said task to be allocated is stored on the compute node that will request task allocation next;
and said dispatching device is further configured to judge, in order from near to far, whether each of the task grades (divided according to the distance between the storage location of the input data and the compute node requesting task allocation) contains tasks to be allocated belonging to said task grade; if so, to schedule the corresponding tasks to be allocated to the requesting compute node in descending order of scheduling probability; if not, to select the next task grade and repeat the judgement of whether said next task grade contains tasks to be allocated.
CN 201110121393 2011-05-11 2011-05-11 Task scheduling method and system Active CN102184124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110121393 CN102184124B (en) 2011-05-11 2011-05-11 Task scheduling method and system


Publications (2)

Publication Number Publication Date
CN102184124A true CN102184124A (en) 2011-09-14
CN102184124B CN102184124B (en) 2013-06-05

Family

ID=44570304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110121393 Active CN102184124B (en) 2011-05-11 2011-05-11 Task scheduling method and system

Country Status (1)

Country Link
CN (1) CN102184124B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199738A (en) * 2014-08-11 2014-12-10 苏州阔地网络科技有限公司 Multi-data processing equipment cooperative work method and system
CN104754007A (en) * 2013-12-26 2015-07-01 伊姆西公司 Method and device for managing network attached storage
CN107645541A (en) * 2017-08-24 2018-01-30 阿里巴巴集团控股有限公司 Date storage method, device and server
CN111679904A (en) * 2020-03-27 2020-09-18 北京世纪互联宽带数据中心有限公司 Task scheduling method and device based on edge computing network
CN113176937A (en) * 2021-05-21 2021-07-27 北京字节跳动网络技术有限公司 Task processing method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393455B1 (en) * 1997-03-28 2002-05-21 International Business Machines Corp. Workload management method to enhance shared resource access in a multisystem environment
CN101770402A (en) * 2008-12-29 2010-07-07 中国移动通信集团公司 Map task scheduling method, equipment and system in MapReduce system
CN102004670A (en) * 2009-12-17 2011-04-06 华中科技大学 Self-adaptive job scheduling method based on MapReduce


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104754007A (en) * 2013-12-26 2015-07-01 伊姆西公司 Method and device for managing network attached storage
CN104199738A (en) * 2014-08-11 2014-12-10 苏州阔地网络科技有限公司 Multi-data processing equipment cooperative work method and system
CN104199738B (en) * 2014-08-11 2018-05-25 阔地教育科技有限公司 A kind of more data processing equipment collaboration working methods and system
CN107645541A (en) * 2017-08-24 2018-01-30 阿里巴巴集团控股有限公司 Date storage method, device and server
CN107645541B (en) * 2017-08-24 2021-03-02 创新先进技术有限公司 Data storage method and device and server
CN111679904A (en) * 2020-03-27 2020-09-18 北京世纪互联宽带数据中心有限公司 Task scheduling method and device based on edge computing network
CN111679904B (en) * 2020-03-27 2023-10-31 北京世纪互联宽带数据中心有限公司 Task scheduling method and device based on edge computing network
CN113176937A (en) * 2021-05-21 2021-07-27 北京字节跳动网络技术有限公司 Task processing method and device and electronic equipment
CN113176937B (en) * 2021-05-21 2023-09-12 抖音视界有限公司 Task processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN102184124B (en) 2013-06-05

Similar Documents

Publication Publication Date Title
CN108351805B (en) Flow-based accelerator processing of computational graphs
US10474504B2 (en) Distributed node intra-group task scheduling method and system
CN106020933B (en) Cloud computing dynamic resource scheduling system and method based on ultralight amount virtual machine
CN111400022A (en) Resource scheduling method and device and electronic equipment
CN105487930A (en) Task optimization scheduling method based on Hadoop
JP2009277041A (en) Priority control program, priority control device and priority control method
CN114138486A (en) Containerized micro-service arranging method, system and medium for cloud edge heterogeneous environment
KR101471749B1 (en) Virtual machine allcoation of cloud service for fuzzy logic driven virtual machine resource evaluation apparatus and method
JP2010122758A (en) Job managing device, job managing method and job managing program
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
CN107370799B (en) A kind of online computation migration method of multi-user mixing high energy efficiency in mobile cloud environment
CN102184124B (en) Task scheduling method and system
CN114371926B (en) Refined resource allocation method and device, electronic equipment and medium
CN112181664B (en) Load balancing method and device, computer readable storage medium and electronic equipment
CN105740059B (en) A kind of population dispatching method towards Divisible task
CN116820784B (en) GPU real-time scheduling method and system for reasoning task QoS
Delavar et al. A synthetic heuristic algorithm for independent task scheduling in cloud systems
CN104917839A (en) Load balancing method used in cloud computing environment
US9417924B2 (en) Scheduling in job execution
Zikos et al. A clairvoyant site allocation policy based on service demands of jobs in a computational grid
Wang et al. On mapreduce scheduling in hadoop yarn on heterogeneous clusters
Sirohi et al. Improvised round robin (CPU) scheduling algorithm
CN114138453A (en) Resource optimization allocation method and system suitable for edge computing environment
CN115061794A (en) Method, device, terminal and medium for scheduling task and training neural network model
CN110764886A (en) Batch job cooperative scheduling method and system supporting multi-partition processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant