CN102243598B - Task scheduling method and system in a distributed data warehouse

Info

Publication number: CN102243598B
Authority: CN (China)
Prior art keywords: group, real, task, subtask, time
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201010188509.5A
Other languages: Chinese (zh)
Other versions: CN102243598A (en)
Inventors: 李均, 郭玮, 洪坤乾, 赵伟
Current assignee: Shenzhen Tencent Computer Systems Co Ltd (the listed assignees may be inaccurate)
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Application filed by: Shenzhen Tencent Computer Systems Co Ltd
Priority: CN201010188509.5A
Publications: CN102243598A, CN102243598B (granted)

Landscapes

  • Multi Processors (AREA)

Abstract

The invention provides a task scheduling method and system for a distributed data warehouse. The method comprises the following steps: A. dividing tasks into multiple task groups by type and setting the resource ratio to be allocated to each task group; B. allocating resources to the task groups according to the resource ratios. The system comprises: a grouping module, which divides tasks into multiple task groups by type and sets the resource ratio to be allocated to each task group; and a resource allocation module, which allocates resources to the task groups according to the resource ratios. With the method and system provided by the invention, resources can be allocated reasonably, so that the computing needs of both small real-time tasks and large non-real-time tasks are met.

Description

Task scheduling method and system in a distributed data warehouse
[Technical field]
The present invention relates to the field of data processing, and in particular to a task scheduling method and system in a distributed data warehouse.
[Background]
A data warehouse is a structured data environment that serves as the data source for decision support systems and online analytical applications; it addresses the problem of obtaining information from databases. A distributed data warehouse is a data warehouse solution that provides mass storage and computing services using technology based on GFS (Google File System, a scalable distributed file system) and MapReduce (a programming model for parallel computation over large data sets).
A distributed data warehouse implemented with the MapReduce programming model usually uses a FIFO (First In, First Out) scheduling strategy for multi-task scheduling: after a user submits a task (job), the task's position in the FIFO queue is determined by its submission time and priority, and the task at the head of the queue is given all of the system's computing resources first.
Fig. 1 shows a task scheduling sequence diagram for the FIFO queue in a traditional distributed data warehouse, depicting how three tasks in the queue are scheduled. Suppose the system has 2 M (Map) and 2 R (Reduce) computing resources in total. At first, task 1 occupies all computing resources: the 2 M resources and the 2 R resources are scheduled at the same time, and single-hatched fill indicates that a task is running. When the 2 M resources of task 1 finish, the fill changes to cross-hatching, and another 2 M resources are scheduled. At the end, task 1 has only one M resource left to schedule, and the spare resource is allocated to task 2.
However, this traditional scheduling approach allocates resources unfairly. For example, when large and small tasks run together, some small tasks may be interleaved among large tasks in the queue and starve for a long time. The approach therefore cannot satisfy the typical distributed data warehouse scenario, which must serve users' large non-real-time tasks while also responding quickly to small real-time tasks.
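The FIFO behaviour described above can be illustrated with a minimal Python sketch. The `Job` class, the task names and the subtask counts are hypothetical, and ordering by priority first and submission time second is an assumption; the text only says that both determine the queue position.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Assumption: a lower number means a higher priority, and ties are
    # broken by submission time (earlier first).
    priority: int
    submit_time: int
    name: str = field(compare=False)
    subtasks: int = field(compare=False)


def fifo_order(jobs):
    """Return job names in the order the FIFO queue would serve them."""
    heap = list(jobs)
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


jobs = [
    Job(priority=1, submit_time=0, name="large_report", subtasks=5000),
    Job(priority=1, submit_time=1, name="small_query", subtasks=4),
    Job(priority=1, submit_time=2, name="large_export", subtasks=3000),
]
order = fifo_order(jobs)
```

In this sketch the four-subtask job is served only after the 5000-subtask job ahead of it, which is the starvation pattern the paragraph above describes.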
[Summary of the invention]
In view of this, it is necessary to provide a task scheduling method in a distributed data warehouse that allocates resources reasonably.
A task scheduling method in a distributed data warehouse comprises the following steps: A. dividing tasks into multiple task groups by type and setting the resource ratio to be allocated to each task group; B. allocating resources to the task groups according to the resource ratios.
Step A is: dividing tasks by type into a critical task group, a real-time task group and a non-real-time task group.
The method may further comprise collecting the following statistics in real time: for each task in the critical task group, the real-time task group and the non-real-time task group, the number of its subtasks that are running and the number that still need to run; and, for each of the three groups, the total number of running subtasks and the total number of subtasks that still need to run.
Step B may be: B1. judging whether the critical task group has subtasks that need to run; if so, performing step B2, otherwise performing step B3; B2. allocating the resource to a subtask in the critical task group according to that group's scheduling strategy; B3. allocating resources to the real-time task group and the non-real-time task group.
Step B3 may specifically be: B31. obtaining the resource quota of the real-time task group according to the resource ratio, and judging whether there are resources to allocate to the real-time task group; if so, performing step B32, otherwise performing step B34; B32. judging whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, performing step B33, otherwise performing step B34; B33. allocating the resource to a subtask in the real-time task group according to that group's scheduling strategy; B34. obtaining the resource quota of the non-real-time task group according to the resource ratio, and judging whether there are resources to allocate to the non-real-time task group; if so, performing step B35, otherwise performing step B37; B35. judging whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is smaller than the group's resource quota; if so, performing step B36, otherwise ending; B36. allocating the resource to a subtask in the non-real-time task group according to that group's scheduling strategy.
The method may further comprise: when the non-real-time task group has no subtasks that need to run, or the total number of running subtasks in the non-real-time task group is greater than the group's resource quota, obtaining the resource quota of the real-time task group according to the resource ratio and judging whether there are resources to allocate to the real-time task group; if so, further judging whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, allocating the resource to a subtask in the real-time task group according to that group's scheduling strategy, otherwise ending.
In addition, it is necessary to provide a task scheduling system in a distributed data warehouse that allocates resources reasonably.
A task scheduling system in a distributed data warehouse comprises: a grouping module, which divides tasks into multiple task groups by type and sets the resource ratio to be allocated to each task group; and a resource allocation module, which allocates resources to the task groups according to the resource ratios.
The grouping module may divide tasks by type into a critical task group, a real-time task group and a non-real-time task group.
The system may further comprise a counter that collects the following statistics in real time: for each task in the critical task group, the real-time task group and the non-real-time task group, the number of its subtasks that are running and the number that still need to run; and, for each of the three groups, the total number of running subtasks and the total number of subtasks that still need to run.
The resource allocation module may also be used to judge whether the critical task group has subtasks that need to run; if so, it allocates the resource to a subtask in the critical task group according to that group's scheduling strategy, otherwise it allocates resources to the real-time task group and the non-real-time task group.
The resource allocation module may also be used to obtain the resource quota of the real-time task group according to the resource ratio and judge whether there are resources to allocate to the real-time task group. If so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, it allocates the resource to a subtask in the real-time task group according to that group's scheduling strategy. Otherwise, it obtains the resource quota of the non-real-time task group according to the resource ratio and judges whether there are resources to allocate to the non-real-time task group. If so, it further judges whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is smaller than the group's resource quota; if so, it allocates the resource to a subtask in the non-real-time task group according to that group's scheduling strategy, otherwise it ends.
The resource allocation module may also be used, when the non-real-time task group has no subtasks that need to run or the total number of running subtasks in the non-real-time task group is greater than the group's resource quota, to obtain the resource quota of the real-time task group according to the resource ratio and judge whether there are resources to allocate to the real-time task group; if so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, it allocates the resource to a subtask in the real-time task group according to that group's scheduling strategy, otherwise it ends.
The above task scheduling method and system divide tasks into multiple task groups, set the resource ratio to be allocated to each task group, and allocate resources to the task groups according to those ratios, so that resources are allocated reasonably.
In addition, because tasks are divided into a critical task group, a real-time task group and a non-real-time task group, resources can be allocated to the critical task group first. Because a resource ratio is set for each of the three groups, the resource quota of each group can be obtained; when the critical task group does not need resources, resources are allocated according to the quotas of the real-time and non-real-time task groups. This meets the resource demand of non-real-time tasks as well as that of real-time tasks, lets real-time tasks be answered quickly, and uses resources reasonably and fully. Furthermore, each task group can use a scheduling strategy suited to it, which makes the allocation more reasonable still.
[Brief description of the drawings]
Fig. 1 is a task scheduling sequence diagram for the FIFO queue in a traditional distributed data warehouse;
Fig. 2 is a flowchart of the task scheduling method in a distributed data warehouse according to the invention;
Fig. 3 is a flowchart of the method for allocating resources to the task groups in one embodiment;
Fig. 4 is a flowchart of the method for allocating resources to the real-time task group and the non-real-time task group in one embodiment;
Fig. 5 is a structural diagram of the task scheduling system in a distributed data warehouse in one embodiment;
Fig. 6 is a structural diagram of the task scheduling system in a distributed data warehouse in another embodiment.
[Detailed description]
As shown in Fig. 2, a task scheduling method in a distributed data warehouse comprises the following steps:
Step S10: divide tasks into multiple task groups by type, and set the resource ratio to be allocated to each task group. In one embodiment, tasks are divided by type into groups such as a critical task group, a real-time task group and a non-real-time task group. The critical task group contains important tasks whose output is due at fixed times, such as a department's daily and monthly reports; the real-time task group contains small tasks that need timely processing; the non-real-time task group contains large tasks that do not need timely processing. After grouping, a priority can be set for each group: for example, tasks in the critical task group are processed first, then tasks in the real-time task group, then tasks in the non-real-time task group.
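The grouping in step S10 might be sketched as follows. The group keys, the `group_tasks` helper and the task names are hypothetical; the text does not say how a task's type is recorded, so a plain (name, type) pair is assumed.

```python
# Priority order described in the text: critical, then real-time,
# then non-real-time. The key strings are assumptions.
GROUP_PRIORITY = ["critical", "real_time", "non_real_time"]


def group_tasks(tasks):
    """Split (name, task_type) pairs into task groups keyed by type."""
    groups = {group: [] for group in GROUP_PRIORITY}
    for name, task_type in tasks:
        if task_type not in groups:
            raise ValueError(f"unknown task type: {task_type}")
        groups[task_type].append(name)
    return groups


groups = group_tasks([
    ("daily_report", "critical"),
    ("ad_hoc_query", "real_time"),
    ("full_recompute", "non_real_time"),
    ("monthly_report", "critical"),
])
```

Iterating over `GROUP_PRIORITY` then visits the groups in the processing order given above.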
In this embodiment, a resource ratio is set for each of the critical task group, the real-time task group and the non-real-time task group, and the resource quota of each group can be computed at any time from these ratios. The resource quota is the number of resources that should be allocated to a task group; when the number of running subtasks in a group reaches the group's quota, the group's demand is considered met. In one embodiment, the ratio for the critical task group is 100% of all resources, the ratio for the real-time task group is 20% of the resources remaining after the critical task group's computing demand is met, and the ratio for the non-real-time task group is 80% of those remaining resources. In this embodiment the system has 400 computing resources, so the resources that can be allocated to the critical task group number 400 (its quota is 400). If 200 resources remain after the critical task group's computing demand is met, 40 are allocated to the real-time task group (its quota is 40) and 160 to the non-real-time task group (its quota is 160).
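The quota arithmetic can be written out as a short sketch. The function name and parameters are hypothetical, and rounding down to whole resources is an assumption; the text only gives the percentages and the resulting numbers.

```python
def group_quotas(total, critical_demand, rt_percent=20, nrt_percent=80):
    """Return (real_time_quota, non_real_time_quota).

    The surplus is what is left of `total` once the critical group's
    demand (its running plus pending subtasks) is covered.
    """
    surplus = max(total - critical_demand, 0)
    # Integer arithmetic; fractional resources are rounded down (assumption).
    return surplus * rt_percent // 100, surplus * nrt_percent // 100


quotas = group_quotas(total=400, critical_demand=200)
print(quotas)  # (40, 160)
```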
In one embodiment, the method further comprises collecting the following statistics in real time: for each task in the critical task group, the real-time task group and the non-real-time task group, the number of its subtasks that are running and the number that still need to run; and, for each of the three groups, the total number of running subtasks and the total number of subtasks that still need to run.
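One way to hold these statistics is a per-group counter that tracks each task and derives the group totals. This is only a sketch; the class and method names are hypothetical and the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class TaskCounter:
    running: int = 0  # subtasks of this task currently running
    pending: int = 0  # subtasks of this task that still need to run


@dataclass
class GroupCounter:
    tasks: dict = field(default_factory=dict)  # task name -> TaskCounter

    def add_task(self, name, subtasks):
        self.tasks[name] = TaskCounter(pending=subtasks)

    def start_subtask(self, name):
        task = self.tasks[name]
        if task.pending == 0:
            raise ValueError(f"{name} has no pending subtasks")
        task.pending -= 1
        task.running += 1

    def finish_subtask(self, name):
        task = self.tasks[name]
        if task.running == 0:
            raise ValueError(f"{name} has no running subtasks")
        task.running -= 1

    @property
    def running_total(self):
        return sum(task.running for task in self.tasks.values())

    @property
    def pending_total(self):
        return sum(task.pending for task in self.tasks.values())


real_time = GroupCounter()
real_time.add_task("q1", 3)
real_time.start_subtask("q1")
real_time.start_subtask("q1")
real_time.finish_subtask("q1")
```

One such counter per group gives the six group totals and the per-task counts listed above.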
Step S20: allocate resources to the task groups according to the resource ratios. Following the embodiment above, tasks are divided into a critical task group, a real-time task group and a non-real-time task group; because the critical task group has the highest priority, its resource demand must be met first. In one embodiment, completed tasks are removed from all task groups before resources are allocated.
In one embodiment, as shown in Fig. 3, the detailed process of step S20 is:
Step S210: judge whether the critical task group has subtasks that need to run; if so, perform step S220, otherwise perform step S230. Because tasks in the critical task group are processed first, the real-time statistics (the number of subtasks each task in the group still needs to run, and the group's total number of subtasks that need to run) are used to judge whether the group has subtasks that need to run. If the number of subtasks that need to run is zero for the tasks in the group, or the group total is zero, no resource needs to be allocated to the critical task group and the flow enters step S230; if it is non-zero, the flow enters step S220.
Step S220: allocate the resource to a subtask in the critical task group according to that group's scheduling strategy. The critical task group usually runs large tasks and can use a FIFO strategy: tasks are ordered in a FIFO queue by submission time and priority, and the resource goes to a subtask of the task that ranks first and has high priority.
Step S230: allocate resources to the real-time task group and the non-real-time task group. When the critical task group has no subtasks that need to run, the resource is allocated to a subtask in the real-time task group or the non-real-time task group.
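Steps S210 to S230 can be sketched for a single free resource as below. The function name, the queue layout and the `allocate_to_other_groups` callback are assumptions made for illustration; the critical queue is taken to be already sorted in FIFO order.

```python
from collections import deque


def allocate_resource(critical_queue, allocate_to_other_groups):
    """Handle one free resource following steps S210 to S230.

    critical_queue: deque of [task_name, pending_subtasks] in FIFO order.
    allocate_to_other_groups: callable standing in for step S230.
    """
    # S210: does the critical group still have subtasks that need to run?
    while critical_queue and critical_queue[0][1] == 0:
        critical_queue.popleft()  # nothing left to schedule for this task
    if critical_queue:
        # S220: FIFO strategy, serve the task at the head of the queue.
        head = critical_queue[0]
        head[1] -= 1
        return ("critical", head[0])
    # S230: no critical demand, so the other groups get the resource.
    return allocate_to_other_groups()


queue = deque([["daily_report", 1], ["monthly_report", 2]])
results = [allocate_resource(queue, lambda: ("other", None)) for _ in range(4)]
```

The first three resources go to the critical tasks in queue order; only the fourth reaches the step S230 callback.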
As shown in Fig. 4, in one embodiment the detailed process of step S230 is:
Step S2301: obtain the resource quota of the real-time task group according to the resource ratio set for it. Because this ratio is set in advance, the quota of the real-time task group is refreshed when the critical task group has no subtasks that need to run: it is computed as the product of the number of resources remaining after the critical task group's demand is met and the corresponding ratio.
Step S2302: judge whether there are resources to allocate to the real-time task group; if so, perform step S2303, otherwise perform step S2305. A quota greater than zero for the real-time task group indicates that there are resources to allocate to it.
Step S2303: judge whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, perform step S2304, otherwise perform step S2305. Whether the group has subtasks that need to run is judged from the statistics: if the number of subtasks that a task in the group needs to run, or the group's total of subtasks that need to run, is non-zero, the group has subtasks that need to run. When the total number of running subtasks is smaller than the group's quota, the resource can be allocated to the real-time task group to meet its demand.
Step S2304: allocate the resource to a subtask in the real-time task group according to that group's scheduling strategy. The real-time task group usually runs small tasks that need timely processing and can use a concurrent execution strategy: tasks are sorted by their degree of starvation (the fewer resources a task has been allocated, the more starved it is), the number of concurrently executing subtasks can be limited, and the resource is allocated to a subtask in the real-time task group.
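A minimal sketch of the starvation-ordered choice in step S2304 follows. The function and field names are hypothetical. Treating the concurrency limit as a per-task cap and breaking ties by task name are both assumptions; the text only says that the number of concurrently executing subtasks can be limited.

```python
def pick_real_time_task(tasks, max_concurrent):
    """Pick the most starved real-time task that may start another subtask.

    tasks: dict mapping task name -> {"running": int, "pending": int}
    max_concurrent: assumed per-task cap on concurrently running subtasks.
    Returns the chosen task name, or None if no task is eligible.
    """
    eligible = [
        (info["running"], name)
        for name, info in tasks.items()
        if info["pending"] > 0 and info["running"] < max_concurrent
    ]
    if not eligible:
        return None
    # Fewest running subtasks = fewest resources held = most starved.
    _, name = min(eligible)
    return name


tasks = {
    "a": {"running": 3, "pending": 5},
    "b": {"running": 0, "pending": 2},
    "c": {"running": 0, "pending": 0},
}
chosen = pick_real_time_task(tasks, max_concurrent=4)
```

Here task "b" is chosen: it holds no resources and still has subtasks waiting, while "c" has nothing left to run.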
Step S2305: obtain the resource quota of the non-real-time task group according to the resource ratio set for it. Because this ratio is set in advance, the quota of the non-real-time task group is refreshed at this point: it is computed as the product of the number of resources remaining after the critical task group's demand is met and the corresponding ratio.
Step S2306: judge whether there are resources to allocate to the non-real-time task group; if so, perform step S2307, otherwise perform step S2309. A quota greater than zero for the non-real-time task group indicates that there are resources to allocate to it.
Step S2307: judge whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is smaller than the group's resource quota; if so, perform step S2308, otherwise perform step S2309. Whether the group has subtasks that need to run is judged from the statistics: if the number of subtasks that a task in the group needs to run, or the group's total of subtasks that need to run, is non-zero, the group has subtasks that need to run. When the total number of running subtasks is smaller than the group's quota, the resource can be allocated to the non-real-time task group to meet its demand.
Step S2308: allocate the resource to a subtask in the non-real-time task group according to that group's scheduling strategy. The non-real-time task group usually runs large tasks and can use a FIFO strategy: tasks are ordered in a FIFO queue by submission time and priority, and the resource goes to a subtask of the task that ranks first and has high priority.
Step S2309: obtain the resource quota of the real-time task group according to the resource ratio set for it. The quota of the real-time task group is refreshed again at this point: it is computed as the product of the number of resources remaining after the critical task group's demand is met and the corresponding ratio.
Step S2310: judge whether there are resources to allocate to the real-time task group; if so, perform step S2311, otherwise end. A quota greater than zero for the real-time task group indicates that there are resources to allocate to it.
Step S2311: judge whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota; if so, enter step S2312, otherwise end. When the statistics show that the number of subtasks a task in the group needs to run, or the group's total of subtasks that need to run, is non-zero, and the total number of running subtasks is smaller than the group's quota, the resource can be allocated to the real-time task group to meet its demand.
Step S2312: allocate the resource to a subtask in the real-time task group according to that group's scheduling strategy. As noted above, the real-time task group allocates the resource to one of its subtasks according to the concurrent execution strategy. Judging again whether to allocate the resource to the real-time task group, after determining that it does not need to go to the non-real-time task group, helps fully meet the demand of small real-time tasks that need timely processing: resources may remain after the non-real-time task group's demand is met, and they can then serve the real-time task group, so that resources are fully used.
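The decision chain of steps S2301 to S2312 for one free resource can be sketched as follows. All names are hypothetical. The `read_state` callable is an assumption that stands in for the real-time counters, so that each quota refresh sees current numbers; with a static state the final re-check simply repeats the first one.

```python
def allocate_non_critical(read_state, rt_percent=20, nrt_percent=80):
    """Decide which group gets one free resource (steps S2301 to S2312).

    read_state: callable returning (surplus, rt, nrt), where surplus is the
    number of resources left after the critical group's demand and rt / nrt
    are dicts with "running" and "pending" subtask totals.
    Returns "real_time", "non_real_time", or None.
    """
    def has_room(group, quota):
        # Quota > 0, subtasks waiting, and running total below the quota.
        return quota > 0 and group["pending"] > 0 and group["running"] < quota

    surplus, rt, nrt = read_state()
    if has_room(rt, surplus * rt_percent // 100):     # S2301 to S2303
        return "real_time"                            # S2304
    surplus, rt, nrt = read_state()
    if has_room(nrt, surplus * nrt_percent // 100):   # S2305 to S2307
        return "non_real_time"                        # S2308
    surplus, rt, nrt = read_state()
    if has_room(rt, surplus * rt_percent // 100):     # S2309 to S2311
        return "real_time"                            # S2312
    return None


state = (200, {"running": 40, "pending": 140}, {"running": 10, "pending": 390})
decision = allocate_non_critical(lambda: state)
```

With a surplus of 200 the real-time quota is 40 and already used up, so in this sketch the resource goes to the non-real-time group.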
A concrete example illustrates the process of the task scheduling method described above. In this example the system has 400 computing resources in total; the resource ratio for the critical task group is 100% of all resources, the ratio for the real-time task group is 20% of the resources remaining after the critical task group's demand is met, and the ratio for the non-real-time task group is 80% of those remaining resources. The critical task group has 200 subtasks that need to run, the real-time task group has 180 subtasks and the non-real-time task group has 400 subtasks. At this point the quota of the critical task group is 200, the quota of the real-time task group is 40 and the quota of the non-real-time task group is 160. After running for some time, the subtasks in the critical task group finish; the refreshed quotas are then 0 for the critical task group, 80 for the real-time task group and 320 for the non-real-time task group. If 200 subtasks are now added to the critical task group, the refreshed quota of the critical task group is 400, that of the real-time task group is 40 and that of the non-real-time task group is 160. Whether the remaining 200 resources go to subtasks of the real-time task group or of the non-real-time task group is decided by the allocation method above. For each of the remaining 200 resources, it is first judged whether there are resources to allocate to the real-time task group; because the group's quota is non-zero, there are. It is then judged whether the real-time task group has subtasks that need to run and whether its total number of running subtasks is smaller than its quota. If so, the resource is allocated to the real-time task group; otherwise the group's demand is considered met, no more resources need to go to it, and it is next judged whether there are resources to allocate to the non-real-time task group. The detailed process is as described above and is not repeated here.
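The numbers in this example can be reproduced with a small simulation that hands out free resources one at a time. The helper and its data layout are hypothetical, flooring the quotas to whole resources is an assumption, and the second real-time check is left out because nothing changes between checks in a sequential loop.

```python
def hand_out(free, surplus, groups, percents):
    """Give `free` resources out one at a time; return how many stay unused."""
    unused = 0
    for _ in range(free):
        # The real-time group is checked first, then the non-real-time group.
        for name in ("real_time", "non_real_time"):
            group = groups[name]
            quota = surplus * percents[name] // 100
            if group["pending"] > 0 and group["running"] < quota:
                group["pending"] -= 1
                group["running"] += 1
                break
        else:
            unused += 1
    return unused


percents = {"real_time": 20, "non_real_time": 80}
groups = {
    "real_time": {"running": 0, "pending": 180},
    "non_real_time": {"running": 0, "pending": 400},
}

# The critical group holds 200 of the 400 resources, so the surplus is 200.
hand_out(200, 200, groups, percents)
first = (groups["real_time"]["running"], groups["non_real_time"]["running"])

# The critical group finishes: its 200 resources are freed, surplus is 400.
hand_out(200, 400, groups, percents)
second = (groups["real_time"]["running"], groups["non_real_time"]["running"])

print(first, second)  # (40, 160) (80, 320)
```

The running totals settle at the quotas given in the example, 40 and 160 at first and 80 and 320 once the critical task group has finished.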
As shown in Fig. 5, a task scheduling system in a distributed data warehouse comprises a grouping module 10 and a resource allocation module 40. The grouping module 10 divides tasks into multiple task groups by type and sets the resource ratio to be allocated to each task group. The resource allocation module 40 allocates resources to the task groups according to the resource ratios.
In one embodiment, the grouping module 10 divides tasks into a critical task group, a real-time task group and a non-real-time task group. The critical task group contains important tasks that must be processed on time, the real-time task group contains small tasks that need timely processing, and the non-real-time task group contains large tasks that do not need timely processing. In one embodiment, the grouping module 10 sets the resource ratio for the critical task group to 100% of all resources, the ratio for the real-time task group to 20% of the resources remaining after the critical task group's computing demand is met, and the ratio for the non-real-time task group to 80% of those remaining resources.
As shown in Fig. 6, in one embodiment the system comprises, in addition to the grouping module 10 and the resource allocation module 40, a task clearing module 20 and a counter 30. The task clearing module 20 clears completed tasks from the task groups before resources are allocated. The counter 30 comprises a critical task group counter 310, a real-time task group counter 320 and a non-real-time task group counter 330. The critical task group counter 310 counts in real time, for each task in the critical task group, the subtasks that are running and the subtasks that need to run, as well as the group's total of running subtasks and total of subtasks that need to run. The real-time task group counter 320 counts the same statistics in real time for the real-time task group, and the non-real-time task group counter 330 counts them for the non-real-time task group.
In one embodiment, the resource allocation module 40 judges whether the critical task group has subtasks that need to run. If the counter 30 reports that the number of subtasks a task in the critical task group needs to run, or the group's total of subtasks that need to run, is non-zero, there are subtasks that need to run, and the module allocates the resource to a subtask in the critical task group according to that group's scheduling strategy; otherwise it allocates resources to the real-time task group and the non-real-time task group. The critical task group usually contains large tasks and uses a first-in-first-out strategy, under which the resource is allocated to a subtask of the critical task group.
The resource allocation module 40 further obtains the resource quota of the real-time task group from the resource ratio set by the grouping module 10 and judges whether there are resources to allocate to the real-time task group. If so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is smaller than the group's resource quota. If so, the quota of the real-time task group has not yet been filled, and the module allocates the resource to a subtask in the real-time task group according to that group's scheduling strategy; because the real-time task group contains small tasks that need timely processing, a concurrent execution strategy can be used. Otherwise, the module allocates resources to the non-real-time task group.
The resource distribution module 40 also obtains the resource quota of the non-real-time task group according to the resource ratio and judges whether there are resources to allocate to the non-real-time task group. If so, it further judges whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is less than the resource quota of the non-real-time task group. If so, resources are allocated to the subtasks in the non-real-time task group according to the scheduling strategy of that group. Likewise, because the non-real-time task group contains large tasks, a first-in-first-out strategy can be adopted to allocate resources to its subtasks. Otherwise, the process ends. In a preferred embodiment, when the non-real-time task group has no subtasks that need to run, or the total number of running subtasks in the non-real-time task group is greater than its resource quota, resources can be allocated to the real-time task group again, so as to fully meet the demands of the real-time task group, which needs timely processing, and to make full use of the resources.
The resource distribution module 40 is also configured, when the non-real-time task group has no subtasks that need to run or the total number of running subtasks in the non-real-time task group is greater than the resource quota of the non-real-time task group, to obtain the resource quota of the real-time task group according to the resource ratio and to judge further whether there are resources to allocate to the real-time task group. If so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is less than the resource quota of the real-time task group. If so, resources are allocated to the subtasks in the real-time task group according to the scheduling strategy of that group. Otherwise, the process ends.
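The overall order of checks (critical group first, then the real-time group within its resource amount, then the non-real-time group within its resource amount) can be sketched as a loop that hands out free resources one at a time. This is one possible reading, not the patented implementation: it assumes the checks are re-run on every round, so the real-time group is revisited whenever the non-real-time group has nothing to run or has used up its amount. `GroupState`, `distribute` and the group labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupState:
    pending: int  # total subtasks that need to run
    running: int  # total subtasks currently running
    quota: int    # resource amount derived from the resource ratio

    def wants_resource(self) -> bool:
        return self.pending > 0 and self.running < self.quota


def distribute(free: int, critical: GroupState, real_time: GroupState,
               non_real_time: GroupState) -> List[str]:
    """Hand out `free` resources one by one; return the group served each round."""
    grants: List[str] = []
    while free > 0:
        if critical.pending > 0:
            # The critical group is served whenever it has work (no quota check).
            name, group = "critical", critical
        elif real_time.wants_resource():
            name, group = "real_time", real_time
        elif non_real_time.wants_resource():
            name, group = "non_real_time", non_real_time
        else:
            break  # nothing eligible: the process ends
        group.pending -= 1
        group.running += 1
        free -= 1
        grants.append(name)
    return grants


grants = distribute(
    free=5,
    critical=GroupState(pending=1, running=0, quota=2),
    real_time=GroupState(pending=2, running=0, quota=2),
    non_real_time=GroupState(pending=5, running=0, quota=1),
)
```

In this run one resource stays unused, because the non-real-time group has reached its amount and the real-time group has nothing left to start.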
The embodiments above describe only several implementations of the present invention. Although they are described in relatively specific detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and all of these fall within the scope of protection of the present invention. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims (6)

1. A task scheduling method in a distributed data warehouse, comprising the following steps:
A. dividing tasks by type into a critical task group, a real-time task group and a non-real-time task group, wherein the critical task group comprises critical tasks that require scheduled output and are very important, the real-time task group comprises small tasks that need timely processing, and the non-real-time task group comprises large tasks that do not need timely processing; setting priorities for the different task groups, such that tasks in the critical task group are processed first, tasks in the real-time task group, which need timely processing, are processed second, and tasks in the non-real-time task group are processed last; and setting the resource ratio to be allocated to each task group;
counting, in real time, the number of running subtasks and the number of subtasks that need to run for each task in the critical task group, for each task in the real-time task group and for each task in the non-real-time task group, as well as the total number of running subtasks and the total number of subtasks that need to run in the critical task group, in the real-time task group and in the non-real-time task group;
B. allocating resources to the plurality of task groups according to the resource ratio, specifically:
B1. judging whether the critical task group has subtasks that need to run; if so, performing step B2, otherwise performing step B3;
B2. allocating resources to the subtasks in the critical task group according to the scheduling strategy of the critical task group;
B3. allocating resources to the real-time task group and the non-real-time task group.
2. The task scheduling method in a distributed data warehouse according to claim 1, characterized in that step B3 specifically comprises:
B31. obtaining the resource quota of the real-time task group according to the resource ratio, and judging whether there are resources to allocate to the real-time task group; if so, performing step B32, otherwise performing step B34;
B32. judging whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is less than the resource quota of the real-time task group; if so, performing step B33, otherwise performing step B34;
B33. allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group;
B34. obtaining the resource quota of the non-real-time task group according to the resource ratio, and judging whether there are resources to allocate to the non-real-time task group; if so, performing step B35, otherwise ending;
B35. judging whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is less than the resource quota of the non-real-time task group; if so, performing step B36, otherwise ending;
B36. allocating resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group.
3. The task scheduling method in a distributed data warehouse according to claim 2, characterized in that the method further comprises:
when the non-real-time task group has no subtasks that need to run, or the total number of running subtasks in the non-real-time task group is greater than the resource quota of the non-real-time task group, obtaining the resource quota of the real-time task group according to the resource ratio, and judging further whether there are resources to allocate to the real-time task group; if so, judging further whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is less than the resource quota of the real-time task group; if so, allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise ending.
4. A task scheduling system in a distributed data warehouse, characterized by comprising:
a grouping module, configured to divide tasks by type into a critical task group, a real-time task group and a non-real-time task group, wherein the critical task group comprises critical tasks that require scheduled output and are very important, the real-time task group comprises small tasks that need timely processing, and the non-real-time task group comprises large tasks that do not need timely processing; to set priorities for the different task groups, such that tasks in the critical task group are processed first, tasks in the real-time task group, which need timely processing, are processed second, and tasks in the non-real-time task group are processed last; and to set the resource ratio to be allocated to each task group;
a counter, configured to count, in real time, the number of running subtasks and the number of subtasks that need to run for each task in the critical task group, for each task in the real-time task group and for each task in the non-real-time task group, as well as the total number of running subtasks and the total number of subtasks that need to run in the critical task group, in the real-time task group and in the non-real-time task group;
a resource distribution module, configured to allocate resources to the plurality of task groups according to the resource ratio, specifically by judging whether the critical task group has subtasks that need to run and, if so, allocating resources to the subtasks in the critical task group according to the scheduling strategy of the critical task group, and otherwise allocating resources to the real-time task group and the non-real-time task group.
5. The task scheduling system in a distributed data warehouse according to claim 4, characterized in that the resource distribution module is further configured to obtain the resource quota of the real-time task group according to the resource ratio and to judge whether there are resources to allocate to the real-time task group; if so, to judge further whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is less than the resource quota of the real-time task group; if so, to allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise, to obtain the resource quota of the non-real-time task group according to the resource ratio and to judge whether there are resources to allocate to the non-real-time task group; if so, to judge further whether the non-real-time task group has subtasks that need to run and whether the total number of running subtasks in the non-real-time task group is less than the resource quota of the non-real-time task group; if so, to allocate resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group; otherwise, to end.
6. The task scheduling system in a distributed data warehouse according to claim 4, characterized in that the resource distribution module is further configured, when the non-real-time task group has no subtasks that need to run or the total number of running subtasks in the non-real-time task group is greater than the resource quota of the non-real-time task group, to obtain the resource quota of the real-time task group according to the resource ratio and to judge further whether there are resources to allocate to the real-time task group; if so, to judge further whether the real-time task group has subtasks that need to run and whether the total number of running subtasks in the real-time task group is less than the resource quota of the real-time task group; if so, to allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise, to end.
CN201010188509.5A 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system Active CN102243598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010188509.5A CN102243598B (en) 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010188509.5A CN102243598B (en) 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system

Publications (2)

Publication Number Publication Date
CN102243598A CN102243598A (en) 2011-11-16
CN102243598B true CN102243598B (en) 2015-09-16

Family

ID=44961669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010188509.5A Active CN102243598B (en) 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system

Country Status (1)

Country Link
CN (1) CN102243598B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521056B (en) * 2011-12-28 2013-08-14 用友软件股份有限公司 Task allocation device and task allocation method
CN102866920B (en) * 2012-08-02 2016-05-11 杭州海康威视数字技术股份有限公司 Host-guest architecture distributed video treatment system and dispatching method thereof
CN103593232B (en) * 2012-08-15 2017-07-04 阿里巴巴集团控股有限公司 The method for scheduling task and device of a kind of data warehouse
CN103473334B (en) * 2013-09-18 2017-01-11 中控技术(西安)有限公司 Data storage method, inquiry method and system
CN103699445B (en) * 2013-12-19 2017-02-15 北京奇艺世纪科技有限公司 Task scheduling method, device and system
CN103701886A (en) * 2013-12-19 2014-04-02 中国信息安全测评中心 Hierarchic scheduling method for service and resources in cloud computation environment
CN104102543B (en) * 2014-06-27 2018-09-11 北京奇艺世纪科技有限公司 The method and apparatus of adjustment of load in a kind of cloud computing environment
CN104391918B (en) * 2014-11-19 2018-01-19 天津南大通用数据技术股份有限公司 The implementation method of distributed networks database query priority management based on equity deployment
CN106406987B (en) * 2015-07-29 2020-01-03 阿里巴巴集团控股有限公司 Task execution method and device in cluster
US10313429B2 (en) * 2016-04-11 2019-06-04 Huawei Technologies Co., Ltd. Distributed resource management method and system
CN106649471A (en) * 2016-09-28 2017-05-10 新华三技术有限公司 Access control method and apparatus
CN107092999B (en) * 2016-11-08 2021-02-26 北京星选科技有限公司 Task processing method and device
CN108279980A (en) * 2018-01-22 2018-07-13 上海联影医疗科技有限公司 Resource allocation methods and system and resource allocation terminal
CN108280230A (en) * 2018-02-27 2018-07-13 北京中关村科金技术有限公司 A kind of method, apparatus, equipment and the storage medium of analysis data
CN108510213A (en) * 2018-05-11 2018-09-07 苏州华兴源创电子科技有限公司 Task is sequentially allocated to the method, apparatus, equipment and medium of task groups
CN109408215B (en) * 2018-11-07 2021-10-01 郑州云海信息技术有限公司 Task scheduling method and device for computing node
CN111580974B (en) * 2020-05-08 2023-06-27 抖音视界有限公司 GPU instance allocation method, device, electronic equipment and computer readable medium
CN112181662B (en) * 2020-10-13 2023-05-02 深圳壹账通智能科技有限公司 Task scheduling method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101169741A (en) * 2006-10-25 2008-04-30 国际商业机器公司 Method and system for determining scheduling priority of operation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2818769B1 (en) * 2000-12-21 2004-06-18 Eads Airbus Sa MULTI-TASK REAL-TIME OPERATION METHOD AND SYSTEM

Also Published As

Publication number Publication date
CN102243598A (en) 2011-11-16

Similar Documents

Publication Publication Date Title
CN102243598B (en) Method for scheduling task in Distributed Data Warehouse and system
Chen et al. Green-aware workload scheduling in geographically distributed data centers
CN102043675B (en) Thread pool management method based on task quantity of task processing request
CN102063336B (en) Distributed computing multiple application function asynchronous concurrent scheduling method
CN103458052B (en) Resource scheduling method and device based on IaaS cloud platform
CN104657214A (en) Multi-queue multi-priority big data task management system and method for achieving big data task management by utilizing system
CN102722417A (en) Distribution method and device for scan task
CN103970609A (en) Cloud data center task scheduling method based on improved ant colony algorithm
CN109861850B (en) SLA-based stateless cloud workflow load balancing scheduling method
CN103927225A (en) Multi-core framework Internet information processing and optimizing method
US8527988B1 (en) Proximity mapping of virtual-machine threads to processors
CN103793272A (en) Periodical task scheduling method and periodical task scheduling system
CN104023042B (en) Cloud platform resource scheduling method
CN103257896B (en) A kind of Max-D job scheduling method under cloud environment
CN105373426B (en) A kind of car networking memory aware real time job dispatching method based on Hadoop
Xiao et al. A priority based scheduling strategy for virtual machine allocations in cloud computing environment
CN102855293A (en) Mass data processing method of electric vehicle and charging/battery swap facility system
CN103455375B (en) Load-monitoring-based hybrid scheduling method under Hadoop cloud platform
CN103902384A (en) Method and device for allocating physical machines for virtual machines
CN102662761A (en) Method and device for scheduling memory pool in multi-core central processing unit system
CN108428051A (en) MapReduce job scheduling methods and device based on maximum gain towards big data platform
CN105430027A (en) Load balance dynamic pre-allocating method based on a plurality of resource scales
CN103685492A (en) Dispatching method, dispatching device and application of Hadoop trunking system
CN102917014A (en) Resource scheduling method and device
CN104156505B (en) A kind of Hadoop cluster job scheduling method and devices based on user behavior analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant