CN102243598A - Task scheduling method and system in distributed data warehouse - Google Patents


Info

Publication number
CN102243598A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010101885095A
Other languages
Chinese (zh)
Other versions
CN102243598B (en
Inventor
李均
郭玮
洪坤乾
赵伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201010188509.5A
Publication of CN102243598A
Application granted
Publication of CN102243598B
Legal status: Active
Anticipated expiration

Landscapes

  • Multi Processors (AREA)

Abstract

The invention provides a task scheduling method and system in a distributed data warehouse. The method comprises the following steps of: A, dividing tasks into a plurality of task groups according to types, and respectively setting the proportion of resources required to be allocated for the task groups; and B, allocating the resources to the plurality of task groups according to the proportion of the resources. The system comprises a grouping module and a resource allocation module, wherein the grouping module is used for dividing tasks into a plurality of task groups according to types and respectively setting the proportion of resources required to be allocated for the task groups, and the resource allocation module is used for allocating the resources to the plurality of task groups according to the proportion of the resources. By adopting the task scheduling method and system in the distributed data warehouse, provided by the invention, the resources can be reasonably allocated, the requirements for calculating small tasks in real time can be met, and the requirements for calculating large tasks not in real time can also be met.

Description

Task scheduling method and system in a distributed data warehouse
[technical field]
The present invention relates to the technical field of data processing, and in particular to a task scheduling method and system in a distributed data warehouse.
[background technology]
A data warehouse is a structured data environment that serves as the data source for decision support systems (DSS) and online analytical applications; it addresses the problem of obtaining information from databases. A distributed data warehouse is a data warehouse solution that provides mass storage and computing services using technologies based on GFS (Google File System, a scalable distributed file system) and MapReduce (a programming model for parallel computation over large data sets).
A distributed data warehouse implemented with the MapReduce programming model usually adopts a FIFO (First In, First Out) scheduling strategy for multi-task scheduling: after a user submits a task (job), the task's position in the FIFO queue is determined by its submission time and priority, and the task at the head of the queue obtains all of the system's computing resources.
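The FIFO ordering described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the `Job` class, its field names, and the rule that a larger priority value sorts first are assumptions, since the text only says that position depends on submission time and priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int       # assumption: a larger value means more urgent
    submit_time: float  # submission timestamp in seconds


def fifo_order(jobs):
    """Order jobs for the queue: higher priority first, then earlier submission."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_time))


jobs = [
    Job("big_report", priority=1, submit_time=10.0),
    Job("small_query", priority=1, submit_time=12.0),
    Job("urgent_fix", priority=2, submit_time=15.0),
]
queue = fifo_order(jobs)
head = queue[0]  # under plain FIFO, the head job is offered all computing resources
```

Note that `small_query` waits behind `big_report` no matter how small it is, which is the weakness the rest of the document addresses.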
Fig. 1 shows a task scheduling sequence diagram for the FIFO queue in a traditional distributed data warehouse; it describes how 3 tasks in the FIFO queue are scheduled. Suppose the system has a total of 2 M (Map) and 2 R (Reduce) computing resources. At the start, task 1 occupies all computing resources: the 2 M computing resources and the 2 R computing resources are scheduled at the same time, and single-diagonal hatching indicates that a task is running. When the 2 M computing resources of task 1 finish, their hatching changes to cross-hatching, and 2 M computing resources are then scheduled again. In the end task 1 has only one M computing resource left to schedule, and the spare resource is allocated to task 2.
However, this traditional scheduling approach allocates resources to tasks unfairly, so resources cannot be distributed reasonably. For example, when large and small tasks run together, some small tasks may be interleaved among large tasks in the queue, which leaves the small tasks starved for a long time. The approach therefore cannot serve an application scenario in which the distributed data warehouse must satisfy users' non-real-time large tasks while also responding quickly to real-time small tasks.
[summary of the invention]
In view of this, it is necessary to provide a task scheduling method in a distributed data warehouse that can allocate resources reasonably.
A task scheduling method in a distributed data warehouse comprises the following steps: A. dividing tasks into a plurality of task groups by type, and setting for each task group the resource ratio that should be allocated to it; B. allocating resources to the plurality of task groups according to the resource ratios.
Step A is: dividing tasks by type into a critical task group, a real-time task group, and a non-real-time task group.
The method may further comprise counting the following in real time: for each task in the critical task group, the real-time task group, and the non-real-time task group, the number of subtasks the task is running and the number of subtasks it still needs to run; and for each of the three groups, the total number of running subtasks and the total number of subtasks that need to run.
Step B may be: B1. judging whether the critical task group has subtasks that need to run; if so, executing step B2, otherwise executing step B3; B2. allocating resources to the subtasks in the critical task group according to the scheduling strategy of the critical task group; B3. allocating resources to the real-time task group and the non-real-time task group.
Step B3 may specifically be: B31. obtaining the resource quota of the real-time task group according to the resource ratio, and judging whether there are resources to allocate to the real-time task group; if so, executing step B32, otherwise executing step B34; B32. judging whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource quota of the real-time task group; if so, executing step B33, otherwise executing step B34; B33. allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; B34. obtaining the resource quota of the non-real-time task group according to the resource ratio, and judging whether there are resources to allocate to the non-real-time task group; if so, executing step B35, otherwise executing step B37; B35. judging whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the resource quota of the non-real-time task group; if so, executing step B36, otherwise ending; B36. allocating resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group.
The method may further comprise: when the non-real-time task group has no subtasks that need to run, or the total number of subtasks running in the non-real-time task group is greater than the resource quota of the non-real-time task group, obtaining the resource quota of the real-time task group according to the resource ratio and judging whether there are resources to allocate to the real-time task group; if so, further judging whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource quota of the real-time task group; if so, allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group, otherwise ending.
In addition, it is also necessary to provide a task scheduling system in a distributed data warehouse that can allocate resources reasonably.
A task scheduling system in a distributed data warehouse comprises: a grouping module, which divides tasks into a plurality of task groups by type and sets for each task group the resource ratio that should be allocated to it; and a resource allocation module, which allocates resources to the plurality of task groups according to the resource ratios.
The grouping module may divide tasks by type into a critical task group, a real-time task group, and a non-real-time task group.
The system may further comprise a counter for counting the following in real time: for each task in the critical task group, the real-time task group, and the non-real-time task group, the number of subtasks the task is running and the number of subtasks it still needs to run; and for each of the three groups, the total number of running subtasks and the total number of subtasks that need to run.
The resource allocation module may also be used to judge whether the critical task group has subtasks that need to run; if so, it allocates resources to the subtasks in the critical task group according to the scheduling strategy of the critical task group, otherwise it allocates resources to the real-time task group and the non-real-time task group.
The resource allocation module may also be used to obtain the resource quota of the real-time task group according to the resource ratio and judge whether there are resources to allocate to the real-time task group; if so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource quota of the real-time task group; if so, it allocates resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group. Otherwise it obtains the resource quota of the non-real-time task group according to the resource ratio and judges whether there are resources to allocate to the non-real-time task group; if so, it further judges whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the resource quota of the non-real-time task group; if so, it allocates resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group, otherwise it ends.
The resource allocation module may also be used, when the non-real-time task group has no subtasks that need to run or the total number of subtasks running in the non-real-time task group is greater than the resource quota of the non-real-time task group, to obtain the resource quota of the real-time task group according to the resource ratio and judge whether there are resources to allocate to the real-time task group; if so, it further judges whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource quota of the real-time task group; if so, it allocates resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group, otherwise it ends.
The task scheduling method and system described above divide tasks into a plurality of task groups, set the resource ratio that should be allocated to each group, and allocate resources to the groups according to those ratios, so that resources can be allocated reasonably.
In addition, because tasks are divided into a critical task group, a real-time task group, and a non-real-time task group, resources can be allocated to the critical task group first. Because resource ratios are set for the three groups, the resource quota of each group can be obtained; when the critical task group needs no resources, resources are apportioned according to the quotas of the real-time task group and the non-real-time task group. This satisfies the resource demand of non-real-time tasks and also that of real-time tasks, so real-time tasks get a quick response and resources are used reasonably and fully. Moreover, each task group can adopt a scheduling strategy suited to it, which makes resource allocation more reasonable still.
[description of drawings]
Fig. 1 is a task scheduling sequence diagram for the FIFO queue in a traditional distributed data warehouse;
Fig. 2 is a flowchart of the task scheduling method in a distributed data warehouse according to the present invention;
Fig. 3 is a flowchart of the method for allocating resources to the task groups in one embodiment;
Fig. 4 is a flowchart of the method for allocating resources to the real-time task group and the non-real-time task group in one embodiment;
Fig. 5 is a structural diagram of the task scheduling system in a distributed data warehouse in one embodiment;
Fig. 6 is a structural diagram of the task scheduling system in a distributed data warehouse in another embodiment.
[embodiment]
As shown in Figure 2, a task scheduling method in a distributed data warehouse comprises the following steps:
Step S10: tasks are divided into a plurality of task groups by type, and the resource ratio that should be allocated to each task group is set. In one embodiment, tasks are divided by type into groups such as a critical task group, a real-time task group, and a non-real-time task group. The critical task group contains very important tasks whose output is required on a regular schedule, for example a department's daily and monthly reports; the real-time task group contains small tasks that need timely processing; the non-real-time task group contains large tasks that do not need timely processing. After grouping, priorities can be set for the groups: for example, tasks in the critical task group are processed first, followed by tasks in the real-time task group, which need timely processing, and then tasks in the non-real-time task group.
In this embodiment, the resource ratios that should be allocated to the critical task group, the real-time task group, and the non-real-time task group are set separately, and the resource quota of each task group can be calculated at any time from its resource ratio. The resource quota is the number of resources that should be allocated to a task group; once the number of subtasks running in a task group reaches the group's resource quota, the group's demand is considered satisfied. In one embodiment, the resource ratio of the critical task group is set to 100% of all resources, the resource ratio of the real-time task group is 20% of the resources remaining after the computing demand of the critical task group is satisfied, and the resource ratio of the non-real-time task group is 80% of those remaining resources. In this embodiment the system has 400 computing resources, so the resources that may be allocated to the critical task group number 400 (that is, the resource quota of the critical task group is 400). If 200 resources remain after the computing demand of the critical task group is satisfied, then 40 resources are allocated to the real-time task group (its resource quota is 40) and 160 resources are allocated to the non-real-time task group (its resource quota is 160).
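The quota arithmetic in this embodiment can be checked with a short sketch. The function name `group_quotas` and its parameters are hypothetical, and rounding down with integer division is an assumption; the text does not say how fractional quotas are handled.

```python
def group_quotas(total, critical_demand, realtime_percent=20, non_realtime_percent=80):
    """Quotas for the two lower-priority groups, taken from what the critical group leaves.

    Percentages are integers so the arithmetic stays exact (assumed rounding: floor).
    """
    remaining = max(total - critical_demand, 0)
    return {
        "remaining": remaining,
        "realtime": remaining * realtime_percent // 100,
        "non_realtime": remaining * non_realtime_percent // 100,
    }


# 400 resources in total, 200 of them needed by the critical group.
quotas = group_quotas(total=400, critical_demand=200)
```

With these inputs the remaining 200 resources split into a quota of 40 for the real-time group and 160 for the non-real-time group, matching the figures in the text.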
In one embodiment, the method further comprises counting the following in real time: for each task in the critical task group, the real-time task group, and the non-real-time task group, the number of subtasks the task is running and the number of subtasks it still needs to run; and for each of the three groups, the total number of running subtasks and the total number of subtasks that need to run.
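One way to keep these statistics is a per-group counter that tracks running and pending subtasks for each task and derives the group totals from them. The class and method names below are hypothetical, and the sketch assumes a single-threaded scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class TaskCounts:
    running: int = 0  # subtasks of this task currently running
    pending: int = 0  # subtasks of this task that still need to run


@dataclass
class GroupCounter:
    tasks: dict = field(default_factory=dict)  # task name -> TaskCounts

    def submit(self, task, subtasks):
        """Register newly submitted subtasks as pending."""
        self.tasks.setdefault(task, TaskCounts()).pending += subtasks

    def start(self, task):
        """Move one subtask of the task from pending to running."""
        counts = self.tasks[task]
        counts.pending -= 1
        counts.running += 1

    def finish(self, task):
        """Record that one running subtask of the task has completed."""
        self.tasks[task].running -= 1

    @property
    def total_running(self):
        return sum(c.running for c in self.tasks.values())

    @property
    def total_pending(self):
        return sum(c.pending for c in self.tasks.values())


counters = {g: GroupCounter() for g in ("critical", "realtime", "non_realtime")}
counters["realtime"].submit("query_a", 3)
counters["realtime"].start("query_a")
```

Deriving the totals on demand keeps the per-task and per-group figures consistent; a scheduler handling many tasks might cache them instead.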
Step S20: resources are allocated to the plurality of task groups according to the resource ratios. With tasks divided into the critical task group, the real-time task group, and the non-real-time task group as in the embodiment above, the critical task group has the highest priority, so its resource demand must be satisfied first. In one embodiment, completed tasks are cleared from all task groups before resources are allocated.
In one embodiment, as shown in Figure 3, the detailed process of step S20 is:
Step S210: judge whether the critical task group has subtasks that need to run; if so, execute step S220, otherwise execute step S230. Because tasks in the critical task group must be processed first, whether the group has subtasks that need to run can be judged from the real-time counts of the subtasks that each task in the group needs to run and of the group's total subtasks that need to run. If the number of subtasks that tasks in the critical task group need to run is zero, or the total number of subtasks that need to run is zero, no resources need to be allocated to the critical task group and the process enters step S230; if it is non-zero, the process enters step S220.
Step S220: allocate resources to the subtasks in the critical task group according to the scheduling strategy of the critical task group. The critical task group usually runs large tasks, so the FIFO strategy can be adopted: tasks are ordered in a FIFO queue by submission time and priority, and resources are allocated to the subtasks of the tasks that rank first and have high priority in the critical task group.
Step S230: allocate resources to the real-time task group and the non-real-time task group. When the critical task group has no subtasks that need to run, resources are allocated to subtasks in the real-time task group or the non-real-time task group.
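Steps S210 to S230 amount to a priority check in front of the rest of the allocation logic. A minimal sketch for a single free resource, where `choose_group` and the callback `allocate_to_other_groups` are hypothetical names standing in for the later steps:

```python
def choose_group(critical_pending, allocate_to_other_groups):
    """Pick the group that receives one free resource (steps S210 to S230).

    critical_pending: total subtasks in the critical group that still need to run.
    allocate_to_other_groups: hypothetical callback standing in for step S230.
    """
    if critical_pending > 0:            # S210: does the critical group need resources?
        return "critical"               # S220: serve the critical group first
    return allocate_to_other_groups()   # S230: fall through to the other two groups


# Example: the critical group is idle, so the decision is delegated to S230.
result = choose_group(0, lambda: "realtime")
```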
As shown in Figure 4, in one embodiment, the detailed process of step S230 is:
Step S2301: obtain the resource quota of the real-time task group according to the resource ratio that should be allocated to the real-time task group. Because this resource ratio is set in advance, the resource quota of the real-time task group is refreshed when the critical task group has no subtasks that need to run: the quota is calculated as the number of resources remaining after the demand of the critical task group is satisfied, multiplied by the corresponding resource ratio.
Step S2302: judge whether there are resources to allocate to the real-time task group; if so, execute step S2303, otherwise execute step S2305. A resource quota greater than zero for the real-time task group indicates that there are resources to allocate to it at this time.
Step S2303: judge whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the group's resource quota; if so, execute step S2304, otherwise execute step S2305. Whether the real-time task group has subtasks that need to run can be judged from the counted number of subtasks that tasks in the group need to run and the counted total of subtasks that need to run: if either count is non-zero, the group has subtasks that need to run. When the total number of running subtasks is less than the resource quota of the real-time task group, resources can be allocated to the group to satisfy its demand.
Step S2304: allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group. The real-time task group usually runs small tasks that need timely processing, so a concurrent execution strategy can be adopted: tasks are ordered by how starved they are (the fewer resources a task has been allocated, the more starved it is), the number of concurrently executing subtasks can be limited, and resources are allocated to the subtasks in the real-time task group accordingly.
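The concurrent execution strategy of step S2304 can be sketched as picking the most starved task that is still allowed to start a subtask. Several details here are assumptions not stated in the text: starvation is measured by the number of running subtasks, the concurrency limit applies per task, and ties are broken by task name. The function name `pick_hungriest` is hypothetical.

```python
def pick_hungriest(tasks, max_concurrent):
    """Return the name of the task that should receive the next resource, or None.

    tasks: dict mapping task name -> {"running": int, "pending": int}.
    max_concurrent: assumed per-task cap on concurrently running subtasks.
    """
    candidates = [
        (info["running"], name)
        for name, info in tasks.items()
        if info["pending"] > 0 and info["running"] < max_concurrent
    ]
    if not candidates:
        return None
    # Fewest running subtasks means most starved; ties fall back to the name.
    return min(candidates)[1]


tasks = {
    "a": {"running": 3, "pending": 2},
    "b": {"running": 1, "pending": 5},
    "c": {"running": 0, "pending": 0},  # nothing left to run, so never chosen
}
chosen = pick_hungriest(tasks, max_concurrent=4)
```

Here task `b` is chosen because it holds fewer resources than `a`, while `c` is skipped because it has no subtasks waiting.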
Step S2305: obtain the resource quota of the non-real-time task group according to the resource ratio that should be allocated to the non-real-time task group. Because this resource ratio is set in advance, the resource quota of the non-real-time task group is refreshed at this point: the quota is calculated as the number of resources remaining after the demand of the critical task group is satisfied, multiplied by the corresponding resource ratio.
Step S2306: judge whether there are resources to allocate to the non-real-time task group; if so, execute step S2307, otherwise execute step S2309. A resource quota greater than zero for the non-real-time task group indicates that there are resources to allocate to it.
Step S2307: judge whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the group's resource quota; if so, execute step S2308, otherwise execute step S2309. Whether the non-real-time task group has subtasks that need to run can be judged from the counted number of subtasks that tasks in the group need to run or the counted total of subtasks that need to run: if either count is non-zero, the group has subtasks that need to run. When the total number of running subtasks is less than the resource quota of the non-real-time task group, resources can be allocated to the group to satisfy its demand.
Step S2308: allocate resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group. The non-real-time task group usually runs large tasks, so the FIFO strategy can be adopted: tasks are ordered in a FIFO queue by submission time and priority, and resources are allocated to the subtasks of the tasks that rank first and have high priority in the non-real-time task group.
Step S2309: obtain the resource quota of the real-time task group according to the resource ratio that should be allocated to the real-time task group. The resource quota of the real-time task group is refreshed once more at this point: the quota is calculated as the number of resources remaining after the demand of the critical task group is satisfied, multiplied by the corresponding resource ratio.
Step S2310: judge whether there are resources to allocate to the real-time task group; if so, execute step S2311, otherwise end. A resource quota greater than zero for the real-time task group indicates that there are resources to allocate to it.
Step S2311: judge whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the group's resource quota; if so, enter step S2312, otherwise end. When the counted number of subtasks that tasks in the real-time task group need to run, or the counted total of subtasks that need to run, is non-zero, and the counted total of running subtasks is less than the resource quota of the real-time task group, resources can be allocated to the group to satisfy its demand.
Step S2312: allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group. As described above, the real-time task group allocates resources to its subtasks according to the concurrent execution strategy. Judging once more whether to allocate the resource to the real-time task group, after determining that it need not go to the non-real-time task group, fully satisfies the demand of real-time small tasks that need timely processing: resources may still remain after the demand of the non-real-time task group is satisfied, and they can then be used to satisfy the demand of the real-time task group, so that resources are fully utilized.
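The decision flow of steps S2301 to S2312 for one free resource can be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: `allocate_one`, `eligible`, and `get_remaining` are hypothetical names, `get_remaining` stands for reading the number of resources left after the critical group's demand is met, and the 20/80 split is taken from the embodiment.

```python
def allocate_one(get_remaining, rt, nrt, rt_percent=20, nrt_percent=80):
    """Decide which group receives one free resource.

    get_remaining: callable returning resources left after the critical group's demand.
    rt, nrt: dicts with "running" and "pending" subtask totals for each group.
    Returns "realtime", "non_realtime", or None when neither group should get it.
    """
    def eligible(group, quota):
        # Quota must be positive, work must be waiting, and the quota not yet reached.
        return quota > 0 and group["pending"] > 0 and group["running"] < quota

    rt_quota = get_remaining() * rt_percent // 100    # S2301: refresh real-time quota
    if eligible(rt, rt_quota):                        # S2302, S2303
        return "realtime"                             # S2304
    nrt_quota = get_remaining() * nrt_percent // 100  # S2305: refresh non-real-time quota
    if eligible(nrt, nrt_quota):                      # S2306, S2307
        return "non_realtime"                         # S2308
    rt_quota = get_remaining() * rt_percent // 100    # S2309: refresh real-time quota again
    if eligible(rt, rt_quota):                        # S2310, S2311
        return "realtime"                             # S2312
    return None


decision = allocate_one(
    lambda: 200,
    rt={"running": 10, "pending": 5},
    nrt={"running": 0, "pending": 9},
)
```

In this sketch the second real-time check only changes the outcome if the refreshed quota differs from the first reading, for example because the critical group released resources in the meantime.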
A practical example illustrates the task scheduling method described above. In this example the system has 400 computing resources in total. The resource ratio of the critical task group is set to 100% of all resources, the resource ratio of the real-time task group to 20% of the resources remaining after the demand of the critical task group is satisfied, and the resource ratio of the non-real-time task group to 80% of those remaining resources. The critical task group has 200 subtasks that need to run, the real-time task group has 180 subtasks, and the non-real-time task group has 400 subtasks. At this point the resource quota of the critical task group is 200, that of the real-time task group is 40, and that of the non-real-time task group is 160. After running for a while, the subtasks in the critical task group finish; refreshing then gives a resource quota of 0 for the critical task group, 80 for the real-time task group, and 320 for the non-real-time task group. If 200 subtasks are now added to the critical task group, refreshing gives a resource quota of 400 for the critical task group, 40 for the real-time task group, and 160 for the non-real-time task group. Whether the remaining 200 resources go to subtasks in the real-time task group or the non-real-time task group is decided by the method described above. For each of the remaining 200 resources, it is first judged whether there are resources to allocate to the real-time task group. Because the resource quota of the real-time task group is non-zero, there are resources to allocate to it. It is then further judged whether the real-time task group has subtasks that need to run and whether the total number of running subtasks is less than the group's resource quota. If so, the resource is allocated to the real-time task group; otherwise the demand of the real-time task group is considered satisfied, no more resources need to be allocated to it, and it is further judged whether there are resources to allocate to the non-real-time task group. The detailed process follows the description above and is not repeated here.
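The per-resource walk in this example can be simulated in a few lines. The sketch assumes that no subtask finishes while the 200 free resources are being handed out and that quotas stay fixed at 40 and 160 during the pass; the function name `distribute` is hypothetical.

```python
def distribute(free, quotas, pending):
    """Hand out free resources one at a time, real-time group first.

    quotas, pending: dicts keyed by "realtime" and "non_realtime".
    Returns the number of subtasks started in each group.
    """
    waiting = dict(pending)  # work on a copy so the caller's counts are untouched
    running = {group: 0 for group in quotas}
    for _ in range(free):
        for group in ("realtime", "non_realtime"):
            if waiting[group] > 0 and running[group] < quotas[group]:
                waiting[group] -= 1
                running[group] += 1
                break  # this resource is placed; move on to the next one
    return running


running = distribute(
    free=200,
    quotas={"realtime": 40, "non_realtime": 160},
    pending={"realtime": 180, "non_realtime": 400},
)
```

The first 40 resources go to the real-time group until its quota is reached, and the remaining 160 go to the non-real-time group, so both quotas from the example are filled exactly.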
As shown in Figure 5, the task scheduling system in a kind of distributed data warehouse comprises grouping module 10 and resource distribution module 40, and wherein: grouping module 10 is used for by type task being divided into a plurality of task groups, sets the resource ratio that should distribute to task groups respectively.Resource distribution module 40 carries out resources allocation according to described resource ratio with a plurality of task groups.
In one embodiment, grouping module 10 is to be mission critical group, real-time task group and un-real time job group with Task Distribution, wherein comprising needs in the mission critical group in time handles and important task, the real-time task group comprises the little task that needs are in time handled, and the un-real time job group comprises does not need some big tasks of processing in time.In one embodiment, grouping module 10 is set the resource ratio that should distribute to the mission critical group and is 100% of all resources, the resource ratio that should distribute to the real-time task group is 20% of surplus resources behind the computation requirement that satisfies the mission critical group, and the resource ratio that should distribute to the un-real time job group is 80% of surplus resources behind the computation requirement that satisfies the mission critical group.
As shown in Figure 2, in one embodiment, this system also comprises task detach module 20 sum counters 30 except comprising above-mentioned grouping module 10, resource distribution module 40, wherein: task detach module 20 is used for knowing finishing the work of task groups before Resources allocation.Counter 30 comprises mission critical set of counters 310, real-time task set of counters 320 and un-real time job set of counters 330, wherein: mission critical set of counters 310 is used for the subtask moved in the task in the real-time statistics mission critical group and need the operation subtask, and the subtask of moving in mission critical group sum and need the subtask sum of operation; Real-time task set of counters 320 is used for subtask of moving in the task in the real-time statistics real-time task group and the subtask that needs operation, and the subtask of moving in real-time task group sum and need the subtask sum of operation; Un-real time job set of counters 330 is used for subtask of moving in the task in the real-time statistics un-real time job group and the subtask that needs operation, and the subtask of moving in un-real time job group sum and need the subtask sum of operation.
In one embodiment, resource distribution module 40 determines whether the mission-critical task group has subtasks that need to run. If counter 30 reports that the number of subtasks that need to run in a task of the mission-critical task group, or the group's total number of subtasks that need to run, is non-zero, then there are subtasks that need to run, and resources are allocated to the subtasks in the mission-critical task group according to that group's scheduling strategy; otherwise, resources are allocated to the real-time task group and the non-real-time task group. The mission-critical task group usually contains only large tasks, so it adopts a first-in-first-out strategy, and resources are allocated to its subtasks in first-in-first-out order.
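The check on the mission-critical task group followed by first-in-first-out allocation might look like the sketch below. The function names `allocate_fifo` and `allocate_critical`, the `(task_id, pending_subtasks)` tuple layout, and the convention of returning `None` to signal "move on to the other groups" are assumptions for illustration only.

```python
def allocate_fifo(tasks, free_slots):
    """Grant slots to tasks in arrival order.

    tasks: list of (task_id, pending_subtasks), oldest task first.
    Returns a list of (task_id, granted_slots).
    """
    grants = []
    for task_id, pending in tasks:
        if free_slots <= 0:
            break
        # The oldest task takes as many slots as it can use.
        granted = min(pending, free_slots)
        if granted > 0:
            grants.append((task_id, granted))
            free_slots -= granted
    return grants


def allocate_critical(tasks, free_slots):
    """Serve the mission-critical group first if it has work to run."""
    if not any(pending > 0 for _, pending in tasks):
        # No subtask needs to run: the caller falls through to the
        # real-time and non-real-time groups.
        return None
    return allocate_fifo(tasks, free_slots)
```

With five free slots and two queued tasks needing three and four subtasks, the older task receives three slots and the newer one the remaining two.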
Resource distribution module 40 further obtains the resource amount of the real-time task group from the resource ratio set by grouping module 10 and determines whether there are resources to allocate to the real-time task group. If so, it further determines whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group. If both conditions hold, the resource amount of the real-time task group has not yet been fully used, and resources are allocated to the subtasks in the real-time task group according to that group's scheduling strategy. Because the real-time task group contains small tasks that need timely processing, a concurrent execution strategy can be used to allocate resources to its subtasks. Otherwise, resources are allocated to the non-real-time task group.
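The conditions above can be sketched in Python as follows. The source does not define the concurrent execution strategy in detail, so the round-robin distribution used here is an assumption, as are the function names `realtime_slots_available` and `allocate_concurrent` and the choice to cap grants at the group's resource amount.

```python
def realtime_slots_available(free_slots, total_pending, total_running, amount):
    """Number of slots that may be given to the real-time group right now."""
    if free_slots <= 0 or total_pending <= 0 or total_running >= amount:
        return 0
    # Assumption: never grant more than the unused part of the group's amount.
    return min(free_slots, amount - total_running, total_pending)


def allocate_concurrent(tasks, slots):
    """Spread slots over tasks one at a time (round-robin, an assumption).

    tasks: list of (task_id, pending_subtasks).
    Returns a dict mapping task_id to granted slots.
    """
    pending = dict(tasks)  # dicts keep insertion order
    granted = {task_id: 0 for task_id in pending}
    while slots > 0 and any(count > 0 for count in pending.values()):
        for task_id in pending:
            if slots == 0:
                break
            if pending[task_id] > 0:
                pending[task_id] -= 1
                granted[task_id] += 1
                slots -= 1
    return granted
```

Round-robin lets every small task make progress at once, in contrast to the first-in-first-out order used for groups of large tasks.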
Resource distribution module 40 obtains the resource amount of the non-real-time task group from the resource ratio and determines whether there are resources to allocate to the non-real-time task group. If so, it further determines whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the resource amount of the non-real-time task group. If both conditions hold, resources are allocated to the subtasks in the non-real-time task group according to that group's scheduling strategy; otherwise, the process ends. Likewise, because the non-real-time task group contains large tasks, a first-in-first-out strategy can be used to allocate resources to its subtasks. In a preferred embodiment, when the non-real-time task group has no subtasks that need to run, or the total number of subtasks running in the non-real-time task group is greater than the resource amount of the non-real-time task group, resources can be allocated to the real-time task group once more, so that the demand of the real-time task group, which needs timely processing, is fully satisfied and resources are fully used.
Resource distribution module 40 is also used, when the non-real-time task group has no subtasks that need to run or the total number of subtasks running in the non-real-time task group is greater than the resource amount of the non-real-time task group, to obtain the resource amount of the real-time task group from the resource ratio and further determine whether there are resources to allocate to the real-time task group. If so, it further determines whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group. If both conditions hold, resources are allocated to the subtasks in the real-time task group according to that group's scheduling strategy; otherwise, the process ends.
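The overall order of checks (mission-critical group first, then the real-time group, then the non-real-time group, then the real-time group once more) can be summarized in a short sketch. The names `qualifies` and `pick_group`, the `read_counters` callable, and the dictionary layout are hypothetical. The sketch also assumes that counters are re-read on every check, since they are updated in real time and the second real-time check may therefore see different values than the first.

```python
def qualifies(free_slots, counters, amount):
    """A group may receive resources if slots are free, it has subtasks
    that need to run, and it is still below its resource amount."""
    return (free_slots > 0
            and counters["pending"] > 0
            and counters["running"] < amount)


def pick_group(free_slots, read_counters, amounts):
    """Return the name of the group to serve next, or None to end.

    read_counters: callable taking a group name and returning a dict with
        "pending" and "running" totals (hypothetical interface).
    amounts: dict with the resource amounts of "realtime" and "non_realtime".
    """
    # The mission-critical group is served whenever it has work to run.
    if read_counters("critical")["pending"] > 0:
        return "critical"
    # Real-time first, then non-real-time, then real-time once more.
    for group in ("realtime", "non_realtime", "realtime"):
        if qualifies(free_slots, read_counters(group), amounts[group]):
            return group
    return None
```

A caller would invoke `pick_group` whenever resources become free and then apply the chosen group's own scheduling strategy.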
The embodiments above describe only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the claims. It should be noted that a person of ordinary skill in the art can make a number of variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (12)

1. A task scheduling method in a distributed data warehouse, comprising the following steps:
A. dividing tasks into a plurality of task groups by type, and setting for each task group the resource ratio to be allocated to it;
B. allocating resources to the plurality of task groups according to the resource ratios.
2. The task scheduling method in a distributed data warehouse according to claim 1, characterized in that step A is: dividing tasks by type into a mission-critical task group, a real-time task group, and a non-real-time task group.
3. The task scheduling method in a distributed data warehouse according to claim 2, characterized in that the method further comprises: counting in real time the number of running subtasks and the number of subtasks that need to run for each task in the mission-critical task group, the number of running subtasks and the number of subtasks that need to run for each task in the real-time task group, the number of running subtasks and the number of subtasks that need to run for each task in the non-real-time task group, the total number of running subtasks and the total number of subtasks that need to run in the mission-critical task group, the total number of running subtasks and the total number of subtasks that need to run in the real-time task group, and the total number of running subtasks and the total number of subtasks that need to run in the non-real-time task group.
4. The task scheduling method in a distributed data warehouse according to claim 3, characterized in that step B is:
B1. determining whether the mission-critical task group has subtasks that need to run; if so, performing step B2; otherwise, performing step B3;
B2. allocating resources to the subtasks in the mission-critical task group according to the scheduling strategy of the mission-critical task group;
B3. allocating resources to the real-time task group and the non-real-time task group.
5. The task scheduling method in a distributed data warehouse according to claim 4, characterized in that step B3 specifically comprises:
B31. obtaining the resource amount of the real-time task group according to the resource ratio, and determining whether there are resources to allocate to the real-time task group; if so, performing step B32; otherwise, performing step B34;
B32. determining whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group; if so, performing step B33; otherwise, performing step B34;
B33. allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group;
B34. obtaining the resource amount of the non-real-time task group according to the resource ratio, and determining whether there are resources to allocate to the non-real-time task group; if so, performing step B35; otherwise, performing step B37;
B35. determining whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the resource amount of the non-real-time task group; if so, performing step B36; otherwise, ending;
B36. allocating resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group.
6. The task scheduling method in a distributed data warehouse according to claim 5, characterized in that the method further comprises:
when the non-real-time task group has no subtasks that need to run, or the total number of subtasks running in the non-real-time task group is greater than the resource amount of the non-real-time task group, obtaining the resource amount of the real-time task group according to the resource ratio, and further determining whether there are resources to allocate to the real-time task group; if so, further determining whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group; if so, allocating resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise, ending.
7. A task scheduling system in a distributed data warehouse, characterized by comprising:
a grouping module, which divides tasks into a plurality of task groups by type and sets for each task group the resource ratio to be allocated to it;
a resource distribution module, which allocates resources to the plurality of task groups according to the resource ratios.
8. The task scheduling system in a distributed data warehouse according to claim 7, characterized in that the grouping module divides tasks by type into a mission-critical task group, a real-time task group, and a non-real-time task group.
9. The task scheduling system in a distributed data warehouse according to claim 8, characterized in that the system further comprises a counter for counting in real time the number of running subtasks and the number of subtasks that need to run for each task in the mission-critical task group, the number of running subtasks and the number of subtasks that need to run for each task in the real-time task group, the number of running subtasks and the number of subtasks that need to run for each task in the non-real-time task group, the total number of running subtasks and the total number of subtasks that need to run in the mission-critical task group, the total number of running subtasks and the total number of subtasks that need to run in the real-time task group, and the total number of running subtasks and the total number of subtasks that need to run in the non-real-time task group.
10. The task scheduling system in a distributed data warehouse according to claim 9, characterized in that the resource distribution module is further used to determine whether the mission-critical task group has subtasks that need to run; if so, to allocate resources to the subtasks in the mission-critical task group according to the scheduling strategy of the mission-critical task group; otherwise, to allocate resources to the real-time task group and the non-real-time task group.
11. The task scheduling system in a distributed data warehouse according to claim 10, characterized in that the resource distribution module is further used to obtain the resource amount of the real-time task group according to the resource ratio and determine whether there are resources to allocate to the real-time task group; if so, to further determine whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group; if so, to allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise, to obtain the resource amount of the non-real-time task group according to the resource ratio and determine whether there are resources to allocate to the non-real-time task group; if so, to further determine whether the non-real-time task group has subtasks that need to run and whether the total number of subtasks running in the non-real-time task group is less than the resource amount of the non-real-time task group; if so, to allocate resources to the subtasks in the non-real-time task group according to the scheduling strategy of the non-real-time task group; otherwise, to end.
12. The task scheduling system in a distributed data warehouse according to claim 11, characterized in that the resource distribution module is further used, when the non-real-time task group has no subtasks that need to run or the total number of subtasks running in the non-real-time task group is greater than the resource amount of the non-real-time task group, to obtain the resource amount of the real-time task group according to the resource ratio and further determine whether there are resources to allocate to the real-time task group; if so, to further determine whether the real-time task group has subtasks that need to run and whether the total number of subtasks running in the real-time task group is less than the resource amount of the real-time task group; if so, to allocate resources to the subtasks in the real-time task group according to the scheduling strategy of the real-time task group; otherwise, to end.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010188509.5A CN102243598B (en) 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system

Publications (2)

Publication Number Publication Date
CN102243598A (en) 2011-11-16
CN102243598B (en) 2015-09-16

Family

ID=44961669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010188509.5A Active CN102243598B (en) 2010-05-14 2010-05-14 Method for scheduling task in Distributed Data Warehouse and system

Country Status (1)

Country Link
CN (1) CN102243598B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020120662A1 (en) * 2000-12-21 2002-08-29 Serge Goiffon Real time multi-task process and operating system
CN101169741A (en) * 2006-10-25 2008-04-30 国际商业机器公司 Method and system for determining scheduling priority of operation

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521056B (en) * 2011-12-28 2013-08-14 用友软件股份有限公司 Task allocation device and task allocation method
CN102521056A (en) * 2011-12-28 2012-06-27 用友软件股份有限公司 Task allocation device and task allocation method
CN102866920A (en) * 2012-08-02 2013-01-09 杭州海康威视系统技术有限公司 Master-slave structure distributed video processing system and scheduling method thereof
CN102866920B (en) * 2012-08-02 2016-05-11 杭州海康威视数字技术股份有限公司 Host-guest architecture distributed video treatment system and dispatching method thereof
CN103593232B (en) * 2012-08-15 2017-07-04 阿里巴巴集团控股有限公司 The method for scheduling task and device of a kind of data warehouse
CN103593232A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Task scheduling method and device of data warehouse
CN103473334A (en) * 2013-09-18 2013-12-25 浙江中控技术股份有限公司 Data storage method, inquiry method and system
CN103473334B (en) * 2013-09-18 2017-01-11 中控技术(西安)有限公司 Data storage method, inquiry method and system
CN103701886A (en) * 2013-12-19 2014-04-02 中国信息安全测评中心 Hierarchic scheduling method for service and resources in cloud computation environment
CN103699445B (en) * 2013-12-19 2017-02-15 北京奇艺世纪科技有限公司 Task scheduling method, device and system
CN104102543A (en) * 2014-06-27 2014-10-15 北京奇艺世纪科技有限公司 Load regulation method and load regulation device in cloud computing environment
CN104102543B (en) * 2014-06-27 2018-09-11 北京奇艺世纪科技有限公司 The method and apparatus of adjustment of load in a kind of cloud computing environment
CN104391918A (en) * 2014-11-19 2015-03-04 天津南大通用数据技术股份有限公司 Method for achieving distributed database query priority management based on peer deployment
CN104391918B (en) * 2014-11-19 2018-01-19 天津南大通用数据技术股份有限公司 The implementation method of distributed networks database query priority management based on equity deployment
CN106406987A (en) * 2015-07-29 2017-02-15 阿里巴巴集团控股有限公司 Task execution method and apparatus in cluster
CN106406987B (en) * 2015-07-29 2020-01-03 阿里巴巴集团控股有限公司 Task execution method and device in cluster
US20180150326A1 (en) * 2015-07-29 2018-05-31 Alibaba Group Holding Limited Method and apparatus for executing task in cluster
WO2017177865A1 (en) * 2016-04-11 2017-10-19 Huawei Technologies Co., Ltd. Distributed resource management method and system
US10313429B2 (en) 2016-04-11 2019-06-04 Huawei Technologies Co., Ltd. Distributed resource management method and system
CN106649471A (en) * 2016-09-28 2017-05-10 新华三技术有限公司 Access control method and apparatus
CN107092999A (en) * 2016-11-08 2017-08-25 北京小度信息科技有限公司 Task processing method and device
CN107092999B (en) * 2016-11-08 2021-02-26 北京星选科技有限公司 Task processing method and device
CN108279980A (en) * 2018-01-22 2018-07-13 上海联影医疗科技有限公司 Resource allocation methods and system and resource allocation terminal
CN108280230A (en) * 2018-02-27 2018-07-13 北京中关村科金技术有限公司 A kind of method, apparatus, equipment and the storage medium of analysis data
CN108510213A (en) * 2018-05-11 2018-09-07 苏州华兴源创电子科技有限公司 Task is sequentially allocated to the method, apparatus, equipment and medium of task groups
CN109408215A (en) * 2018-11-07 2019-03-01 郑州云海信息技术有限公司 A kind of method for scheduling task and device of calculate node
CN109408215B (en) * 2018-11-07 2021-10-01 郑州云海信息技术有限公司 Task scheduling method and device for computing node
CN111580974A (en) * 2020-05-08 2020-08-25 北京字节跳动网络技术有限公司 GPU instance distribution method and device, electronic equipment and computer readable medium
CN112181662A (en) * 2020-10-13 2021-01-05 深圳壹账通智能科技有限公司 Task scheduling method and device, electronic equipment and storage medium
CN112181662B (en) * 2020-10-13 2023-05-02 深圳壹账通智能科技有限公司 Task scheduling method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102243598B (en) 2015-09-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant