CN105653357A - Hadoop cluster online total completion time minimizing scheduling method and device

Info

Publication number: CN105653357A
Application number: CN201410635768.6A
Authority: CN
Inventors: 田文洪; 李国忠; 蒋亚秋; 徐敏贤
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-11-11
Filing date: 2014-11-11
Publication date: 2016-06-08

Abstract

An embodiment of the invention discloses an online scheduling management method and device for a Hadoop cluster, relating to the field of cluster scheduling. The method comprises the following steps: calculating the durations of the Map and Reduce phases of each arriving online job (online jobs arrive continuously until all of them have arrived); allocating the maximum available system resource to each online job; submitting the new result to the scheduling system; and having the system allocate the recalculated amount of MapReduce resources required by the tasks, so that the tasks are processed. The method and device are suitable for dynamic scheduling management of the nodes of an online Hadoop cluster. During the execution of each online job, the maximum available system resource is assigned to that job, thereby minimizing the online total completion time.

Description

Scheduling method and device for minimizing the online total completion time of a Hadoop cluster

Technical field

The present invention relates to the technical field of online cluster scheduling, and in particular to a scheduling method and a scheduling device for an online Hadoop cluster system.

Background technology

Hadoop is a software framework for the distributed processing of massive data in a reliable, efficient and scalable way. The deployment of a Hadoop cluster is divided into three parts: client machines (Client), master nodes (Master nodes) and slave nodes (Slave nodes), as shown in Figure 1. Data storage (the Hadoop Distributed File System, HDFS) and the supervision of the parallel computation that runs on that data (MapReduce) are the two key functional modules of Hadoop, and the master nodes are primarily responsible for both. HDFS adopts a master/slave structural model: an HDFS cluster consists of one name node (NameNode) and several data nodes (DataNode). The MapReduce framework consists of a single job tracker (JobTracker) running on the master node and a task tracker (TaskTracker) running on each slave node of the cluster. HDFS and MapReduce together form the core of the Hadoop distributed system architecture.

Hadoop is an open-source distributed parallel programming framework that implements the MapReduce model; being general, convenient and practical, it is widely used in the era of cloud computing and big data processing. MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB). The MapReduce working process comprises two phases: the Map phase and the Reduce phase. The Map phase comprises multiple Map tasks, and the Reduce phase comprises multiple Reduce tasks. Before the Map function is formally executed, the input data must be split, and each Map task processes one logical split. A split contains metadata such as the data start position, the data length and the node where the data resides; how the input is split is usually decided by the user. The number of splits determines the number of Map tasks.
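As a rough illustration of the relationship between splits and Map tasks, the sketch below counts the Map tasks for a given input under the simplifying assumption that the input is cut into fixed-size splits. The function name `num_map_tasks` and the 128 MB split size are illustrative assumptions, not values taken from this document.

```python
def num_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Return the number of Map tasks, assuming one task per fixed-size split."""
    if input_bytes < 0 or split_bytes <= 0:
        raise ValueError("input size must be >= 0 and split size must be > 0")
    # Integer ceiling division: a partial final split still needs its own Map task.
    return -(-input_bytes // split_bytes)


MB = 1024 * 1024
# Hypothetical input: 1000 MB cut into 128 MB splits.
tasks = num_map_tasks(1000 * MB, 128 * MB)
print(tasks)  # prints 8
```

Because the user controls how the input is split, the split size here is a free parameter; only the one-task-per-split relationship comes from the text above.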

HDFS provides the underlying support for distributed storage in the Hadoop architecture.

The NameNode manages the namespace of the file system, handling operations such as opening, closing and renaming files or directories, and is also responsible for mapping data blocks to specific DataNodes. A DataNode is both a data storage node and a computing node; it serves the read and write requests of file system clients and creates, deletes and replicates data blocks under the unified scheduling of the NameNode.

The JobTracker is mainly responsible for scheduling each subtask (task) of a job to run on the TaskTrackers and for monitoring them; if it finds a failed task, it reruns it. The JobTracker also tracks information such as task execution progress and resource usage and passes this information to the task scheduler (TaskScheduler), so that the scheduler can assign resources to suitable tasks when resources become idle. A TaskTracker periodically calls the heartbeat RPC function to report the running state of its node and tasks to the JobTracker, and at the same time receives the commands that the JobTracker returns in the heartbeat response and performs the corresponding operations. A TaskTracker uses "slots" to divide the resources on its node into equal shares. A slot is a logical concept and the resource unit of Hadoop; the number of slots on a node represents the resource capacity, or capability, of that node. Slots are divided into Map slots and Reduce slots, used by Map tasks and Reduce tasks respectively. Each job requests resources in units of slots, and each node determines the total number of slots it contains according to its own computing power and memory. When a job is about to start, it first applies to the JobTracker for slots; a task gets the chance to run only after obtaining a slot, and the role of the Hadoop scheduler is to assign the idle slots on each TaskTracker to tasks.

A client machine holds all the configuration of the Hadoop cluster, but it is neither a master node nor a slave node. The role of the client machine is to load data into the cluster, submit jobs to MapReduce for data processing, and retrieve or view the results of the computation.

The core technology in a Hadoop cluster system is task scheduling. In cloud computing research, online job scheduling in the MapReduce environment brings new problems and challenges and is attracting more and more attention. Initially, Hadoop's default first-in, first-out (FIFO) scheduler was designed specifically for periodically executing large-scale batch jobs. As the number of users of MapReduce cluster systems grew, the Capacity Scheduler and the Hadoop Fair Scheduler (HFS: HadoopFairScheduling) appeared and provided more efficient ways of sharing a cluster. However, existing schedulers still provide no support for minimizing the completion time of an online job set; when the submitted online jobs form a job set, the completion time may therefore be long, which leads to higher total energy consumption.

Summary of the invention

The technical problem to be solved by the present invention is to provide a scheduling method and a scheduling device for an online Hadoop cluster system that can minimize the total completion time of an online job set.

To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a scheduling method for an online Hadoop cluster system, the method comprising the following three steps:

calculating the durations of the Map and Reduce phases of each arriving online job;

allocating the maximum available system resource to each online job;

scheduling the online jobs in first-come, first-served execution order.

According to the first aspect, in a first possible implementation, in said allocating the maximum available system resource to each job:

when the system resource R requested by a job equals the maximum available system resource S, the maximum available system resource S is allocated to the job.

According to the first aspect, in a second possible implementation, in said allocating the maximum available system resource to each job:

when the system resource R requested by a job is less than the maximum available system resource S, the job is re-split according to the maximum available system resource S, and the maximum available system resource S is allocated to the re-split job.

According to the first aspect, in a third possible implementation, in said allocating the maximum available system resource to each job:

when the system resource R requested by a job is greater than the maximum available system resource S, resources are allocated to the job by executing it in N waves;

where N is R/S rounded up.

According to the third possible implementation of the first aspect, in a fourth possible implementation, when R/S is not an integer, the maximum available system resource S is allocated to the tasks of each of waves 1 to N;

when R/S is an integer, the maximum available system resource is allocated to the tasks of every wave.

According to the first aspect or any one of the above possible implementations of the first aspect, in a fifth possible implementation, the job attribute A_i comprises the phase durations and the phase types, and:

where A_i is the attribute of job J_i, t_i is the moment at which the job arrives at the system, m_i and r_i are the durations of the Map phase and the Reduce phase of job J_i respectively, and m and r indicate that the phase type is the Map phase and the Reduce phase respectively.

According to the fifth possible implementation of the first aspect, in a sixth possible implementation, the execution order of the online jobs is arranged by the first-come, first-served algorithm according to the online job attributes.

According to the sixth possible implementation of the first aspect, in a seventh possible implementation, the method further comprises the step of:

numbering each online job.

According to the seventh possible implementation of the first aspect, in an eighth possible implementation, the method further comprises the step of:

estimating the phase durations of the online jobs.

According to the eighth possible implementation of the first aspect, in a ninth possible implementation, in the step of estimating the phase durations of a job:

when the system resource R requested by an online job equals the maximum available system resource S, the phase durations of the online job are estimated from the system resource requested by the online job and the prior information of the system.

According to the eighth possible implementation of the first aspect, in a tenth possible implementation, in the step of estimating the phase durations of a job:

when the system resource R requested by an online job is not equal to the maximum available system resource S, the phase durations of the online job are estimated from the system resource requested by the online job and the prior information of the system.

In a second aspect, an embodiment of the present invention provides a scheduling device for an online Hadoop cluster system, the device comprising three main modules:

an allocation module, configured to calculate the durations of the Map and Reduce phases of each arriving online job and to allocate the maximum available system resource to each online job;

an ordering module, configured to arrange the execution order of the jobs by the first-come, first-served algorithm;

a scheduling module, configured to schedule the online jobs according to the execution order.

According to the second aspect, in a first possible implementation, the allocation module: when the system resource R requested by an online job equals the maximum available system resource S, allocates the maximum available system resource S to the job.

According to the second aspect, in a second possible implementation, the allocation module:

when the system resource R requested by an online job is less than the maximum available system resource S, re-splits the job according to the maximum available system resource S and allocates the maximum available system resource S to the re-split job.

According to the second aspect, in a third possible implementation, the allocation module:

when the system resource R requested by an online job is greater than the maximum available system resource S, allocates resources to the job by executing it in N waves;

where N is R/S rounded up.

According to the third possible implementation of the second aspect, in a fourth possible implementation, when R/S is not an integer, the allocation module allocates the maximum available system resource S to the tasks of each of waves 1 to N.

According to the second aspect or any one of the above possible implementations of the second aspect, in a fifth possible implementation, the job attribute comprises the phase durations and the phase types, and:

According to the fifth possible implementation of the second aspect, in a sixth possible implementation, the ordering module:

arranges the execution order of the online jobs by the first-come, first-served algorithm according to the online job attributes.

According to the sixth possible implementation of the second aspect, in a seventh possible implementation, the device further comprises:

a numbering module, configured to number each online job.

According to the seventh possible implementation of the second aspect, in an eighth possible implementation, the device further comprises:

an estimation module, configured to estimate the phase durations of the online jobs.

According to the eighth possible implementation of the second aspect, in a ninth possible implementation, the estimation module:

when the system resource R requested by an online job equals the maximum available system resource S, estimates the phase durations of the online job from the system resource requested by the online job and the prior information of the system.

According to the eighth possible implementation of the second aspect, in a tenth possible implementation, the estimation module:

when the system resource R requested by an online job is not equal to the maximum available system resource S, estimates the phase durations of the online job from the resource allocated to the online job and the prior information of the system.

In a third aspect, an embodiment of the present invention provides an online Hadoop cluster system comprising the scheduling device of the second aspect or of any possible implementation of the second aspect.

In a fourth aspect, an embodiment of the present invention provides a method for reducing the energy consumption of an online Hadoop cluster system, characterized in that the online Hadoop cluster system is scheduled using the method of the first aspect or of any possible implementation of the first aspect.

Brief description of the drawings

Fig. 1 is a schematic diagram of the deployment of a Hadoop cluster system according to an embodiment of the present invention;

Fig. 2 is a flowchart of the Hadoop cluster system scheduling method according to an embodiment of the present invention;

Fig. 3 is a structural schematic diagram of the Hadoop cluster system scheduling device according to an embodiment of the present invention;

Fig. 4 is a structural schematic diagram of the Hadoop cluster system scheduling device according to another embodiment of the present invention;

Fig. 5 is a schematic diagram of the result of executing jobs with the Hadoop cluster system scheduling method according to an embodiment of the present invention;

Fig. 6 illustrates the case R_k ≥ M_{k+1};

Fig. 7 illustrates the case R_k < M_{k+1}.

Detailed description of the embodiments

Specific embodiments of the present invention are described in further detail below with reference to the drawings. The following embodiments illustrate the present invention but do not limit its scope.

For a better understanding of the present invention, the terms used in the embodiments are explained as follows:

Total completion time (total makespan) of a batch of online jobs in a Hadoop cluster system: the total time taken to execute all Map/Reduce phases of the batch in a given order, that is, the total time from the start of the Map phase of the first online job to the end of the Reduce phase of the last online job.

Wave: the number of rounds in which a job must be executed in a given Hadoop cluster system. When the resource requested by a job is R, the maximum available system resource is S, and R is greater than S, the number of executions of the job, that is, the wave count N, equals R/S rounded up. For example, if a job requests 30 Map slots and 30 Reduce slots in a Hadoop cluster system with 20 × 20 available resources (20 Map slots and 20 Reduce slots), it executes in 2 waves (the Map phase takes 2 waves and the Reduce phase also takes 2 waves), and so on.
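The wave count defined above is a ceiling division. The following minimal sketch reproduces the 30-on-20 example; the function name `wave_count` is a hypothetical helper introduced only for illustration, and Map and Reduce slots are counted separately.

```python
def wave_count(requested_slots: int, available_slots: int) -> int:
    """Return N = ceil(R / S), the number of waves needed for R slots on S slots."""
    if requested_slots <= 0 or available_slots <= 0:
        raise ValueError("slot counts must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-requested_slots // available_slots)


# Example from the text: 30 Map slots and 30 Reduce slots requested,
# 20 Map slots and 20 Reduce slots available.
map_waves = wave_count(30, 20)
reduce_waves = wave_count(30, 20)
print(map_waves, reduce_waves)  # prints "2 2"
```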

As shown in Figure 2, an embodiment of the present invention provides a scheduling method for an online Hadoop cluster system, the method comprising the steps of:

S210. Order the online jobs as each one arrives: according to the online job attributes, arrange the execution order of the jobs by the first-come, first-served algorithm.

S220. Allocate the maximum available system resource to each online job.

The method of this embodiment takes as an example the case in which, in the online setting, the cluster system processes a job set of n jobs input from a client machine. The durations of the Map and Reduce phases of each arriving online job are calculated, the maximum available system resource is allocated to each job, and the online jobs are scheduled in first-come, first-served execution order, so that the total completion time is minimized.

In the embodiments of the present invention, the available system resource refers to the total MapReduce resource slots in a given Hadoop cluster system. The embodiments assume that each node in the Hadoop cluster system has one Map slot and one Reduce slot; a Hadoop cluster system with 60 nodes can thus be described as having a total maximum available resource of 60 × 60 slots. This can of course also be set dynamically according to the particular situation.

S230. Schedule the jobs according to the execution order.

In summary, the method of this embodiment calculates the durations of the two phases of MapReduce (the Map phase and the Reduce phase) and, during the execution of each online job, allocates to it the maximum available system resource, which minimizes the total completion time and reduces the energy consumption of the cluster system.

Specifically, in step S220, for each online job: when the system resource R requested by the job equals the maximum available system resource S, the maximum available system resource S is allocated to the job. For example, when the system resource R requested by the job is 30 × 30 (30 Map slots, 30 Reduce slots) and the maximum available system resource S is 30 × 30 (30 Map slots, 30 Reduce slots), the whole 30 × 30 system resource is allocated to the job.

When the system resource R requested by an online job is less than the maximum available system resource S, the job is re-split according to the maximum available system resource S, and the maximum available system resource S is allocated to the re-split job. For example, when the system resource R requested by the job is 20 × 20 and the maximum available system resource S is 30 × 30, the job is re-split and the whole 30 × 30 system resource is then allocated to it.

When the system resource R requested by an online job is greater than the maximum available system resource S, the job is split so that it executes in N waves, where N is R/S rounded up.

When R/S is not an integer, the maximum available system resource S is allocated to the tasks of each of waves 1 to N; that is, the system runs at full load for the first N-1 waves, and the last wave should also run at full load. For example, when the resource R requested by a job is 30 × 30 and the maximum available system resource S is 20 × 20, the job executes in 2 waves; the system runs at full load during the first wave and also at full load during the second wave.

When R/S is an integer, the maximum available system resource is allocated to the tasks of every wave; in this case the execution of each wave is similar to the case R = S.
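The three allocation cases above (R = S, R < S, R > S) can be summarized in a short sketch. It treats the Map and Reduce slot counts as a single number for brevity, and the names `Allocation` and `allocate` are hypothetical. Marking a job as re-split whenever its tasks would not fill every wave is an interpretation of the text, not something it states explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    waves: int           # number of waves N in which the job executes
    slots_per_wave: int  # slots granted to the job in every wave
    resplit: bool        # True if the job must be re-split to fill every wave


def allocate(requested: int, available: int) -> Allocation:
    """Grant the full available resource S to a job that requested R slots."""
    if requested <= 0 or available <= 0:
        raise ValueError("slot counts must be positive")
    if requested == available:
        # R = S: the job gets everything and runs in one wave as submitted.
        return Allocation(waves=1, slots_per_wave=available, resplit=False)
    if requested < available:
        # R < S: re-split the job so that it occupies all S slots.
        return Allocation(waves=1, slots_per_wave=available, resplit=True)
    # R > S: run in N = ceil(R / S) waves, each at full load.
    waves = -(-requested // available)
    # Assumption: when R/S is not an integer, the job is re-split
    # so that the last wave is also fully loaded.
    return Allocation(waves, available, resplit=(requested % available != 0))


print(allocate(30, 30))
print(allocate(20, 30))
print(allocate(30, 20))
```

In every case the job receives `available` slots per wave, which is the point of the scheme: the cluster is never left partly idle while a job is running.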

In addition, in the embodiments of the present invention, the online job attribute A_i comprises the arrival time, the phase durations and the phase types, and:

S240. Number each online job.

In addition, in order to know the attributes of each online job, the method of this embodiment further comprises the step of:

S250. Estimate the phase durations of the online jobs.

In step S250, when the system resource R requested by an online job equals the maximum available system resource S, the phase durations of the job are estimated from the system resource R requested by the online job and the prior information of the system. For periodic jobs, this can be analysed automatically from the past executions on the cluster system. When a new job set needs to be processed, the extracted job profiles can be used to compute estimates of the durations of the Map phase and the Reduce phase of each job.

When the system resource R requested by an online job is not equal to the maximum available system resource S, the phase durations of the job are estimated from the resource allocated to the job in step S220 and the prior information of the system.
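The text does not specify how the job profiles are turned into estimates. Purely as an illustration, the sketch below assumes that the profile of a periodic job is a list of (Map duration, Reduce duration) pairs observed in past runs under the same allocation, and uses their averages as the estimate. The function name `estimate_phase_durations` and the averaging rule are assumptions, not the method of this document.

```python
from statistics import mean


def estimate_phase_durations(history):
    """Estimate (Map duration, Reduce duration) as the mean over past runs.

    `history` is a list of (map_duration, reduce_duration) pairs taken from
    earlier executions of the same periodic job (hypothetical profile format).
    """
    if not history:
        raise ValueError("at least one past run is required")
    map_estimate = mean(m for m, _ in history)
    reduce_estimate = mean(r for _, r in history)
    return map_estimate, reduce_estimate


# Hypothetical profile: two earlier runs of the same periodic job.
past_runs = [(4, 5), (6, 5)]
m_est, r_est = estimate_phase_durations(past_runs)
print(m_est, r_est)
```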

Online job scheduling has two extreme cases, and the scheduling time in general lies between them:

1) in the best case it coincides with offline scheduling, as proved in Johnson's 1954 paper "Optimal Two- and Three-Stage Production Schedules with Setup Times Included";

2) in the worst case the minimum makespan is

T = M_{1} + R_{n} + \sum_{i = 1}^{n - 1} \max\{R_{i}, M_{i + 1}\}

A. When n = 1, that is, when there is only one MapReduce job:

T = M_{1} + R_{1}

so the above formula

T = M_{1} + R_{n} + \sum_{i = 1}^{n - 1} \max\{R_{i}, M_{i + 1}\}

holds;

B. Assume that when n = k the formula

T = M_{1} + R_{k} + \sum_{i = 1}^{k - 1} \max\{R_{i}, M_{i + 1}\}

holds.

Then, when n = k+1:

When R_k ≥ M_{k+1},

as shown in Figure 6, we have:

\begin{matrix} T = M_{1} + R_{k} + \sum_{i = 1}^{k - 1} \max\{R_{i}, M_{i + 1}\} + R_{k + 1} \\ = M_{1} + R_{k + 1} + \sum_{i = 1}^{k} \max\{R_{i}, M_{i + 1}\} \end{matrix}

When R_k < M_{k+1},

as shown in Figure 7, we have:

\begin{matrix} T = M_{1} + M_{k + 1} + R_{k + 1} + \sum_{i = 1}^{k - 1} \max\{R_{i}, M_{i + 1}\} \\ = M_{1} + R_{k + 1} + \sum_{i = 1}^{k} \max\{R_{i}, M_{i + 1}\} \end{matrix}

So

T = M_{1} + R_{k + 1} + \sum_{i = 1}^{k} \max\{R_{i}, M_{i + 1}\}

holds;

From A and B it follows by induction that the worst-case minimum makespan

T = M_{1} + R_{n} + \sum_{i = 1}^{n - 1} \max\{R_{i}, M_{i + 1}\}

holds for every n.
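The worst-case formula can be evaluated directly. The sketch below implements it as stated; the function name `worst_case_makespan` and the sample durations are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def worst_case_makespan(map_times, reduce_times):
    """Evaluate T = M_1 + R_n + sum_{i=1}^{n-1} max(R_i, M_{i+1}).

    map_times[i] and reduce_times[i] hold the Map and Reduce phase
    durations of the (i+1)-th job in execution order.
    """
    if not map_times or len(map_times) != len(reduce_times):
        raise ValueError("need two non-empty lists of equal length")
    n = len(map_times)
    # Each consecutive pair contributes the longer of job i's Reduce phase
    # and job i+1's Map phase.
    pair_terms = sum(max(reduce_times[i], map_times[i + 1]) for i in range(n - 1))
    return map_times[0] + reduce_times[-1] + pair_terms


# n = 1 (base case A): T = M_1 + R_1.
print(worst_case_makespan([4], [5]))              # prints 9
# n = 2: T = 2 + 1 + max(4, 3).
print(worst_case_makespan([2, 3], [4, 1]))        # prints 7
# n = 3: T = 2 + 2 + max(4, 3) + max(1, 5).
print(worst_case_makespan([2, 3, 5], [4, 1, 2]))  # prints 13
```

The last two calls also illustrate the induction step: going from two jobs to three replaces the trailing R_2 = 1 with max(R_2, M_3) + R_3 = 5 + 2, so 7 - 1 + 7 = 13.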

Those skilled in the art will understand that, in the methods of the embodiments of the present invention, the sequence numbers of the steps do not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and the numbering should not limit the implementation of the specific embodiments in any way.

As shown in Figure 3, an embodiment of the present invention further provides a scheduling device 300 for an online Hadoop cluster system, the device 300 comprising:

an allocation module 310, configured to allocate the maximum available system resource to each online job.

The device of this embodiment takes as an example the case in which, in the online setting, the cluster system processes a job set of n jobs input from a client machine. The durations of the Map and Reduce phases of each arriving online job are calculated, the maximum available system resource is allocated to each job, and the online jobs are scheduled in first-come, first-served execution order, so that the total completion time is minimized.

an ordering module 320, configured to arrange the execution order of the online jobs by the first-come, first-served algorithm according to the online job attributes.

a scheduling module 330, configured to schedule the online jobs according to the execution order.

Specifically, for each online job, the allocation module 310:

when the system resource R requested by the online job equals the maximum available system resource S, allocates the maximum available system resource S to the job. For example, when the system resource R requested by the job is 30 × 30 (30 Map slots, 30 Reduce slots) and the maximum available system resource S is 30 × 30 (30 Map slots, 30 Reduce slots), the whole 30 × 30 system resource is allocated to the job.

When the system resource R requested by the online job is less than the maximum available system resource S, it re-splits the job according to the maximum available system resource S and allocates the maximum available system resource S to the re-split job. For example, when the system resource R requested by the job is 20 × 20 and the maximum available system resource S is 30 × 30, the job is re-split and the whole 30 × 30 system resource is then allocated to it.

When the system resource R requested by the online job is greater than the maximum available system resource S, it splits the job so that it executes in N waves, where N is R/S rounded up.

If R/S is not an integer, the maximum available system resource S is allocated to the tasks of each of waves 1 to N; that is, the system runs at full load for waves 1 to N. For example, when the resource R requested by a job is 30 × 30 and the maximum available system resource S is 20 × 20, the job executes in 2 waves: the tasks of the first wave are allocated 20 × 20 resources and the tasks of the second wave are also allocated 20 × 20 resources, so the system runs at full load during the first wave and also at full load during the second wave.

If R/S is an integer, the maximum available system resource is allocated to the tasks of every wave; in this case the execution of each wave is similar to the case R = S.

Correspondingly, as shown in Figure 4, the device of this embodiment further comprises:

a numbering module 340, configured to number each job.

In addition, in order to know the attributes of each online job, the device of this embodiment further comprises an estimation module 350, configured to estimate the phase durations of the online jobs. When the system resource R requested by a job equals the maximum available system resource S, the estimation module 350 estimates the phase durations of the job from the system resource R requested by the job and the prior information of the system. For periodic jobs, this can be analysed automatically from the past executions on the cluster system. When a new job set needs to be processed, the extracted job profiles can be used to compute estimates of the durations of the Map phase and the Reduce phase of each job.

When the system resource R requested by a job is not equal to the maximum available system resource S, the estimation module 350 estimates the phase durations of the job from the resource allocated to the job by the allocation module 310 and the prior information of the system.

An embodiment of the present invention further provides a Hadoop cluster system comprising the scheduling device shown in Figs. 3 to 4. The cluster system may be deployed according to the architecture shown in Fig. 1, and the scheduling device may be the task scheduler shown in Fig. 1.

The embodiments of the present invention are further described below by way of a specific example.

A Hadoop cluster system whose maximum available resource is 30 × 30 executes a job set J containing 5 jobs J_i, numbered 1 to 5, where the online jobs J₁, J₂, J₃, J₄ and J₅ each request 30 Map slots and 30 Reduce slots. Based on the prior information of the cluster system, the phase execution times of each job are estimated, and the job attributes are listed in (m_i, r_i) form in the following table:

J_i (job number)	M_i (Map phase duration)	R_i (Reduce phase duration)	t_i (arrival time)
J₁	4	5	0
J₂	1	4	1
J₃	30	4	2
J₄	6	30	3
J₅	2	3	4

Here the phase durations and arrival times are expressed in time units. For example, job J₁ arrives at time 0, its Map phase lasts 4 time units and its Reduce phase lasts 5 time units.

According to the method of this embodiment, the job set is processed as follows:

S510. Allocate the maximum available system resource of 30 × 30 to each job that arrives: allocate the maximum resource to jobs J₁ and J₂, and schedule J₁ and J₂ by the first-come, first-served algorithm;

S520. Allocate the maximum available system resource of 30 × 30 to each newly arriving job: allocate the maximum resource to jobs J₃, J₄ and J₅, and schedule J₃, J₄ and J₅ by the first-come, first-served algorithm;

S530. According to the job attributes, the execution order of the five jobs arranged by the first-come, first-served algorithm is (J₁, J₂, J₃, J₄, J₅).

S540. Execute the jobs in the above execution order. The execution result is shown in Figure 5, and the total completion time is 52 time units.
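The ordering in step S530 can be reproduced with a small sketch that sorts the jobs of the table by arrival time. The `Job` tuple and the `fcfs_order` helper are hypothetical names; the jobs are deliberately listed out of order to show that only the arrival time decides the result. The sketch covers the ordering step only and does not model the execution timeline shown in Figure 5.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    map_time: int     # Map phase duration, in time units
    reduce_time: int  # Reduce phase duration, in time units
    arrival: int      # arrival time, in time units


# Job attributes from the table above, intentionally shuffled.
jobs = [
    Job("J3", 30, 4, 2),
    Job("J1", 4, 5, 0),
    Job("J5", 2, 3, 4),
    Job("J2", 1, 4, 1),
    Job("J4", 6, 30, 3),
]


def fcfs_order(job_list):
    """Return job names in first-come, first-served order.

    sorted() is stable, so jobs with equal arrival times keep their
    submission order.
    """
    return [job.name for job in sorted(job_list, key=lambda job: job.arrival)]


order = fcfs_order(jobs)
print(order)  # prints ['J1', 'J2', 'J3', 'J4', 'J5']
```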

Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be defined by the scope of the claims.

Claims

1. A method for online scheduling management of a Hadoop cluster, characterized in that the method comprises the step of:

allocating the maximum available system resource to each online job.

2. The method according to claim 1, characterized in that the competitive ratio of the online algorithm lies in [1, 2),

where the competitive ratio is the total completion time of the newly proposed algorithm (which is an online algorithm, T_online) divided by the optimal offline total completion time (T_opt),

that is:

\frac{T_{online}}{T_{opt}} .

3. The method according to claim 1, characterized in that, in calculating the durations of the Map and Reduce phases of each arriving online job and allocating the maximum available system resource to each online job:

when the system resource R requested by an online job equals the maximum available system resource S, the maximum available system resource S is allocated to the job;

when the system resource R requested by an online job is less than the maximum available system resource S, the job is re-split according to the maximum available system resource S, and the maximum available system resource S is allocated to the re-split job;

when the system resource R requested by an online job is greater than the maximum available system resource S, resources are allocated to the job by executing it in N waves;

where N is R/S rounded up;

when R/S is not an integer, the maximum available system resource S is allocated to the tasks of each of waves 1 to N.

4. The method according to claim 3, characterized in that the job attribute A_i comprises the arrival time, the phase durations and the phase types, and:

where A_i is the attribute of the i-th job J_i, t_i is the moment at which job J_i arrives at the system, m_i and r_i are the durations of the Map phase and the Reduce phase of job J_i respectively, and m and r indicate that the phase type is the Map phase and the Reduce phase respectively.

5. The method according to claim 4, characterized in that the execution order of the online jobs is arranged by the first-come, first-served algorithm according to the job attributes;

each online job is numbered;

the phase durations of the online jobs are estimated, and in the step of estimating the phase durations of the online jobs:

6. A scheduling device for an online Hadoop cluster system, characterized in that the device comprises:

an allocation module, configured to allocate the maximum available system resource to each online job;

an ordering module, configured to arrange the execution order of the jobs by the first-come, first-served algorithm according to the online job attributes.

7. The device according to claim 6, characterized in that the allocation module:

when the system resource R requested by an online job equals the maximum available system resource S, allocates the maximum available system resource S to the job;

when the system resource R requested by an online job is less than the maximum available system resource S, re-splits the job according to the maximum available system resource S and allocates the maximum available system resource S to the re-split online job;

when the system resource R requested by a job is greater than the maximum available system resource S, allocates resources to the job by executing it in N waves;

where N is R/S rounded up.

8. The device according to any one of claims 6 to 7, characterized in that the online job attribute A_i comprises the arrival time, the phase durations and the phase types, and:

9. The device according to claim 8, characterized in that:

the ordering module arranges the execution order of the online jobs by the first-come, first-served algorithm;

a numbering module is configured to number each online job;

an estimation module is configured to estimate the phase durations of the online jobs; when the system resource R requested by an online job equals the maximum available system resource S, the phase durations of the online job are estimated from the system resource requested by the online job and the prior information of the system; when the system resource R requested by an online job is not equal to the maximum available system resource S, the phase durations of the online job are estimated from the resource allocated to the online job and the prior information of the system.

10. An online Hadoop cluster system and a method for reducing its energy consumption, characterized by comprising the scheduling device according to any one of claims 6 to 9 and using the scheduling method according to any one of claims 1 to 5.