CN105653357A - Hadoop cluster online total completion time minimizing scheduling method and device - Google Patents


Info

Publication number
CN105653357A
Authority
CN
China
Prior art keywords
resource
stage
online assignment
online
maximum resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410635768.6A
Other languages
Chinese (zh)
Inventor
田文洪
李国忠
蒋亚秋
徐敏贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

Embodiments of the invention disclose an online scheduling management method and device for a Hadoop cluster, relating to the field of cluster scheduling. The method comprises the following steps: calculating the durations of the Map and Reduce phases of each arriving online job (online jobs keep arriving until all of them have arrived); allocating the maximum available system resource to each online job; submitting the new result to the scheduling system; and having the system allocate the recalculated amount of MapReduce resource that the tasks require, so that the tasks are processed. The method and device are suitable for dynamic scheduling management of the nodes of an online Hadoop cluster. During the execution of each online job, the maximum available system resource is assigned to that job, thereby minimizing the online total completion time.

Description

A scheduling method and device for minimizing the online total completion time of a Hadoop cluster
Technical field
The present invention relates to the technical field of online cluster scheduling, and in particular to a scheduling method and a scheduling device for an online Hadoop cluster system.
Background
Hadoop is a software framework for the distributed processing of massive data in a reliable, efficient and scalable way. A Hadoop cluster deployment is divided into three parts: client machines, master nodes and slave nodes, as shown in Fig. 1. Data storage (the Hadoop Distributed File System, HDFS) and the supervision of the parallel computation (MapReduce) that runs on that data are the two key functional modules of Hadoop, and both are handled mainly by the master nodes. HDFS uses a master/slave architecture: an HDFS cluster consists of one NameNode and a number of DataNodes. The MapReduce framework consists of a single JobTracker running on the master node and a TaskTracker running on each slave node of the cluster. HDFS and MapReduce together form the core of the Hadoop distributed system architecture.
Hadoop is an open-source distributed parallel programming framework that implements the MapReduce model. Being general-purpose, convenient and practical, it is widely used in the era of cloud computing and big data processing. MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). A MapReduce job runs in two phases: the Map phase and the Reduce phase. The Map phase comprises multiple Map tasks, and the Reduce phase comprises multiple Reduce tasks. Before the Map function is executed, the input data must be split, and each Map task processes one logical split. A split holds metadata such as the start offset of the data, its length and the node on which it resides, and the splitting method is usually determined by the user. The number of splits determines the number of Map tasks.
HDFS provides the underlying support for distributed storage in the Hadoop architecture.
The NameNode manages the file system namespace, performing operations such as opening, closing and renaming files or directories, and is also responsible for mapping data blocks to specific DataNodes. A DataNode is both a data storage node and a compute node. It serves the read and write requests of file system clients and creates, deletes and replicates data blocks under the unified scheduling of the NameNode.
The JobTracker is mainly responsible for scheduling each task of a job to run on the TaskTrackers and for monitoring them; if it finds a failed task, it reruns it. The JobTracker also tracks information such as task progress and resource usage and passes it to the task scheduler (TaskScheduler), so that the scheduler can allocate resources to suitable tasks when those resources become idle. A TaskTracker periodically calls the heartbeat RPC to report node and task status to the JobTracker, and at the same time receives the commands that the JobTracker returns in the heartbeat response and carries out the corresponding operations. A TaskTracker divides the resources of its node into equal "slots". A slot is a logical concept and the unit of resource in Hadoop; the number of slots on a node represents the resource capacity of that node. Slots are of two kinds, Map slots and Reduce slots, used by Map tasks and Reduce tasks respectively. Each job requests resources in units of slots, and each node determines the total number of slots it holds from its own computing power and memory. When a job is about to start, it first requests slots from the JobTracker; a task can run only after it has obtained a slot, and the role of the Hadoop scheduler is to assign the idle slots on each TaskTracker to tasks.
A client machine holds all the settings of the Hadoop cluster, but it is neither a master node nor a slave node. Its role is to load data into the cluster, submit jobs to MapReduce for data processing, and retrieve or view the results of those jobs.
Task scheduling is a core technology of Hadoop cluster systems. In cloud computing research, the scheduling of online jobs in MapReduce environments raises new problems and challenges and is attracting increasing attention. Originally, Hadoop's default first-in, first-out (FIFO) scheduler was designed specifically for periodically running large batch jobs. As the number of users of MapReduce cluster systems grew, the Capacity Scheduler and the Hadoop Fair Scheduler (HFS) appeared and provided more efficient ways of sharing a cluster. However, existing schedulers do not support minimizing the completion time of a set of online jobs. When online jobs are submitted as a job set, the completion time may therefore be longer, which may in turn lead to higher total energy consumption.
Summary of the invention
The technical problem to be solved by the present invention is to provide a scheduling method and a scheduling device for an online Hadoop cluster system that can minimize the total completion time of a set of online jobs.
To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a scheduling method for an online Hadoop cluster system, the method comprising the following three steps:
Calculating the durations of the Map and Reduce phases of each arriving online job;
Allocating the maximum available system resource to each online job;
Scheduling the online jobs in first-come, first-served (FCFS) order.
According to the first aspect, in a first possible implementation, in allocating the maximum available system resource to each job:
When the system resource R requested by the job equals the maximum available system resource S, the job is allocated the maximum available system resource S.
According to the first aspect, in a second possible implementation, in allocating the maximum available system resource to each job:
When the system resource R requested by the job is less than the maximum available system resource S, the job is re-split according to S, and the re-split job is allocated the maximum available system resource S.
According to the first aspect, in a third possible implementation, in allocating the maximum available system resource to each job:
When the system resource R requested by the job is greater than the maximum available system resource S, resources are allocated to the job over N waves of execution;
where N is R/S rounded up.
According to the third possible implementation of the first aspect, in a fourth possible implementation, when R/S is not an integer, the tasks of each of waves 1 to N are allocated the maximum available system resource S;
When R/S is an integer, the tasks of every wave are allocated the maximum available system resource.
According to the first aspect or any one of the above possible implementations of the first aspect, in a fifth possible implementation, the job attribute A_i comprises the phase durations and the phase types, and
where A_i is the attribute of the job, t_i is the time at which the job arrives at the system, m_i and r_i are the phase durations of the Map phase and the Reduce phase of job J_i respectively, and m and r indicate that the phase type of the job is the Map phase and the Reduce phase respectively.
According to the fifth possible implementation of the first aspect, in a sixth possible implementation, the execution order of the online jobs is arranged with the FCFS algorithm according to the online job attributes.
According to the sixth possible implementation of the first aspect, in a seventh possible implementation, the method further comprises the step of:
Numbering each online job.
According to the seventh possible implementation of the first aspect, in an eighth possible implementation, the method further comprises the step of:
Estimating the phase durations of the online jobs.
According to the eighth possible implementation of the first aspect, in a ninth possible implementation, in the step of estimating the phase durations of a job:
When the system resource R requested by the online job equals the maximum available system resource S, the phase durations of the online job are estimated from the system resource requested by the online job and the prior information of the system.
According to the eighth possible implementation of the first aspect, in a tenth possible implementation, in the step of estimating the phase durations of a job:
When the system resource R requested by the online job is not equal to the maximum available system resource S, the phase durations of the online job are estimated from the resource allocated to the online job and the prior information of the system.
In a second aspect, an embodiment of the present invention provides a scheduling device for an online Hadoop cluster system, the device comprising three main modules:
An allocation module, configured to calculate the durations of the Map and Reduce phases of each arriving online job and to allocate the maximum available system resource to each online job;
An ordering module, configured to arrange the execution order of the jobs with the FCFS algorithm;
A scheduling module, configured to schedule the online jobs in that execution order.
According to the second aspect, in a first possible implementation, the allocation module: when the system resource R requested by the online job equals the maximum available system resource S, allocates the maximum available system resource S to the job.
According to the second aspect, in a second possible implementation, the allocation module:
When the system resource R requested by the online job is less than the maximum available system resource S, re-splits the job according to S and allocates the maximum available system resource S to the re-split job.
According to the second aspect, in a third possible implementation, the allocation module:
When the system resource R requested by the online job is greater than the maximum available system resource S, allocates resources to the job over N waves of execution;
where N is R/S rounded up.
According to the third possible implementation of the second aspect, in a fourth possible implementation, the allocation module, when R/S is not an integer, allocates the maximum available system resource S to the tasks of each of waves 1 to N;
When R/S is an integer, it allocates the maximum available system resource to the tasks of every wave.
According to the second aspect or any one of the above possible implementations of the second aspect, in a fifth possible implementation, the job attribute comprises the phase durations and the phase types, and
where A_i is the attribute of the job, t_i is the time at which the job arrives at the system, m_i and r_i are the phase durations of the Map phase and the Reduce phase of job J_i respectively, and m and r indicate that the phase type of the job is the Map phase and the Reduce phase respectively.
According to the fifth possible implementation of the second aspect, in a sixth possible implementation, the ordering module:
Arranges the execution order of the online jobs with the FCFS algorithm according to the online job attributes.
According to the sixth possible implementation of the second aspect, in a seventh possible implementation, the device further comprises:
A numbering module, configured to number each online job.
According to the seventh possible implementation of the second aspect, in an eighth possible implementation, the device further comprises:
An estimation module, configured to estimate the phase durations of the online jobs.
According to the eighth possible implementation of the second aspect, in a ninth possible implementation, the estimation module:
When the system resource R requested by the online job equals the maximum available system resource S, estimates the phase durations of the online job from the system resource requested by the online job and the prior information of the system.
According to the eighth possible implementation of the second aspect, in a tenth possible implementation, the estimation module:
When the system resource R requested by the online job is not equal to the maximum available system resource S, estimates the phase durations of the online job from the resource allocated to the online job and the prior information of the system.
In a third aspect, an embodiment of the present invention provides an online Hadoop cluster system comprising the scheduling device of the second aspect or of any one of the possible implementations of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a method for reducing the power consumption of an online Hadoop cluster system, characterized in that the online Hadoop cluster system is scheduled using the method of the first aspect or of any one of the possible implementations of the first aspect.
Brief description of the drawings
Fig. 1 is a schematic diagram of the deployment of a Hadoop cluster system according to an embodiment of the present invention;
Fig. 2 is a flowchart of the Hadoop cluster system scheduling method according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the Hadoop cluster system scheduling device according to an embodiment of the present invention;
Fig. 4 is a structural diagram of the Hadoop cluster system scheduling device according to another embodiment of the present invention;
Fig. 5 is a schematic diagram of the result of executing jobs with the Hadoop cluster system scheduling method according to an embodiment of the present invention;
Fig. 6 illustrates the case $R_k \ge M_{k+1}$;
Fig. 7 illustrates the case $R_k < M_{k+1}$.
Detailed description
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.
For a better understanding of the present invention, the terms used in the embodiments are explained as follows:
Total completion time (total makespan) of a batch of online jobs in a Hadoop cluster system: the total time taken to execute, in a given order, all the Map and Reduce phases of the batch, that is, the total time from the start of the Map phase of the first online job to the end of the Reduce phase of the last online job.
Wave: the number of rounds of execution a job needs in a given Hadoop cluster system. When the resource requested by the job is R, the maximum available system resource is S, and R is greater than S, the number of rounds, that is, the wave count N, equals R/S rounded up. For example, a job that requests 30 Map slots and 30 Reduce slots in a Hadoop cluster system with 20 × 20 available resources (20 Map slots and 20 Reduce slots) executes in 2 waves (2 waves for the Map phase and 2 waves for the Reduce phase), and so on.
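The wave count defined above can be sketched in a few lines of Python. This is only an illustration: the function name `wave_count` and its parameter names are hypothetical, and treating a job with R not greater than S as a single wave is an assumption made here for completeness (the definition above only covers R greater than S).

```python
def wave_count(requested_slots: int, available_slots: int) -> int:
    """Return N = ceil(R / S), the number of waves for one phase.

    requested_slots is R and available_slots is S (both hypothetical names).
    A job with R <= S is assumed to run in a single wave.
    """
    if requested_slots <= 0 or available_slots <= 0:
        raise ValueError("slot counts must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-requested_slots // available_slots)
```

With the figures from the example above, `wave_count(30, 20)` evaluates to 2.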
As shown in Fig. 2, an embodiment of the present invention provides a scheduling method for an online Hadoop cluster system, the method comprising the steps of:
S210. Order the arriving online jobs: according to the online job attributes, arrange the execution order of the jobs with the first-come, first-served (FCFS) algorithm.
S220. Allocate the maximum available system resource to each online job.
The method of this embodiment is described taking as an example the processing, in the online setting, of a job set comprising n jobs that client machines submit to the cluster system. The durations of the Map and Reduce phases of each arriving online job are calculated, the maximum available system resource is allocated to each job, and the online jobs are scheduled in FCFS order, so that the total completion time is minimized.
In the embodiments of the present invention, the available system resource refers to the total number of MapReduce resource slots in a given Hadoop cluster system. The embodiments assume that each node in the Hadoop cluster system has one Map slot and one Reduce slot. For a Hadoop cluster system with 60 nodes, the total maximum available resource can thus be expressed as 60 × 60 slots. This can of course also be set dynamically according to the particular situation.
S230. Schedule the jobs in the execution order.
In summary, the method of this embodiment calculates the durations of the two phases of MapReduce (the Map phase and the Reduce phase) and, during the execution of each online job, allocates to it the maximum available system resource, which minimizes the total completion time and reduces the energy consumption of the cluster system.
Specifically, in step S220, for each online job: when the system resource R requested by the job equals the maximum available system resource S, the job is allocated S. For example, when the system resource R requested by the job is 30 × 30 (30 Map slots and 30 Reduce slots) and the maximum available system resource S is 30 × 30 (30 Map slots and 30 Reduce slots), the whole 30 × 30 system resource is allocated to the job.
When the system resource R requested by the online job is less than the maximum available system resource S, the job is re-split according to S, and the re-split job is allocated S. For example, when the system resource R requested by the job is 20 × 20 and the maximum available system resource S is 30 × 30, the job is re-split and the whole 30 × 30 system resource is then allocated to it.
When the system resource R requested by the online job is greater than the maximum available system resource S, the job is split into N waves of execution, where N is R/S rounded up.
When R/S is not an integer, the tasks of each of waves 1 to N are allocated the maximum available system resource S; that is, the system runs fully loaded for the first N-1 waves, and the last wave should also run fully loaded. For example, when the resource R requested by the job is 30 × 30 and the maximum available system resource S is 20 × 20, the job executes in 2 waves: the system runs fully loaded during the first wave and also during the second wave.
When R/S is an integer, the tasks of every wave are allocated the maximum available system resource; in this case the execution of each wave is similar to the case R = S.
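The three allocation cases above (R equal to, less than, or greater than S) can be summarized in a short sketch. It is a simplified model under stated assumptions: a single slot count stands for one phase, the function name `allocate_waves` is hypothetical, and the actual re-splitting of job input is not modeled; only the slot count granted to each wave is returned.

```python
from typing import List


def allocate_waves(requested: int, available: int) -> List[int]:
    """Return the slots granted to each wave of one phase of a job.

    requested is R and available is S (hypothetical names). In every case
    described in the text, each wave is granted the full S:
      R == S: one wave with S slots
      R <  S: the job is re-split to use S, so one wave with S slots
      R >  S: ceil(R / S) waves, each with S slots
    """
    if requested <= 0 or available <= 0:
        raise ValueError("slot counts must be positive")
    waves = -(-requested // available)  # ceiling division
    return [available] * waves
```

For the example above with R = 30 and S = 20, `allocate_waves(30, 20)` returns `[20, 20]`: two waves, both fully loaded.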
In addition, in the embodiments of the present invention, the online job attribute A_i comprises the arrival time, the phase durations and the phase types, and
where A_i is the attribute of the job, t_i is the time at which the job arrives at the system, m_i and r_i are the phase durations of the Map phase and the Reduce phase of job J_i respectively, and m and r indicate that the phase type of the job is the Map phase and the Reduce phase respectively.
The online jobs are scheduled in FCFS order.
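The job attributes and the FCFS ordering can be sketched as follows. The class `Job`, its field names and the function `fcfs_order` are hypothetical names chosen for illustration; the fields mirror t_i, m_i and r_i from the text, and breaking ties by submission order is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    job_id: int         # job number
    arrival: float      # t_i, time at which the job arrives at the system
    map_time: float     # m_i, duration of the Map phase
    reduce_time: float  # r_i, duration of the Reduce phase


def fcfs_order(jobs: List[Job]) -> List[Job]:
    """Return the jobs in first-come, first-served order.

    sorted() is stable, so jobs with equal arrival times keep the order
    in which they were submitted (an assumption of this sketch).
    """
    return sorted(jobs, key=lambda job: job.arrival)
```

Calling `fcfs_order` on jobs submitted out of order returns them sorted by arrival time without modifying the input list.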
S240. Number each online job.
In addition, in order to know the attributes of each online job, the method of this embodiment further comprises the step of:
S250. Estimate the phase durations of the online jobs.
In step S250, when the system resource R requested by the online job equals the maximum available system resource S, the phase durations of the job are estimated from the system resource R requested by the online job and the prior information of the system. For periodic jobs, these can be derived automatically from the past executions of the cluster system. When a new job set needs to be processed, the extracted job profiles can be used to compute estimates of the durations of the Map phase and the Reduce phase of each job.
When the system resource R requested by the online job is not equal to the maximum available system resource S, the phase durations of the job are estimated from the resource allocated to the job in step S220 and the prior information of the system.
Online job scheduling has two extreme cases, and the scheduling time in general lies between them:
1) In the best case, the result is the same as that of offline scheduling, as proved in Johnson's 1954 paper "Optimal Two- and Three-Stage Production Schedules with Setup Times Included";
2) In the worst case, the minimum makespan is $T = M_1 + R_n + \sum_{i=1}^{n-1} \max\{R_i, M_{i+1}\}$, which is shown by induction:
a. When n = 1, that is, when there is only one MapReduce job:
$T = M_1 + R_1$
so the formula $T = M_1 + R_n + \sum_{i=1}^{n-1} \max\{R_i, M_{i+1}\}$ holds;
b. Assume that for n = k the formula $T = M_1 + R_k + \sum_{i=1}^{k-1} \max\{R_i, M_{i+1}\}$ holds.
When n = k+1:
If $R_k \ge M_{k+1}$,
as shown in Fig. 6, we have:
$T = M_1 + R_k + \sum_{i=1}^{k-1} \max\{R_i, M_{i+1}\} + R_{k+1} = M_1 + R_{k+1} + \sum_{i=1}^{k} \max\{R_i, M_{i+1}\}$
If $R_k < M_{k+1}$,
as shown in Fig. 7, we have:
$T = M_1 + M_{k+1} + R_{k+1} + \sum_{i=1}^{k-1} \max\{R_i, M_{i+1}\} = M_1 + R_{k+1} + \sum_{i=1}^{k} \max\{R_i, M_{i+1}\}$
So $T = M_1 + R_{k+1} + \sum_{i=1}^{k} \max\{R_i, M_{i+1}\}$ holds;
From a and b, the worst-case minimum makespan $T = M_1 + R_n + \sum_{i=1}^{n-1} \max\{R_i, M_{i+1}\}$ holds for every n.
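The worst-case makespan formula can be evaluated directly. The sketch below is an illustration only: the function name `worst_case_makespan` and the sample durations in the usage note are hypothetical and are not taken from the embodiment; the function takes the Map durations M_1..M_n and Reduce durations R_1..R_n in execution order.

```python
from typing import Sequence


def worst_case_makespan(map_times: Sequence[float],
                        reduce_times: Sequence[float]) -> float:
    """Evaluate T = M_1 + R_n + sum_{i=1}^{n-1} max(R_i, M_{i+1}).

    map_times[i] and reduce_times[i] hold the Map and Reduce phase
    durations of the (i+1)-th job in execution order.
    """
    if len(map_times) != len(reduce_times) or not map_times:
        raise ValueError("need equally long, non-empty duration lists")
    total = map_times[0] + reduce_times[-1]
    # Between consecutive jobs, the longer of R_i and M_{i+1} is charged.
    for i in range(len(map_times) - 1):
        total += max(reduce_times[i], map_times[i + 1])
    return total
```

For two hypothetical jobs with Map durations (3, 5) and Reduce durations (2, 4), the formula gives 3 + 4 + max(2, 5) = 12, which matches the case $R_k < M_{k+1}$ of the induction step.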
Those skilled in the art will appreciate that, in the methods of the embodiments of the present invention, the sequence numbers of the steps do not imply an order of execution. The order in which the steps are executed should be determined by their functions and internal logic, and the numbering should not limit the implementation of the specific embodiments of the invention in any way.
As shown in Fig. 3, an embodiment of the present invention also provides a scheduling device 300 for an online Hadoop cluster system, the device 300 comprising:
An allocation module 310, configured to allocate the maximum available system resource to each online job.
The device of this embodiment is described taking as an example the processing, in the online setting, of a job set comprising n jobs that client machines submit to the cluster system. The durations of the Map and Reduce phases of each arriving online job are calculated, the maximum available system resource is allocated to each job, and the online jobs are scheduled in FCFS order, so that the total completion time is minimized.
An ordering module 320, configured to arrange the execution order of the online jobs with the FCFS algorithm according to the online job attributes.
A scheduling module 330, configured to schedule the online jobs in the execution order.
In summary, the device of this embodiment calculates the durations of the two phases of MapReduce (the Map phase and the Reduce phase) and, during the execution of each online job, allocates to it the maximum available system resource, which minimizes the total completion time and reduces the energy consumption of the cluster system.
Specifically, for each online job, the allocation module 310:
When the system resource R requested by the online job equals the maximum available system resource S, allocates S to the job. For example, when the system resource R requested by the job is 30 × 30 (30 Map slots and 30 Reduce slots) and the maximum available system resource S is 30 × 30 (30 Map slots and 30 Reduce slots), the whole 30 × 30 system resource is allocated to the job.
When the system resource R requested by the online job is less than the maximum available system resource S, re-splits the job according to S and allocates S to the re-split job. For example, when the system resource R requested by the job is 20 × 20 and the maximum available system resource S is 30 × 30, the job is re-split and the whole 30 × 30 system resource is then allocated to it.
When the system resource R requested by the online job is greater than the maximum available system resource S, splits the job into N waves of execution, where N is R/S rounded up.
If R/S is not an integer, the tasks of each of waves 1 to N are allocated the maximum available system resource S; that is, the system runs fully loaded for waves 1 to N. For example, when the resource R requested by the job is 30 × 30 and the maximum available system resource S is 20 × 20, the job executes in 2 waves: the tasks of the first wave are allocated 20 × 20 resources and the tasks of the second wave are also allocated 20 × 20 resources, so the system runs fully loaded during both the first and the second wave.
If R/S is an integer, the tasks of every wave are allocated the maximum available system resource; in this case the execution of each wave is similar to the case R = S.
Correspondingly, as shown in Fig. 4, the device of this embodiment further comprises:
A numbering module 340, configured to number each job.
In addition, in order to know the attributes of each online job, the device of this embodiment further comprises an estimation module 350, configured to estimate the phase durations of the online jobs. When the system resource R requested by the job equals the maximum available system resource S, the estimation module 350 estimates the phase durations of the job from the system resource R requested by the job and the prior information of the system. For periodic jobs, these can be derived automatically from the past executions of the cluster system. When a new job set needs to be processed, the extracted job profiles can be used to compute estimates of the durations of the Map phase and the Reduce phase of each job.
When the system resource R requested by the job is not equal to the maximum available system resource S, the estimation module 350 estimates the phase durations of the job from the resource allocated to the job by the allocation module 310 and the prior information of the system.
An embodiment of the present invention also provides a Hadoop cluster system comprising the scheduling device shown in Fig. 3 and Fig. 4. The cluster system can be deployed according to the architecture shown in Fig. 1, and the scheduling device can be the task scheduler shown in Fig. 1.
The embodiments of the present invention are further described below by way of a specific example.
A job set J comprising 5 jobs is executed on a Hadoop cluster system whose maximum available resource is 30 × 30. The 5 jobs are numbered 1 to 5, and the online jobs J1, J2, J3, J4 and J5 each request 30 Map slots and 30 Reduce slots. From the prior information of the cluster system, the phase execution times of each job are estimated, and the attributes of each job are listed in (m_i, r_i) form in the following table:
J_i (job number)    M_i (Map task processing time)    R_i (Reduce task processing time)    Job arrival time
J1                  4                                 5                                    0
J2                  1                                 4                                    1
J3                  30                                4                                    2
J4                  6                                 30                                   3
J5                  2                                 3                                    4
The phase durations are expressed in time units. For example, job J1 arrives at time 0, its Map phase lasts 4 time units and its Reduce phase lasts 5 time units.
According to the method of this embodiment, the job set is processed as follows:
S510. Allocate the maximum available system resource, 30 × 30, to each arriving job: allocate the maximum resource to jobs J1 and J2, and schedule J1 and J2 with the FCFS algorithm;
S520. Allocate the maximum available system resource, 30 × 30, to each newly arriving job: allocate the maximum resource to jobs J3, J4 and J5, and schedule J3, J4 and J5 with the FCFS algorithm;
S530. According to the job attributes, the execution order of the five jobs arranged with the FCFS algorithm is (J1, J2, J3, J4, J5).
S540. Execute the jobs in the above execution order. The execution result is shown in Fig. 5, and the total completion time is 52 time units.
One of ordinary skill in the art will appreciate that all or part of flow process realizing in above-described embodiment method, it is can be completed by the hardware that computer program carrys out instruction relevant, described program can be stored in a computer read/write memory medium, this program, when performing, can comprise the flow process of the embodiment such as above-mentioned each side method. Wherein, described storage media can be magnetic disc, CD, read-only storage memory body (Read-OnlyMemory, ROM) or random storage and remembers body (RandomAccessMemory, RAM) etc.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (10)

1. the method for a Hadoop cluster on-line scheduling management, it is characterised in that, described method comprises step:
Calculate the time length in Map and the Reduce stage arriving online assignment;
For each online assignment distribution system can maximum resource;
Online assignment is dispatched according to the execution order of prerequisite variable.
2. The method according to claim 1, characterized in that the competitive ratio of the online algorithm lies in [1, 2),
where the competitive ratio is the total completion time of the newly proposed algorithm (an online algorithm, $T_{online}$) divided by the offline optimal total completion time ($T_{opt}$),
that is: $\frac{T_{online}}{T_{opt}}$.
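As a small illustration of this definition, the ratio and the claimed interval can be checked in Python. The function names and the completion times below are hypothetical, chosen only to exercise the arithmetic; they are not results from the patent.

```python
def competitive_ratio(t_online: float, t_opt: float) -> float:
    """Online total completion time divided by the offline optimum."""
    if t_opt <= 0:
        raise ValueError("t_opt must be positive")
    if t_online < t_opt:
        raise ValueError("an online schedule cannot beat the offline optimum")
    return t_online / t_opt


def within_claimed_bound(ratio: float) -> bool:
    """Check the half-open interval [1, 2) stated in the claim."""
    return 1.0 <= ratio < 2.0


# Hypothetical completion times, not taken from the patent.
ratio = competitive_ratio(t_online=60.0, t_opt=40.0)
print(ratio, within_claimed_bound(ratio))  # 1.5 True
```

A ratio of exactly 1 means the online schedule matched the offline optimum; the upper end of the interval is excluded, so a ratio of 2 would fall outside the claimed range.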
3. The method according to claim 1, characterized in that, in the steps of calculating the durations of the Map and Reduce stages of each arriving online job and allocating the maximum available system resources to each online job:
When the system resources R requested by an online job equal the maximum available system resources S, the job is allocated the maximum available system resources S;
When the system resources R requested by an online job are less than the maximum available system resources S, the job is split according to the maximum available system resources S, and the split job is allocated the maximum available system resources S;
When the system resources R requested by an online job are greater than the maximum available system resources S, resources are allocated to the job over N waves of execution;
Where N is R/S rounded up;
When R/S is not an integer, the 1st to Nth waves of tasks are allocated the maximum available system resources S;
When R/S is an integer, every wave of tasks is allocated the maximum available system resources.
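The three cases in this claim can be sketched as one small Python function. This is an illustration under assumptions rather than the claimed implementation: `plan_waves` is a hypothetical name, resources are modeled as a single slot count instead of separate Map and Reduce slots, and the sketch follows the claim as worded by granting S in every wave.

```python
from typing import List


def plan_waves(requested: int, max_available: int) -> List[int]:
    """Return the resources granted in each execution wave.

    requested      -- R, the resources the online job asks for
    max_available  -- S, the maximum resources the system can offer
    """
    if requested <= 0 or max_available <= 0:
        raise ValueError("resource counts must be positive")
    if requested <= max_available:
        # R == S, or R < S after the job is split to fill S: one wave of S.
        return [max_available]
    # R > S: N = ceil(R / S), computed with integers to avoid float rounding.
    n_waves = -(-requested // max_available)
    return [max_available] * n_waves


print(plan_waves(30, 30))  # [30]
print(plan_waves(10, 30))  # [30]
print(plan_waves(75, 30))  # [30, 30, 30]
print(plan_waves(60, 30))  # [30, 30]
```

Integer ceiling division is used instead of `math.ceil(R / S)` so that very large slot counts are not affected by floating-point rounding.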
4. The method according to claim 3, characterized in that the job attribute Ai comprises the arrival time, the stage durations and the stage type, and:
Where Ai is the attribute of the i-th job Ji, ti is the time at which job Ji arrives at the system, mi and ri are the durations of the Map stage and the Reduce stage of job Ji respectively, and m and r indicate that the stage type of the job is the Map stage and the Reduce stage respectively.
5. The method according to claim 4, characterized in that the execution order of the online jobs is arranged according to the job attributes using the first-come, first-served (FCFS) algorithm;
Each online job is numbered;
In the step of estimating the stage durations of the online jobs:
When the system resources R requested by an online job equal the maximum available system resources S, the stage durations of the online job are estimated from the system resources requested by the online job and the prior information of the system.
When the system resources R requested by an online job are not equal to the maximum available system resources S, the stage durations of the online job are estimated from the system resources requested by the online job and the prior information of the system.
6. A scheduling device for an online Hadoop cluster system, characterized in that the device comprises:
An allocation module, for allocating the maximum available system resources to each online job;
A sorting module, for arranging the execution order of the jobs according to the online job attributes using the first-come, first-served (FCFS) algorithm;
A scheduling module, for scheduling the online jobs according to the execution order.
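One way to picture how the three modules of this device fit together is the following Python sketch. All class, method and field names are hypothetical, resources are reduced to a single integer for brevity, and the code only shows the division of responsibilities; it is not the device described in the patent.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class OnlineJob:
    name: str
    arrival: int       # arrival time of the job
    granted: int = 0   # resources granted by the allocation module


class AllocationModule:
    """Grants every online job the maximum available resources."""

    def __init__(self, max_available: int) -> None:
        self.max_available = max_available

    def allocate(self, job: OnlineJob) -> OnlineJob:
        job.granted = self.max_available
        return job


class SortingModule:
    """Arranges the execution order by arrival time (FCFS)."""

    def order(self, jobs: Iterable[OnlineJob]) -> List[OnlineJob]:
        return sorted(jobs, key=lambda job: job.arrival)


class SchedulingModule:
    """Hands jobs over for execution in the given order."""

    def dispatch(self, ordered: List[OnlineJob]) -> Iterator[Tuple[str, int]]:
        for job in ordered:
            yield job.name, job.granted


allocator = AllocationModule(max_available=30)
sorter = SortingModule()
scheduler = SchedulingModule()

pending = [OnlineJob("J2", arrival=1), OnlineJob("J1", arrival=0)]
ordered = sorter.order(allocator.allocate(job) for job in pending)
dispatched = list(scheduler.dispatch(ordered))
print(dispatched)  # [('J1', 30), ('J2', 30)]
```

Keeping the modules separate means the ordering policy could be replaced without touching allocation or dispatch, although the claim itself names only FCFS.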
7. The device according to claim 6, characterized in that the allocation module:
When the system resources R requested by an online job equal the maximum available system resources S, allocates the maximum available system resources S to the job.
When the system resources R requested by an online job are less than the maximum available system resources S, splits the job according to the maximum available system resources S and allocates the maximum available system resources S to the split online job.
When the system resources R requested by a job are greater than the maximum available system resources S, allocates resources to the job over N waves of execution;
Where N is R/S rounded down.
When R/S is not an integer, the 1st to Nth waves of tasks are allocated the maximum available system resources S;
When R/S is an integer, every wave of tasks is allocated the maximum available system resources.
8. The device according to any one of claims 6 to 7, characterized in that the online job attribute Ai comprises the arrival time, the stage durations and the stage type, and:
Where Ai is the attribute of the i-th job Ji, ti is the time at which job Ji arrives at the system, mi and ri are the durations of the Map stage and the Reduce stage of job Ji respectively, and m and r indicate that the stage type of the job is the Map stage and the Reduce stage respectively.
9. The device according to claim 8, characterized in that it comprises:
A sorting module, for arranging the execution order of the online jobs using the first-come, first-served (FCFS) algorithm;
A numbering module, for numbering each online job;
An estimation module, for estimating the stage durations of the online jobs: when the system resources R requested by an online job equal the maximum available system resources S, the stage durations of the online job are estimated from the system resources requested by the online job and the prior information of the system; when the system resources R requested by an online job are not equal to the maximum available system resources S, the stage durations of the online job are estimated from the resources assigned to the online job and the prior information of the system.
10. An online Hadoop cluster system and a method for reducing its power consumption, characterized by comprising the scheduling device according to any one of claims 6 to 9 and using the scheduling method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410635768.6A CN105653357A (en) 2014-11-11 2014-11-11 Hadoop cluster online total completion time minimizing scheduling method and device

Publications (1)

Publication Number Publication Date
CN105653357A true CN105653357A (en) 2016-06-08

Family

ID=56483161

Country Status (1)

Country Link
CN (1) CN105653357A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365729A (en) * 2013-07-19 2013-10-23 哈尔滨工业大学深圳研究生院 Dynamic MapReduce dispatching method and system based on task type
CN103685492A (en) * 2013-12-03 2014-03-26 北京智谷睿拓技术服务有限公司 Dispatching method, dispatching device and application of Hadoop trunking system
CN104123182A (en) * 2014-07-18 2014-10-29 西安交通大学 Map Reduce task data-center-across scheduling system and method based on master-slave framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
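Tian Wenhong: "A Note on Orchestrating an Ensemble of MapReduce Job for Minizing Their Makespan", 《IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING》 *
Tian Wenhong: "Analytical model for minimizing the total completion time of multiple MapReduce tasks and its application" (最小化多MapReduce任务总完工时间的分析模型及其应用), Computer Engineering & Science (《计算机工程与科学》) *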

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108173905A (en) * 2017-12-07 2018-06-15 北京奇艺世纪科技有限公司 A kind of resource allocation method, device and electronic equipment
CN108173905B (en) * 2017-12-07 2021-06-18 北京奇艺世纪科技有限公司 Resource allocation method and device and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160608