CN104281492A - Fair Hadoop task scheduling method in heterogeneous environment - Google Patents
- Publication number
- CN104281492A (application CN201310283998.6A)
- Authority
- CN
- China
- Prior art keywords
- resource pool
- task
- job
- current
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a fair Hadoop task scheduling method for heterogeneous environments. The method comprises the steps of: judging whether a big job should be executed; if so, calling the job scheduling algorithm to select a suitable job and then calling the task scheduling algorithm; if not, calling the resource pool scheduling algorithm, then executing the job scheduling algorithm to select a suitable job, and finally calling the task scheduling algorithm. Compared with the prior art, the method supports large-memory jobs, shortens the response time of a single job, and makes resource allocation fairer in heterogeneous cluster environments.
Description
Technical field
The invention belongs to the field of computer task scheduling, and in particular relates to a fair Hadoop task scheduling method in a heterogeneous environment.
Background technology
With the rapid development of information technology, the Internet and the mobile Internet have let people far apart enjoy a rich, interconnected life. While enjoying this convenience, people also generate enormous amounts of data, and we have entered the age of big data. Processing such massive data has become a key task for many companies.
Among distributed programming models, Google proposed the Map/Reduce model, whose map and reduce framework enables large-scale distributed computation on clusters with high stability. (Dean J. and Ghemawat S. MapReduce: simplified data processing on large clusters [J], Commun. ACM, 51, 1, pp. 107-113, 2008.)
Inspired by Google's work, Doug Cutting started the development of its open-source counterpart, Hadoop. Hadoop is the name of the overall project and consists mainly of HDFS and MapReduce: HDFS is an open-source implementation of the Google File System (GFS), and MapReduce is an open-source implementation of Google's MapReduce.
This distributed architecture is highly innovative and extensible, and gave Google a strong competitive advantage in system throughput. The Apache foundation therefore implemented an open-source version in Java, supporting Linux platforms such as Fedora and Ubuntu. Hadoop implements the HDFS file system and MapReduce. A user only needs to extend MapReduceBase, provide two classes implementing Map and Reduce respectively, and register a Job for it to run in a distributed fashion automatically.
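The Map/Reduce pattern described above can be illustrated with a minimal in-process word count in Python (Hadoop's real API is Java; the function names here are purely illustrative, not Hadoop's):

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (word, 1) pairs, analogous to a Hadoop Mapper."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + Reduce step: group pairs by key and sum the values,
    analogous to a Hadoop Reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) for key, vals in groups.items()}

counts = reduce_phase(map_phase(["big data", "big jobs"]))
# counts == {"big": 2, "data": 1, "jobs": 1}
```

In Hadoop itself, the shuffle between the two phases runs across the cluster; this sketch only shows the data flow the model imposes.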
Thanks to Hadoop's open-source nature and unsurpassed expressive power, more and more companies have chosen it as their data-analysis platform. Abroad, companies such as Facebook use it to analyze the data produced every day; in China, Taobao uses it to analyze user data and has produced much valuable business insight.
Task scheduling on Hadoop, however, is a hard problem. Major companies generally customize their task scheduling algorithms to their internal needs. Hadoop's default scheduler is a time-based FIFO algorithm; the capacity scheduling algorithm (Capacity Scheduler) was proposed later, followed by the fair scheduling algorithm (Fair Scheduler) from Facebook.
Summary of the invention
1. Object of the present invention.
To overcome the prior art's lack of support for large jobs, long response times for single jobs, and unfair resource allocation in heterogeneous cluster environments, the present invention proposes a fair Hadoop task scheduling method for heterogeneous environments.
2. The technical solution adopted by the present invention.
The technical solution realizing the object of the invention is as follows.
The steps of the fair Hadoop task scheduling method in a heterogeneous environment are:
Step 1: read configuration information.
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, and so on. Each resource pool corresponds to a user group, and all jobs submitted by that group are placed in its resource pool. The minimum share is the minimum number of slots the pool must occupy to work normally; the weight expresses the priority of each resource pool or job when acquiring resources;
Step 2: start the update thread.
The update thread is mainly responsible for updating: the job states of all groups in the current system, the number of Map slots and Reduce slots each group needs, and each group's fair share and weighted allocation; if preemption is allowed, it preempts according to the current system state;
Step 3: judge whether to execute a big job.
If some executor in the system has a free slot, judge whether any job with a large memory requirement in the current system needs scheduling; if such a large-memory job exists and the priority of the large-memory job group has reached its maximum, go to step 4, otherwise go to step 5;
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group and judge whether the current task executor is suitable for running a large-memory job; if not, select the next job until a suitable one is chosen. If no suitable job exists, go to step 5; if a suitable job has been chosen, call the task allocation algorithm, assign the current slot to that job's task, and the scheduling method ends the current allocation;
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select and allocate a task; the scheduling method ends the current allocation.
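The decision made in steps 3-5 can be sketched in Python. The `Slot`, `Job`, and `Pool` types and the 4 GB large-job threshold are illustrative assumptions, not part of the patent, and the group-priority check of step 3 is elided:

```python
from dataclasses import dataclass, field

BIG_JOB_MEMORY_MB = 4096  # assumed threshold for a "large-memory" job

@dataclass
class Job:
    name: str
    needs_big_memory: bool = False

@dataclass
class Slot:
    memory_mb: int  # memory available on the executor offering this slot

@dataclass
class Pool:
    name: str
    jobs: list = field(default_factory=list)

def schedule(slot, pools):
    """Steps 3-5: prefer a pending large-memory job when the executor can
    host it; otherwise fall back to ordinary pool -> job selection."""
    # Steps 3-4: scan for a large-memory job this executor is suitable for.
    for pool in pools:
        for job in pool.jobs:
            if job.needs_big_memory and slot.memory_mb >= BIG_JOB_MEMORY_MB:
                return job
    # Step 5: ordinary fair scheduling; taking the first queued job of the
    # first pool stands in for the pool and job scheduling algorithms.
    for pool in pools:
        if pool.jobs:
            return pool.jobs[0]
    return None
```

A large executor thus receives the large-memory job first, while a small executor falls through to ordinary fair scheduling, which is the behavior steps 3-5 describe.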
3. Beneficial effects of the present invention.
Compared with the prior art, the present invention has notable advantages: (1) it supports large-memory jobs; (2) it shortens the response time of a single job; (3) it makes resource allocation fairer in a heterogeneous cluster environment.
The present invention is described in further detail below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a flow diagram of a program written using the MapReduce model.
Detailed description
Embodiment
The present invention relates to a fair Hadoop task scheduling method in a heterogeneous environment, with the following steps:
Step 1: read configuration information.
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, and so on. Each resource pool corresponds to a user group, and all jobs submitted by that group are placed in its resource pool. The minimum share is the minimum number of slots the pool must occupy to work normally; the weight expresses the priority of each resource pool or job when acquiring resources;
Step 2: start the update thread.
The update thread is mainly responsible for updating: the job states of all groups in the current system, the number of Map slots and Reduce slots each group needs, and each group's fair share and weighted allocation; if preemption is allowed, it preempts according to the current system state;
Step 3: judge whether to execute a big job.
If some executor in the system has a free slot, judge whether any job with a large memory requirement in the current system needs scheduling; if such a large-memory job exists and the priority of the large-memory job group has reached its maximum, go to step 4, otherwise go to step 5;
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group and judge whether the current task executor is suitable for running a large-memory job; if not, select the next job until a suitable one is chosen. If no suitable job exists, go to step 5; if a suitable job has been chosen, call the task allocation algorithm, assign the current slot to that job's task, and the scheduling method ends the current allocation;
The concrete steps of the job scheduling algorithm are as follows:
Step a: compute a weight for each job in the resource pool from its waiting time and remaining work, where t_wait is the time the job has waited since it was submitted and Remainder is the number of the job's remaining tasks (the weight formula itself appears only as an image in the original and is not reproduced here);
Step b: sort the jobs by weight;
Step c: select jobs to schedule from the head of the job queue in order;
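Since the exact weight formula of step a survives only as an image, the sketch below uses an illustrative weight that grows with the waiting time t_wait and favors jobs with fewer remaining tasks, which is one plausible reading of the variables the text defines:

```python
def job_weight(submit_time, remaining_tasks, now):
    """Illustrative job weight: t_wait is the time waited since submission,
    Remainder the number of remaining tasks. This combination is an
    assumption, not the patent's (unreproduced) formula."""
    t_wait = now - submit_time
    return t_wait / (1 + remaining_tasks)

def order_jobs(jobs, now):
    """Steps b and c: sort jobs by descending weight and schedule from the
    head of the queue."""
    return sorted(jobs,
                  key=lambda j: job_weight(j["submit"], j["remaining"], now),
                  reverse=True)

queue = order_jobs([{"name": "a", "submit": 0, "remaining": 9},
                    {"name": "b", "submit": 0, "remaining": 1}], now=100)
# "b" has waited as long as "a" but has fewer remaining tasks, so under
# this illustrative weight it moves to the head of the queue.
```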
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select and allocate a task; the scheduling method ends the current allocation.
The resource pool scheduling algorithm is as follows:
Step a: obtain the numbers t1 and t2 of running tasks in the two resource pools being compared, and the executors tt1 and tt2 running those tasks;
Step b: obtain each current pool's minimum share and demand, and take minShare1 and minShare2 as the minimum of those two values for each pool;
Step c: compare t1 with minShare1. If for both compared pools the minimum share is smaller than the number of tasks, both pools demand slots; in that case compute the weight of the slots each pool currently occupies from the CPU clock frequency f and the memory size m (the weight formula itself appears only as an image in the original and is not reproduced here), allocate the slot to the pool with the smaller weight, and move that pool to the front of the list;
Step d: if only one pool's minimum share is smaller than its number of tasks, only that pool needs a slot, so allocate the slot to that pool and move it to the front of the list.
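Steps a-d can be sketched as follows. As with the job weight, the slot-weight formula combining CPU frequency f and memory size m is an image in the original, so the simple product used here is only an assumption:

```python
def effective_min_share(pool):
    """Step b: minShare = min(configured minimum share, current demand)."""
    return min(pool["min_share"], pool["demand"])

def slot_weight(cpu_ghz, mem_gb):
    """Illustrative slot weight from CPU clock frequency and memory size;
    the patent's actual formula is not reproduced in the text."""
    return cpu_ghz * mem_gb

def pick_pool(p1, p2):
    """Steps c-d: decide which of two compared pools receives a free slot.
    Pools are plain dicts with keys: tasks, min_share, demand, slots."""
    demands1 = p1["tasks"] > effective_min_share(p1)
    demands2 = p2["tasks"] > effective_min_share(p2)
    if demands1 != demands2:
        # Step d: only one pool demands a slot, so it wins outright.
        return p1 if demands1 else p2
    # Step c: both pools demand slots (or neither does); compare the weight
    # of the slots each pool already occupies and favor the lighter pool.
    w1 = sum(slot_weight(f, m) for f, m in p1["slots"])
    w2 = sum(slot_weight(f, m) for f, m in p2["slots"])
    return p1 if w1 <= w2 else p2
```

Weighting occupied slots by node capability is what makes the comparison heterogeneity-aware: a pool holding slots on powerful nodes is treated as better served than one holding the same number of slots on weak nodes.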
The above embodiment does not limit the present invention in any way; every technical solution obtained by equivalent substitution or equivalent transformation falls within the protection scope of the present invention.
Claims (3)
1. A fair Hadoop task scheduling method in a heterogeneous environment, characterized in that the steps are as follows:
Step 1: read configuration information.
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, and so on. Each resource pool corresponds to a user group, and all jobs submitted by that group are placed in its resource pool. The minimum share is the minimum number of slots the pool must occupy to work normally; the weight expresses the priority of each resource pool or job when acquiring resources;
Step 2: start the update thread.
The update thread is mainly responsible for updating: the job states of all groups in the current system, the number of Map slots and Reduce slots each group needs, and each group's fair share and weighted allocation; if preemption is allowed, it preempts according to the current system state;
Step 3: judge whether to execute a big job.
If some executor in the system has a free slot, judge whether any job with a large memory requirement in the current system needs scheduling; if such a large-memory job exists and the priority of the large-memory job group has reached its maximum, go to step 4, otherwise go to step 5;
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group and judge whether the current task executor is suitable for running a large-memory job; if not, select the next job until a suitable one is chosen. If no suitable job exists, go to step 5; if a suitable job has been chosen, call the task allocation algorithm, assign the current slot to that job's task, and the scheduling method ends the current allocation;
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select and allocate a task; the scheduling method ends the current allocation.
2. The fair Hadoop task scheduling method in a heterogeneous environment according to claim 1, characterized in that the concrete steps of the job scheduling algorithm in step 4 are as follows:
Step 2.1: compute a weight for each job in the resource pool from its waiting time and remaining work, where t_wait is the time the job has waited since it was submitted and Remainder is the number of the job's remaining tasks (the weight formula itself appears only as an image in the original and is not reproduced here);
Step 2.2: sort the jobs by weight;
Step 2.3: select jobs to schedule from the head of the job queue in order.
3. The fair Hadoop task scheduling method in a heterogeneous environment according to claim 1 or 2, characterized in that the resource pool scheduling algorithm in step 5 is as follows:
Step 3.1: obtain the numbers t1 and t2 of running tasks in the two resource pools being compared, and the executors tt1 and tt2 running those tasks;
Step 3.2: obtain each current pool's minimum share and demand, and take minShare1 and minShare2 as the minimum of those two values for each pool;
Step 3.3: compare t1 with minShare1. If for both compared pools the minimum share is smaller than the number of tasks, both pools demand slots; in that case compute the weight of the slots each pool currently occupies from the CPU clock frequency f and the memory size m (the weight formula itself appears only as an image in the original and is not reproduced here), allocate the slot to the pool with the smaller weight, and move that pool to the front of the list;
Step 3.4: if only one pool's minimum share is smaller than its number of tasks, only that pool needs a slot, so allocate the slot to that pool and move it to the front of the list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310283998.6A CN104281492A (en) | 2013-07-08 | 2013-07-08 | Fair Hadoop task scheduling method in heterogeneous environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104281492A true CN104281492A (en) | 2015-01-14 |
Family
ID=52256393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310283998.6A Pending CN104281492A (en) | 2013-07-08 | 2013-07-08 | Fair Hadoop task scheduling method in heterogeneous environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104281492A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020194248A1 (en) * | 2001-05-01 | 2002-12-19 | The Regents Of The University Of California | Dedicated heterogeneous node scheduling including backfill scheduling |
CN101916209A (en) * | 2010-08-06 | 2010-12-15 | 华东交通大学 | Cluster task resource allocation method for multi-core processor |
Non-Patent Citations (2)
Title |
---|
Xia Yi: "Research and Improvement of Job Scheduling Algorithms on the Hadoop Platform", China Master's Theses Full-text Database, Information Science and Technology * |
Jiang Miao: "Research on Scheduling Algorithms on the Hadoop Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105373426A (en) * | 2015-07-28 | 2016-03-02 | 哈尔滨工程大学 | Method for memory ware real-time job scheduling of car networking based on Hadoop |
CN105373426B (en) * | 2015-07-28 | 2019-01-15 | 哈尔滨工程大学 | A kind of car networking memory aware real time job dispatching method based on Hadoop |
CN105610621A (en) * | 2015-12-31 | 2016-05-25 | 中国科学院深圳先进技术研究院 | Method and device for dynamically adjusting task level parameter of distributed system architecture |
CN105610621B (en) * | 2015-12-31 | 2019-04-26 | 中国科学院深圳先进技术研究院 | A kind of method and device of distributed system architecture task level dynamic state of parameters adjustment |
WO2016197716A1 (en) * | 2016-01-18 | 2016-12-15 | 中兴通讯股份有限公司 | Task scheduling method and device |
CN106095573A (en) * | 2016-06-08 | 2016-11-09 | 北京大学 | The Storm platform operations of a kind of work nest perception divides equally dispatching method |
GB2571651B (en) * | 2016-10-21 | 2022-09-21 | Datarobot Inc | Systems for predictive data analytics, and related methods and apparatus |
CN113454614A (en) * | 2018-12-04 | 2021-09-28 | 华为技术加拿大有限公司 | System and method for resource partitioning in distributed computing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104281492A (en) | Fair Hadoop task scheduling method in heterogeneous environment | |
Cheng et al. | Heterogeneity-aware workload placement and migration in distributed sustainable datacenters | |
Rejiba et al. | Custom scheduling in Kubernetes: A survey on common problems and solution approaches | |
CN103593323A (en) | Machine learning method for Map Reduce task resource allocation parameters | |
CN104112049B (en) | Based on the MapReduce task of P2P framework across data center scheduling system and method | |
CN101582043A (en) | Dynamic task allocation method of heterogeneous computing system | |
Cheng et al. | Heterogeneity aware workload management in distributed sustainable datacenters | |
CN102708003A (en) | Method for allocating resources under cloud platform | |
Abdullah et al. | Diminishing returns and deep learning for adaptive CPU resource allocation of containers | |
Chen et al. | Spark on entropy: A reliable & efficient scheduler for low-latency parallel jobs in heterogeneous cloud | |
CN101971141A (en) | System and method for managing a hybrid compute environment | |
Lu et al. | InSTechAH: Cost-effectively autoscaling smart computing hadoop cluster in private cloud | |
Harichane et al. | KubeSC‐RTP: Smart scheduler for Kubernetes platform on CPU‐GPU heterogeneous systems | |
Li et al. | Cost-efficient fault-tolerant workflow scheduling for deadline-constrained microservice-based applications in clouds | |
CN106802822A (en) | A kind of cloud data center cognitive resources dispatching method based on moth algorithm | |
Niu et al. | An adaptive efficiency-fairness meta-scheduler for data-intensive computing | |
Guyon et al. | How much energy can green HPC cloud users save? | |
CN105426247A (en) | HLA federate planning and scheduling method | |
Xu et al. | Intelligent scheduling for parallel jobs in big data processing systems | |
Malathy et al. | Performance improvement in cloud computing using resource clustering | |
Zhou et al. | Resource allocation in cloud computing based on clustering method | |
Li et al. | Cress: Dynamic scheduling for resource constrained jobs | |
Reuther et al. | Technical challenges of supporting interactive HPC | |
Han et al. | Stochastic Matrix Modelling and Scheduling Algorithm of Distributed Intelligent Computing System | |
Wang et al. | Adaptive Pricing and Online Scheduling for Distributed Machine Learning Jobs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20150114 |