CN104281492A - Fair Hadoop task scheduling method in heterogeneous environment - Google Patents

Info

Publication number
CN104281492A
CN104281492A CN201310283998.6A
Authority
CN
China
Prior art keywords
resource pool
task
job
current
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310283998.6A
Other languages
Chinese (zh)
Inventor
李千目
侯君
魏士祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Wuxi Nanligong Technology Development Co Ltd
Original Assignee
Nanjing University of Science and Technology
Wuxi Nanligong Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology, Wuxi Nanligong Technology Development Co Ltd filed Critical Nanjing University of Science and Technology
Priority to CN201310283998.6A priority Critical patent/CN104281492A/en
Publication of CN104281492A publication Critical patent/CN104281492A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fair Hadoop task scheduling method for heterogeneous environments. The method comprises the steps of: judging whether to execute a large job; if so, calling the job scheduling algorithm to select an appropriate job and then calling the task scheduling algorithm; if not, calling the resource pool scheduling algorithm, executing the job scheduling algorithm to select an appropriate job, and finally calling the task scheduling algorithm. Compared with the prior art, the method supports large-memory jobs, shortens the response time of a single job, and makes resource allocation fairer in a heterogeneous cluster environment.

Description

Fair Hadoop task scheduling method in a heterogeneous environment
Technical field
The invention belongs to the field of computer task scheduling, and in particular relates to a fair Hadoop task scheduling method in a heterogeneous environment.
Background art
With the rapid development of information technology, the Internet and the mobile Internet have allowed people far apart to enjoy a rich interconnected life. While enjoying these conveniences, people also generate enormous amounts of data, ushering in the era of big data. How to process this flood of data is a key task for many companies.
Among distributed programming models, Google proposed the Map/Reduce distributed programming model; using the map and reduce framework, large-scale distributed computation can be realized on a distributed cluster with very high stability. (Dean J. and Ghemawat S. MapReduce: simplified data processing on large clusters[J], Commun. ACM, 51, 1, pp. 107-113, 2008.)
Inspired by Google, Doug Cutting began developing his open-source system, Hadoop. Hadoop is the name of the overall project; it mainly consists of HDFS and MapReduce. HDFS is an open-source implementation of the Google File System (GFS), and MapReduce is an open-source implementation of Google MapReduce.
This distributed architecture is highly innovative and extremely extensible, and gives Google great competitiveness in system throughput. The Apache Foundation therefore implemented an open-source version in Java, supporting Linux platforms such as Fedora and Ubuntu. Hadoop implements the HDFS file system and MapReduce. A user need only extend MapReduceBase, provide two classes implementing Map and Reduce respectively, and register a Job; the program then runs in a distributed fashion automatically.
Owing to Hadoop's open-source nature and unmatched expressiveness, more and more companies have chosen it as their data-analysis platform. Abroad, companies such as Facebook use it to analyze the data they produce every day; in China, Taobao uses it to analyze user data and has produced much useful business data.
However, task scheduling on Hadoop is a difficult problem. Major companies generally customize their task scheduling algorithms according to their own internal needs. The scheduler Hadoop provides by default is a FIFO algorithm based on submission time; a capacity scheduling algorithm (Capacity Scheduler) was proposed later, and Facebook subsequently proposed a fair scheduling algorithm (Fair Scheduler).
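The MapReduce programming model described above can be illustrated with a minimal, framework-free Python sketch (a word count, the canonical example; the function names here are illustrative and are not the Hadoop API):

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs -- here (word, 1) for each word.
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group values by key; Reduce: aggregate each group.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(map_phase(["big data", "big jobs"]))
```

In Hadoop proper, the two phases correspond to the user's Map and Reduce classes, and the framework performs the grouping (shuffle) step between them across the cluster.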
Summary of the invention
1. Object of the invention.
To solve the problems in the prior art that large jobs are not supported, that the response time of a single job is long, and that resource allocation in a cluster environment is easily unfair, the present invention proposes a fair Hadoop task scheduling method in a heterogeneous environment.
2. Technical solution adopted by the invention.
The technical solution that realizes the object of the invention is as follows:
A fair Hadoop task scheduling method in a heterogeneous environment comprises the following steps:
Step 1: read configuration information
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, etc. Each resource pool corresponds to a user group, and all jobs submitted by that user group stay in that resource pool. The minimum share is the minimum number of slots a resource pool needs to occupy to work normally; the weight expresses the priority each resource pool or job has when acquiring resources.
Step 2: start the update thread
The update thread is mainly responsible for updating: the job states of all groups in the current system; how many Map slots and Reduce slots each group needs; and each group's fair share and redistribution amount. If preemption is allowed, it preempts according to the current state of the system.
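As a hedged illustration of the fair-share bookkeeping the update thread performs, the sketch below divides a pool of slots among groups in proportion to their weights, capped by each group's demand. This is a simplified water-filling scheme under assumed semantics; the Fair Scheduler's actual computation differs in detail:

```python
def fair_shares(total_slots, groups):
    """Split total_slots among groups: {name: (weight, demand)}."""
    shares = {name: 0.0 for name in groups}
    remaining = float(total_slots)
    # Groups whose demand is not yet satisfied still compete for slots.
    active = {name for name, (w, d) in groups.items() if d > 0}
    while remaining > 1e-9 and active:
        total_weight = sum(groups[name][0] for name in active)
        # Tentatively split what is left in proportion to weight.
        grant = {name: remaining * groups[name][0] / total_weight
                 for name in active}
        for name in sorted(active):
            demand = groups[name][1]
            give = min(grant[name], demand - shares[name])
            shares[name] += give
            remaining -= give
            if shares[name] >= demand - 1e-9:
                active.discard(name)  # demand met: stop competing
    return shares

# Group 'b' only needs 2 slots; 'a' (weight 2) and 'c' (weight 1)
# split the remaining 8 slots in a 2:1 ratio.
shares = fair_shares(10, {'a': (2, 10), 'b': (1, 2), 'c': (1, 10)})
```

The loop terminates because each pass either satisfies some group's demand or exhausts the remaining slots.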
Step 3: judge whether to execute a large job
If some task executor in the system has a free slot, judge whether a large-memory job in the current system needs scheduling. If there is a large-memory job and the priority of the large-memory job group has reached its maximum, perform step 4; otherwise perform step 5.
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group, and judge whether the current task executor is suitable for running the large-memory job. If not, select the next job, until a suitable job is chosen; if no suitable job exists, proceed to step 5. If a suitable job has been chosen, call the task allocation algorithm, allocate the current slot to that job's task, and the scheduling method completes the current allocation.
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select a task and allocate it; the scheduling method completes the current allocation.
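The decision flow of steps 3-5 can be sketched as a single dispatch routine. Everything below is a simplified model with hypothetical names and data shapes; the real scheduler works on Hadoop's TaskTracker and job structures:

```python
def assign_slot(executor_mem, large_jobs, large_group_at_max, pools):
    """Decide what a free slot runs (steps 3-5, simplified).

    executor_mem:       free memory (GB) on the executor offering the slot
    large_jobs:         [(job_name, mem_needed_gb)] in the large-memory group
    large_group_at_max: True if that group's priority is at its maximum
    pools:              [(pool_name, [job_name, ...])], pools already ordered
                        by the resource pool scheduling algorithm and each
                        job list by the job scheduling algorithm
    Returns the chosen job name, or None if nothing is runnable.
    """
    # Steps 3-4: prefer a large-memory job this executor can actually hold.
    if large_jobs and large_group_at_max:
        for name, mem_needed in large_jobs:
            if mem_needed <= executor_mem:
                return name
    # Step 5: fall back to pool scheduling, then job scheduling.
    for pool_name, jobs in pools:
        if jobs:
            return jobs[0]  # head of the weight-sorted queue
    return None
```

For example, an executor with 8 GB free skips a 16 GB job but accepts a 6 GB one; with no runnable large-memory job it takes the head job of the first non-empty pool.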
3. Beneficial effects of the invention.
Compared with the prior art, the invention has notable advantages: (1) it supports large-memory jobs; (2) it improves the response time of a single job; (3) it makes resource allocation fairer in a heterogeneous cluster environment.
The invention is described in further detail below in conjunction with the accompanying drawing.
Brief description of the drawings
Fig. 1 is the flow chart of a program written using the MapReduce model.
Detailed description
Embodiment
The invention relates to a fair Hadoop task scheduling method in a heterogeneous environment, whose steps are as follows:
Step 1: read configuration information
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, etc. Each resource pool corresponds to a user group, and all jobs submitted by that user group stay in that resource pool. The minimum share is the minimum number of slots a resource pool needs to occupy to work normally; the weight expresses the priority each resource pool or job has when acquiring resources.
Step 2: start the update thread
The update thread is mainly responsible for updating: the job states of all groups in the current system; how many Map slots and Reduce slots each group needs; and each group's fair share and redistribution amount. If preemption is allowed, it preempts according to the current state of the system.
Step 3: judge whether to execute a large job
If some task executor in the system has a free slot, judge whether a large-memory job in the current system needs scheduling. If there is a large-memory job and the priority of the large-memory job group has reached its maximum, perform step 4; otherwise perform step 5.
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group, and judge whether the current task executor is suitable for running the large-memory job. If not, select the next job, until a suitable job is chosen; if no suitable job exists, proceed to step 5. If a suitable job has been chosen, call the task allocation algorithm, allocate the current slot to that job's task, and the scheduling method completes the current allocation.
The specific steps of the job scheduling algorithm are as follows:
Step a: compute a weight for each job in the resource pool according to the weight formula (the formula appears only as an image in the original publication and is not reproduced in this text); the weight depends on the time the job has waited since it was submitted and on Remainer, the job's remaining task count;
Step b: sort the jobs by weight;
Step c: select jobs for scheduling in order from the head of the job queue.
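A sketch of steps a-c follows, with a loudly assumed weight formula: the patent publishes the formula only as an image, so `wait * remaining_tasks` below is a guess that merely respects the two stated inputs (waiting time and Remainer):

```python
import time

def job_weight(submit_time, remaining_tasks, now=None):
    # ASSUMED formula -- the patent's formula image is not reproduced in
    # the text. The text only states that the weight depends on the time
    # waited since submission and on Remainer, the remaining task count.
    now = time.time() if now is None else now
    return (now - submit_time) * remaining_tasks

def schedule_order(jobs, now):
    # jobs: [(name, submit_time, remaining_tasks)]
    # Step b: sort by weight, largest first; step c: serve from the head.
    return sorted(jobs, key=lambda j: job_weight(j[1], j[2], now),
                  reverse=True)
```

Under this assumption a job that has waited longer (with equal remaining work) moves toward the head of the queue, which matches the stated aim of improving single-job response time.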
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select a task and allocate it; the scheduling method completes the current allocation.
The resource pool scheduling algorithm is as follows:
Step a: obtain the numbers of running tasks t1 and t2 in the two resource pools being compared, and the executors tt1 and tt2 running those tasks;
Step b: obtain the minimum share and the demand of each of the current resource pools, and take the minimums minShare1 and minShare2 of each pair;
Step c: compare the difference between t1 and minShare1 (and likewise t2 and minShare2). If the task counts of both pools exceed their minimum shares, both pools demand work slots; in that case compute the weight of the slots each pool currently occupies using a formula based on the CPU clock frequency and the memory size (the formula appears only as an image in the original publication and is not reproduced in this text), allocate the slot to the pool with the smaller weight, and place that pool at the front of the list;
Step d: if only one of the resource pools has fewer tasks than its minimum share, only that pool needs a work slot, so the slot is allocated to that pool and it is placed at the front of the list.
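Steps a-d can be sketched as a pairwise pool comparison. The slot-weight formula is again assumed (`cpu_ghz * mem_gb`), since the original formula image is not reproduced in the text; only its inputs (CPU clock frequency and memory size) are stated:

```python
def pool_compare(pool1, pool2):
    """Return 1 or 2: which pool a free slot goes to (steps a-d).

    Each pool is a dict with:
      'tasks'     running task count (t1 / t2)
      'min_share' minimum share (minShare1 / minShare2)
      'cpu_ghz', 'mem_gb' of the slots the pool currently occupies
    """
    below1 = pool1['tasks'] < pool1['min_share']
    below2 = pool2['tasks'] < pool2['min_share']
    # Step d: exactly one pool is below its minimum share -> it wins.
    if below1 != below2:
        return 1 if below1 else 2
    # Step c: otherwise compare the (ASSUMED) weight of the slots each
    # pool occupies; the pool with the smaller weight gets the slot.
    w1 = pool1['cpu_ghz'] * pool1['mem_gb']
    w2 = pool2['cpu_ghz'] * pool2['mem_gb']
    return 1 if w1 <= w2 else 2
```

Favoring the pool whose occupied slots carry less CPU/memory weight steers new slots toward pools running on weaker hardware, which is the heterogeneity-aware behavior the invention claims.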
The above embodiment does not limit the invention in any way; every technical solution obtained by way of equivalent replacement or equivalent transformation falls within the scope of protection of the invention.

Claims (3)

1. A fair Hadoop task scheduling method in a heterogeneous environment, characterized in that the steps are as follows:
Step 1: read configuration information
Initialize the system information the scheduling method needs, inheriting the classes defined by the Fair Scheduler: resource pool, minimum share, weight, etc. Each resource pool corresponds to a user group, and all jobs submitted by that user group stay in that resource pool. The minimum share is the minimum number of slots a resource pool needs to occupy to work normally; the weight expresses the priority each resource pool or job has when acquiring resources;
Step 2: start the update thread
The update thread is mainly responsible for updating: the job states of all groups in the current system; how many Map slots and Reduce slots each group needs; and each group's fair share and redistribution amount; if preemption is allowed, it preempts according to the current state of the system;
Step 3: judge whether to execute a large job
If some task executor in the system has a free slot, judge whether a large-memory job in the current system needs scheduling; if there is a large-memory job and the priority of the large-memory job group has reached its maximum, perform step 4; otherwise perform step 5;
Step 4: using the job scheduling method, select a job to allocate from the job queue of the large-memory job group, and judge whether the current task executor is suitable for running the large-memory job; if not, select the next job, until a suitable job is chosen; if no suitable job exists, proceed to step 5; if a suitable job has been chosen, call the task allocation algorithm, allocate the current slot to that job's task, and the scheduling method completes the current allocation;
Step 5: if no large-memory job could be allocated, call the resource pool scheduling algorithm to select a suitable job group from the current resource pools, then call the job scheduling method to select a suitable job from that group's job set, and finally call the task allocation algorithm to select a task and allocate it; the scheduling method completes the current allocation.
2. The fair Hadoop task scheduling method in a heterogeneous environment according to claim 1, characterized in that the specific steps of the job scheduling algorithm in step 4 are as follows:
Step 2.1: compute a weight for each job in the resource pool according to the weight formula (the formula appears only as an image in the original publication and is not reproduced in this text); the weight depends on the time the job has waited since it was submitted and on Remainer, the job's remaining task count;
Step 2.2: sort the jobs by weight;
Step 2.3: select jobs for scheduling in order from the head of the job queue.
3. The fair Hadoop task scheduling method in a heterogeneous environment according to claim 1 or 2, characterized in that the resource pool scheduling algorithm in step 5 is as follows:
Step 3.1: obtain the numbers of running tasks t1 and t2 in the two resource pools being compared, and the executors tt1 and tt2 running those tasks;
Step 3.2: obtain the minimum share and the demand of each of the current resource pools, and take the minimums minShare1 and minShare2 of each pair;
Step 3.3: compare the difference between t1 and minShare1 (and likewise t2 and minShare2); if the task counts of both pools exceed their minimum shares, both pools demand work slots; in that case compute the weight of the slots each pool currently occupies using a formula based on the CPU clock frequency and the memory size (the formula appears only as an image in the original publication and is not reproduced in this text), allocate the slot to the pool with the smaller weight, and place that pool at the front of the list;
Step 3.4: if only one of the resource pools has fewer tasks than its minimum share, only that pool needs a work slot, so the slot is allocated to that pool and it is placed at the front of the list.
CN201310283998.6A 2013-07-08 2013-07-08 Fair Hadoop task scheduling method in heterogeneous environment Pending CN104281492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310283998.6A CN104281492A (en) 2013-07-08 2013-07-08 Fair Hadoop task scheduling method in heterogeneous environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310283998.6A CN104281492A (en) 2013-07-08 2013-07-08 Fair Hadoop task scheduling method in heterogeneous environment

Publications (1)

Publication Number Publication Date
CN104281492A true CN104281492A (en) 2015-01-14

Family

ID=52256393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310283998.6A Pending CN104281492A (en) 2013-07-08 2013-07-08 Fair Hadoop task scheduling method in heterogeneous environment

Country Status (1)

Country Link
CN (1) CN104281492A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373426A (en) * 2015-07-28 2016-03-02 哈尔滨工程大学 Method for memory ware real-time job scheduling of car networking based on Hadoop
CN105610621A (en) * 2015-12-31 2016-05-25 中国科学院深圳先进技术研究院 Method and device for dynamically adjusting task level parameter of distributed system architecture
CN106095573A (en) * 2016-06-08 2016-11-09 北京大学 The Storm platform operations of a kind of work nest perception divides equally dispatching method
WO2016197716A1 (en) * 2016-01-18 2016-12-15 中兴通讯股份有限公司 Task scheduling method and device
CN113454614A (en) * 2018-12-04 2021-09-28 华为技术加拿大有限公司 System and method for resource partitioning in distributed computing
GB2571651B (en) * 2016-10-21 2022-09-21 Datarobot Inc Systems for predictive data analytics, and related methods and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194248A1 (en) * 2001-05-01 2002-12-19 The Regents Of The University Of California Dedicated heterogeneous node scheduling including backfill scheduling
CN101916209A (en) * 2010-08-06 2010-12-15 华东交通大学 Cluster task resource allocation method for multi-core processor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194248A1 (en) * 2001-05-01 2002-12-19 The Regents Of The University Of California Dedicated heterogeneous node scheduling including backfill scheduling
CN101916209A (en) * 2010-08-06 2010-12-15 华东交通大学 Cluster task resource allocation method for multi-core processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
夏祎: "Research and Improvement of Job Scheduling Algorithms on the Hadoop Platform", China Master's Theses Full-text Database, Information Science and Technology *
姜淼: "Research on Scheduling Algorithms on the Hadoop Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373426A (en) * 2015-07-28 2016-03-02 哈尔滨工程大学 Method for memory ware real-time job scheduling of car networking based on Hadoop
CN105373426B (en) * 2015-07-28 2019-01-15 哈尔滨工程大学 A kind of car networking memory aware real time job dispatching method based on Hadoop
CN105610621A (en) * 2015-12-31 2016-05-25 中国科学院深圳先进技术研究院 Method and device for dynamically adjusting task level parameter of distributed system architecture
CN105610621B (en) * 2015-12-31 2019-04-26 中国科学院深圳先进技术研究院 A kind of method and device of distributed system architecture task level dynamic state of parameters adjustment
WO2016197716A1 (en) * 2016-01-18 2016-12-15 中兴通讯股份有限公司 Task scheduling method and device
CN106095573A (en) * 2016-06-08 2016-11-09 北京大学 The Storm platform operations of a kind of work nest perception divides equally dispatching method
GB2571651B (en) * 2016-10-21 2022-09-21 Datarobot Inc Systems for predictive data analytics, and related methods and apparatus
CN113454614A (en) * 2018-12-04 2021-09-28 华为技术加拿大有限公司 System and method for resource partitioning in distributed computing

Similar Documents

Publication Publication Date Title
CN104281492A (en) Fair Hadoop task scheduling method in heterogeneous environment
Cheng et al. Heterogeneity-aware workload placement and migration in distributed sustainable datacenters
Rejiba et al. Custom scheduling in Kubernetes: A survey on common problems and solution approaches
CN103593323A (en) Machine learning method for Map Reduce task resource allocation parameters
CN104112049B (en) Based on the MapReduce task of P2P framework across data center scheduling system and method
CN101582043A (en) Dynamic task allocation method of heterogeneous computing system
Cheng et al. Heterogeneity aware workload management in distributed sustainable datacenters
CN102708003A (en) Method for allocating resources under cloud platform
Abdullah et al. Diminishing returns and deep learning for adaptive CPU resource allocation of containers
Chen et al. Spark on entropy: A reliable & efficient scheduler for low-latency parallel jobs in heterogeneous cloud
CN101971141A (en) System and method for managing a hybrid compute environment
Lu et al. InSTechAH: Cost-effectively autoscaling smart computing hadoop cluster in private cloud
Harichane et al. KubeSC‐RTP: Smart scheduler for Kubernetes platform on CPU‐GPU heterogeneous systems
Li et al. Cost-efficient fault-tolerant workflow scheduling for deadline-constrained microservice-based applications in clouds
CN106802822A (en) A kind of cloud data center cognitive resources dispatching method based on moth algorithm
Niu et al. An adaptive efficiency-fairness meta-scheduler for data-intensive computing
Guyon et al. How much energy can green HPC cloud users save?
CN105426247A (en) HLA federate planning and scheduling method
Xu et al. Intelligent scheduling for parallel jobs in big data processing systems
Malathy et al. Performance improvement in cloud computing using resource clustering
Zhou et al. Resource allocation in cloud computing based on clustering method
Li et al. Cress: Dynamic scheduling for resource constrained jobs
Reuther et al. Technical challenges of supporting interactive HPC
Han et al. Stochastic Matrix Modelling and Scheduling Algorithm of Distributed Intelligent Computing System
Wang et al. Adaptive Pricing and Online Scheduling for Distributed Machine Learning Jobs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150114