CN103246570A - Hadoop scheduling method and system and management node - Google Patents

Hadoop scheduling method and system and management node

Info

Publication number
CN103246570A
Authority
CN
China
Prior art keywords
resource
task
scheduling
management node
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101881806A
Other languages
Chinese (zh)
Inventor
孙垚光
黎樵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN2013101881806A
Publication of CN103246570A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a Hadoop scheduling method. In the method, a management node obtains resource consumption information for completed tasks on a plurality of compute nodes; the management node generates a resource scheduling value from that resource consumption information; and the management node receives an allocation request for a new task and allocates resources to the new task according to the resource scheduling value. The method can increase the per-machine concurrency of Hadoop compute nodes (TaskTrackers) and thereby improve the resource utilization of the whole cluster (the plurality of compute nodes). The invention also discloses a Hadoop scheduling system and a management node.

Description

Hadoop scheduling method, system, and management node
Technical Field
The present invention relates to the technical field of cloud computing, and in particular to a Hadoop scheduling method, system, and management node.
Background
Apache Hadoop is a software platform for distributed processing of massive data. As massive-data workloads multiply, Hadoop is used ever more widely. As single clusters keep growing (a first-generation Hadoop cluster can support roughly 4000 machines), how to improve cluster resource utilization has become a topic of increasing concern. The key to improving cluster resource utilization is cluster scheduling.
Hadoop currently supports several schedulers. Essentially all of them assign each TaskTracker a fixed number of slots based on the machine's configuration. For example, 16 slots means that a single TaskTracker machine can execute at most 16 Tasks at the same time. The JobTracker schedules according to these slot counts, and each Task occupies at least one slot.
This fixed slot-count scheme has two shortcomings:
(1) The number of slots on each machine is fixed, and so are the resources that correspond to each slot. By default in Hadoop, each slot corresponds to 800MB of memory. A Task that needs only 100MB of memory at run time still occupies one slot on the JobTracker and the TaskTracker, and is still charged 800MB of memory.
(2) The number of slots a given Task occupies is derived entirely from the configuration of the submitted job, and in most cases users cannot estimate very accurately how many resources their own programs need at run time.
Therefore, if too few slots are configured per machine, cluster resources cannot be fully used; if too many are configured, a job with heavy resource consumption can exhaust the machine's resources (for example, the machine crashes because the whole machine runs out of memory).
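The cost of shortcoming (1) can be illustrated with a little arithmetic. The following is a minimal sketch, not Hadoop code: the function name `reserved_vs_used` is hypothetical, and it assumes every Task fits in one slot and is charged exactly one whole 800MB slot.

```python
SLOT_MB = 800  # per-slot memory taken from the text above


def reserved_vs_used(task_mb, slot_mb=SLOT_MB):
    """Return (reserved_mb, used_mb) when each task is charged one whole slot.

    Assumption: every task in task_mb needs no more than slot_mb.
    """
    reserved = len(task_mb) * slot_mb  # what the scheduler accounts for
    used = sum(task_mb)                # what the tasks actually consume
    return reserved, used


# 16 slots, each running a Task that really needs only 100MB
reserved, used = reserved_vs_used([100] * 16)
print(reserved, used)  # 12800 1600
```

Under these assumptions the scheduler accounts for 12800MB while the Tasks use 1600MB, which is the gap the invention targets.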
Summary of the Invention
The present invention aims to solve at least one of the technical defects described above.
To this end, one object of the present invention is to propose a Hadoop scheduling method that improves resource utilization on compute nodes.
Another object of the present invention is to propose a Hadoop scheduling system.
A further object of the present invention is to propose a management node.
To achieve the above objects, an embodiment of the first aspect of the present invention discloses a Hadoop scheduling method comprising the following steps: a management node obtains resource consumption information for completed tasks on a plurality of compute nodes; the management node generates a resource scheduling value from the resource consumption information for the completed tasks on the plurality of compute nodes; and the management node receives an allocation request for a new task and allocates resources to the new task according to the resource scheduling value.
The Hadoop scheduling method according to the embodiment of the present invention can increase the per-machine concurrency of Hadoop compute nodes (TaskTrackers) and thereby improve the resource utilization of the whole cluster (the plurality of compute nodes).
In addition, the Hadoop scheduling method according to the above embodiment of the present invention may also have the following additional technical features:
In some examples, a plurality of tasks run on the compute node.
In some examples, after a task on the compute node finishes, the resource consumption information corresponding to that task is sent to the management node in a heartbeat message.
In some examples, the management node generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1).
An embodiment of the second aspect of the present invention discloses a Hadoop scheduling system comprising a management node and a plurality of compute nodes, wherein the management node is configured to obtain resource consumption information for completed tasks on the plurality of compute nodes, to generate a resource scheduling value from that resource consumption information, and, after receiving an allocation request for a new task, to allocate resources to the new task according to the resource scheduling value.
The Hadoop scheduling system according to the embodiment of the present invention can increase the per-machine concurrency of Hadoop compute nodes (TaskTrackers) and thereby improve the resource utilization of the whole cluster (the plurality of compute nodes).
In addition, the Hadoop scheduling system according to the above embodiment of the present invention may also have the following additional technical features:
In some examples, a plurality of tasks run on the compute node.
In some examples, after a task on the compute node finishes, the resource consumption information corresponding to that task is sent to the management node in a heartbeat message.
In some examples, the management node generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1).
An embodiment of the third aspect of the present invention discloses a management node comprising: an acquisition module configured to obtain resource consumption information for completed tasks on a plurality of compute nodes; a generation module configured to generate a resource scheduling value from the resource consumption information for the completed tasks on the plurality of compute nodes; and a resource allocation module configured to allocate resources to a new task according to the resource scheduling value after an allocation request for the new task is received.
The management node according to the embodiment of the present invention can increase the per-machine concurrency of Hadoop compute nodes and thereby improve the resource utilization of the whole cluster (the plurality of compute nodes).
In addition, the management node according to the above embodiment of the present invention may also have the following additional technical features:
In some examples, a plurality of tasks run on the compute node.
In some examples, after a task on the compute node finishes, the resource consumption information corresponding to that task is sent to the management node in a heartbeat message.
In some examples, the management node generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1).
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a Hadoop scheduling method according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of a Hadoop scheduling method according to an embodiment of the present invention;
Fig. 3 is a structural diagram of a Hadoop scheduling system according to an embodiment of the present invention; and
Fig. 4 is a structural diagram of a management node according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, it should be understood that orientation or positional terms such as "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention.
In the description of the present invention, it should be noted that, unless otherwise specified and limited, the terms "mounted", "coupled", and "connected" should be understood broadly. For example, a connection may be mechanical or electrical, may be internal to two elements, and may be direct or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the circumstances.
The Hadoop scheduling method, system, and management node according to embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a Hadoop scheduling method according to an embodiment of the present invention. As shown in Fig. 1, the Hadoop scheduling method comprises the following steps:
Step S101: the management node obtains resource consumption information for completed tasks on a plurality of compute nodes.
A plurality of tasks run on a compute node; that is, each compute node can run multiple tasks. After a task on a compute node finishes, the resource consumption information corresponding to the task can be sent to the management node in a heartbeat message. In this example, if a compute node runs multiple tasks, the resource consumption information for that compute node is the total resource consumption information of all tasks running on it.
With reference to Fig. 2, the management node is the Master node and scheduler. As shown at (1) in Fig. 2, the Master node and scheduler schedules a particular job and starts a batch of Tasks according to the resource information configured for that job; for example, each Task is allocated 800MB of memory by default.
While Tasks execute on a compute node, the compute node collects the resource information consumed by its own Task groups and reports it to the Master node and scheduler in a heartbeat when a Task finishes. In this example, "a Task finishes" means either that all tasks on the compute node have finished processing or that a particular task has completed.
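The reporting path in step S101 can be sketched as follows. This is an illustrative model only, not the Hadoop heartbeat protocol: the classes `FinishedTask` and `ComputeNode`, their fields, and the dictionary layout of the heartbeat are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FinishedTask:
    """Resource consumption recorded for one completed Task (hypothetical)."""
    task_id: str
    job_id: str
    memory_mb: float


@dataclass
class ComputeNode:
    """Buffers reports for finished Tasks until the next heartbeat."""
    name: str
    _pending: list = field(default_factory=list)

    def task_finished(self, report: FinishedTask) -> None:
        self._pending.append(report)

    def heartbeat(self) -> dict:
        # Hand over everything collected since the last heartbeat, then reset.
        reports, self._pending = self._pending, []
        return {"node": self.name, "finished": reports}


node = ComputeNode("tasktracker-1")
node.task_finished(FinishedTask("task_0001", "job_42", 480.0))
message = node.heartbeat()
print(len(message["finished"]))  # 1
```

Piggybacking the reports on a heartbeat means no extra connection to the management node is needed; each report becomes one sample for the scheduling value.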
Step S102: the management node generates the resource scheduling value from the resource consumption information for the completed tasks on the plurality of compute nodes.
As a concrete example, as shown at (2) in Fig. 2, the Master node and scheduler collects the resource consumption information reported by the Tasks on all compute nodes and calculates an average per-Task resource consumption (that is, the resource scheduling value). Each time it receives resource consumption information for completed tasks reported by a compute node, that is, each reported Task memory consumption, the management node treats it as a new sample and updates the resource scheduling value.
For example, the management node generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1). In this example, the management node can configure p flexibly according to the job characteristics of the plurality of compute nodes. In other words, new average per-Task resource consumption = latest sample value × p + current average per-Task resource consumption × (1 − p).
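The update rule above is an exponentially weighted moving average. A minimal sketch follows; the function name `update_scheduling_value`, the starting value of 800MB, the sample values, and the choice p = 0.5 are assumptions made for illustration and do not come from the patent.

```python
def update_scheduling_value(current: float, sample: float, p: float) -> float:
    """new value = latest sample * p + current value * (1 - p), with p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sample * p + current * (1.0 - p)


# Start from the 800MB default and feed three reported samples of 500MB each.
value = 800.0
for sample in (500.0, 500.0, 500.0):
    value = update_scheduling_value(value, sample, p=0.5)
print(value)  # 537.5
```

A larger p makes the value follow recent Tasks more closely; a smaller p smooths out outliers at the cost of slower adaptation.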
Step S103: the management node receives an allocation request for a new task and allocates resources to the new task according to the resource scheduling value.
As shown at (3) in Fig. 2, in subsequent scheduling the Master node and scheduler no longer uses the default job-configured resource information described above (for example, the default of 800MB of memory per Task), but uses the latest average per-Task resource consumption as the resource scheduling value for Tasks on the compute nodes. For example, if the calculated resource scheduling value is 500MB, the management node allocates 500MB of memory to each Task on the compute node.
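Step S103 can be sketched as an admission check against each node's free memory. The patent only states that each new Task is allocated the resource scheduling value; the node-selection policy below (pick the node with the most free memory) and the name `allocate` are assumptions added to make the example complete.

```python
from typing import Dict, Optional


def allocate(free_mb: Dict[str, float], scheduling_value_mb: float) -> Optional[str]:
    """Reserve scheduling_value_mb on one node and return its name, or None.

    Assumed policy: choose the node with the most free memory.
    """
    if not free_mb:
        return None
    node = max(free_mb, key=free_mb.get)
    if free_mb[node] < scheduling_value_mb:
        return None  # no node can hold another Task right now
    free_mb[node] -= scheduling_value_mb
    return node


free = {"tasktracker-1": 600.0, "tasktracker-2": 400.0}
print(allocate(free, 500.0))  # tasktracker-1
print(allocate(free, 500.0))  # None
```

With an 800MB default neither node could take a Task here; with the learned value of 500MB one Task fits.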
An application example of the Hadoop scheduling method of the embodiment of the present invention follows:
Randomly draw a TaskTracker machine (compute node) from the scheduler. The machine has 24GB of memory. At a fixed ratio of 800MB per slot, it can be configured with at most 30 slots (in practice the memory overhead outside the Tasks must also be considered, so the number of slots actually configured on such a machine is far below 30, typically 10 to 20). When scheduling instead uses the resource scheduling value generated by the Hadoop scheduling method of the embodiment, that is, after resource collection and dynamic adjustment are applied to memory, the tasks running on this machine are as shown in Table 1. A total of 21 Tasks are running; converted into Hadoop slots, they would occupy 38 slots (in Hadoop, a Task that needs more than 800MB usually occupies multiple slots; for example, 1500MB occupies 2 slots and 2100MB occupies 3 slots). As Table 1 shows, the Hadoop scheduling method of the embodiment greatly improves the concurrency and memory utilization of a single machine (compute node).
Table 1
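The slot arithmetic in this example can be checked with a short sketch. It assumes that 24GB means 24 × 1024 MB and that a Task's slot count is its memory demand rounded up to whole 800MB slots; the rounding rule is inferred from the 1500MB and 2100MB examples rather than stated in the patent, and the function names are hypothetical.

```python
import math

SLOT_MB = 800  # fixed memory per slot, as in the text


def max_slots(machine_mb: int, slot_mb: int = SLOT_MB) -> int:
    """Upper bound on slots if all machine memory went to slots."""
    return machine_mb // slot_mb


def slots_occupied(task_mb: int, slot_mb: int = SLOT_MB) -> int:
    """Whole slots charged to a Task (assumed: round up, at least one)."""
    return max(1, math.ceil(task_mb / slot_mb))


print(max_slots(24 * 1024))  # 30
print(slots_occupied(1500))  # 2
print(slots_occupied(2100))  # 3
```

The per-Task figures behind the 21 Tasks and 38 slots are in Table 1 and are not reproduced here.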
In the above description, Hadoop is the open-source distributed computing platform of the Apache foundation; the JobTracker is the Master node (management node) of a Hadoop cluster; the TaskTracker is an execution compute node of a Hadoop cluster; a Slot is a slot, and one slot can execute one Task; and a Task is the execution unit of a Hadoop job.
In the Hadoop scheduling method according to the embodiment of the present invention, the Tasks on the same machine (compute node) are grouped, for example by process group ID (pgid) or by TaskID, among other options. The management node no longer configures a "per-machine slot count" for the TaskTracker but directly configures "per-machine available resources". Information from the ps tool or the proc file system can be used to measure the resources each Task group actually consumes while running. As Tasks keep running and finishing, the management node learns the resources that a single Task group of the job needs, and adjusts the resources allotted to subsequent Tasks according to the resource scheduling value generated from the resource consumption information of completed tasks on the plurality of compute nodes. The scheduling system does not always schedule according to preset resources, but according to the actual resource consumption (the resource scheduling value) of the specific Tasks. In addition, while the TaskTracker executes Tasks, a protective technique is used to prevent a single TaskTracker machine from running out of memory (OOM) because of excessive memory consumption.
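One way to gather per-group memory from the proc file system is sketched below. It relies on two Linux facts: `/proc/<pid>/stat` lists the process group ID as the third field after the parenthesised command name, and `/proc/<pid>/status` reports resident memory on a `VmRSS:` line in kB. The function names are hypothetical, and the patent does not say which fields its implementation reads. To keep the example runnable anywhere, the functions take the file contents as strings.

```python
def parse_pgid(stat_text: str) -> int:
    """Extract the process group ID from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split after the last ')': the fields are then state, ppid, pgrp, ...
    fields = stat_text.rsplit(")", 1)[1].split()
    return int(fields[2])


def parse_rss_kb(status_text: str) -> int:
    """Extract VmRSS (in kB) from /proc/<pid>/status; 0 if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def rss_by_group(procs) -> dict:
    """Sum resident memory per process group for (stat_text, status_text) pairs."""
    totals = {}
    for stat_text, status_text in procs:
        pgid = parse_pgid(stat_text)
        totals[pgid] = totals.get(pgid, 0) + parse_rss_kb(status_text)
    return totals


sample = [
    ("4242 (java) S 4200 4242 4242 0 -1 0", "Name:\tjava\nVmRSS:\t  204800 kB\n"),
    ("4250 (my worker) S 4242 4242 4242 0 -1 0", "Name:\tmy worker\nVmRSS:\t  51200 kB\n"),
]
print(rss_by_group(sample))  # {4242: 256000}
```

On a live node the two strings would come from reading the corresponding files for each PID under `/proc`.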
The method of the embodiment of the present invention resolves the scheduling defect caused by configuring a fixed number of slots on Hadoop TaskTracker machines. For example, the concept of a "slot" is removed: what is configured on the TaskTracker is directly the usable resources, including but not limited to memory, CPU, and IO. During scheduling, the scheduler (management node) does not always schedule according to the resources set by the user, but dynamically adjusts the resources allocated to subsequently started Tasks according to the resource consumption of Tasks that have actually finished running.
The method of the embodiment of the present invention can increase the per-machine concurrency of Hadoop compute nodes (TaskTrackers) and thereby improve the resource utilization of the whole cluster. In general, the number of slots configured on a Hadoop compute node varies with its memory configuration and is usually between 10 and 20; with the method of the embodiment of the present invention, resource utilization on the compute node improves. For example, in observations of actual online results, a Hadoop machine with 16GB of memory was configured with 16 slots, that is, it ran at most 16 Tasks at the same time; after this technical solution was adopted, the machine could run more than 20 concurrent Tasks, improving resource utilization by 20%.
In addition, the method for the embodiment of the invention uses most scenes at Hadoop.
(1) just in time takies the operation of an integer groove position resource for single Task resource consumption, DeGrain (the worst result maintains an equal level with Hadoop, for example single Task to consume just in time be the 800MB internal memory).
(2) for single Task consumption of natural resource and N the operation that groove position resource gap is bigger, effect is more remarkable.
Fig. 3 is a structural diagram of a Hadoop scheduling system according to an embodiment of the present invention. As shown in Fig. 3, the Hadoop scheduling system 300 of the embodiment of the present invention comprises a management node 310 and a plurality of compute nodes 320.
The management node 310 is configured to obtain resource consumption information for completed tasks on the plurality of compute nodes 320, to generate a resource scheduling value from that resource consumption information, and, after receiving an allocation request for a new task, to allocate resources to the new task according to the resource scheduling value.
Specifically, a plurality of tasks run on a compute node 320; that is, each compute node 320 can run multiple tasks. After a task on a compute node 320 finishes, the resource consumption information corresponding to the task can be sent to the management node 310 in a heartbeat message. In this example, if a compute node 320 runs multiple tasks, the resource consumption information for that compute node 320 is the total resource consumption information of all tasks running on it.
With reference to Fig. 2, the management node 310 is the Master node and scheduler. As shown at (1) in Fig. 2, the Master node and scheduler schedules a particular job and starts a batch of Tasks according to the resource information configured for that job; for example, each Task is allocated 800MB of memory by default.
While Tasks execute on a compute node 320, the compute node 320 collects the resource information consumed by its own Task groups and reports it to the Master node and scheduler in a heartbeat when a Task finishes. In this example, "a Task finishes" means either that all tasks on the compute node 320 have finished processing or that a particular task has completed.
As shown at (2) in Fig. 2, the Master node and scheduler collects the resource consumption information reported by the Tasks on all compute nodes 320 and calculates an average per-Task resource consumption (that is, the resource scheduling value). Each time it receives resource consumption information for completed tasks reported by a compute node 320, that is, each reported Task memory consumption, the management node 310 treats it as a new sample and updates the resource scheduling value.
For example, the management node 310 generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1). In this example, the management node 310 can configure p flexibly according to the job characteristics of the plurality of compute nodes 320. In other words, new average per-Task resource consumption = latest sample value × p + current average per-Task resource consumption × (1 − p).
As shown at (3) in Fig. 2, in subsequent scheduling the Master node and scheduler no longer uses the default job-configured resource information described above (for example, the default of 800MB of memory per Task), but uses the latest average per-Task resource consumption as the resource scheduling value for Tasks on the compute nodes 320. For example, if the calculated resource scheduling value is 500MB, the management node 310 allocates 500MB of memory to each Task on the compute node 320.
In the above description, Hadoop is the open-source distributed computing platform of the Apache foundation; the JobTracker is the Master node (management node 310) of a Hadoop cluster; the TaskTracker is an execution compute node 320 of a Hadoop cluster; a Slot is a slot, and one slot can execute one Task; and a Task is the execution unit of a Hadoop job.
In the Hadoop scheduling system according to the embodiment of the present invention, the Tasks on the same machine (compute node 320) are grouped, for example by process group ID (pgid) or by TaskID, among other options. The management node no longer configures a "per-machine slot count" for the TaskTracker but directly configures "per-machine available resources". Information from the ps tool or the proc file system can be used to measure the resources each Task group actually consumes while running. As Tasks keep running and finishing, the management node learns the resources that a single Task group of the job needs, and adjusts the resources allotted to subsequent Tasks according to the resource scheduling value generated from the resource consumption information of completed tasks on the plurality of compute nodes. The scheduling system does not always schedule according to preset resources, but according to the actual resource consumption (the resource scheduling value) of the specific Tasks. In addition, while the TaskTracker executes Tasks, a protective technique is used to prevent a single TaskTracker machine from running out of memory (OOM) because of excessive memory consumption.
The system of the embodiment of the present invention resolves the scheduling defect caused by configuring a fixed number of slots on Hadoop TaskTracker machines. For example, the concept of a "slot" is removed: what is configured on the TaskTracker is directly the usable resources, including but not limited to memory, CPU, and IO. During scheduling, the scheduler (management node) does not always schedule according to the resources set by the user, but dynamically adjusts the resources allocated to subsequently started Tasks according to the resource consumption of Tasks that have actually finished running.
The system of the embodiment of the present invention can increase the per-machine concurrency of Hadoop compute nodes (TaskTrackers) and thereby improve the resource utilization of the whole cluster. In general, the number of slots configured on a Hadoop compute node varies with its memory configuration and is usually between 10 and 20; with the embodiment of the present invention, resource utilization on the compute node improves.
In addition, the system of the embodiment of the present invention applies to most Hadoop scenarios.
(1) For jobs in which a single Task's resource consumption exactly fills an integer number of slots, the effect is not significant (in the worst case the result is on par with Hadoop, for example when a single Task consumes exactly 800MB of memory).
(2) For jobs in which a single Task's resource consumption differs substantially from N slots' worth of resources, the effect is more pronounced.
Fig. 4 is a structural diagram of a management node according to an embodiment of the present invention. As shown in Fig. 4, the management node 310 of the embodiment of the present invention comprises an acquisition module 311, a generation module 312, and a resource allocation module 313.
The acquisition module 311 is configured to obtain resource consumption information for completed tasks on the plurality of compute nodes 320. The generation module 312 is configured to generate a resource scheduling value from the resource consumption information for the completed tasks on the plurality of compute nodes 320. The resource allocation module 313 is configured to allocate resources to a new task according to the resource scheduling value after an allocation request for the new task is received.
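The three modules can be wired together as in the sketch below. The class and method names, the 800MB initial value, and the dictionary returned for an allocation are hypothetical choices for illustration; only the division into acquisition, generation, and allocation, and the update formula, come from the text.

```python
class AcquisitionModule:
    """Collects per-Task consumption samples (in MB) reported by compute nodes."""

    def __init__(self):
        self._samples = []

    def receive(self, samples_mb):
        self._samples.extend(samples_mb)

    def drain(self):
        samples, self._samples = self._samples, []
        return samples


class GenerationModule:
    """Maintains the resource scheduling value using the update formula."""

    def __init__(self, initial_mb: float, p: float):
        self.value = initial_mb
        self.p = p

    def update(self, samples_mb):
        for sample in samples_mb:
            self.value = sample * self.p + self.value * (1.0 - self.p)
        return self.value


class ResourceAllocationModule:
    """Turns the current scheduling value into an allocation for a new Task."""

    def allocate(self, scheduling_value_mb: float) -> dict:
        return {"memory_mb": scheduling_value_mb}


class ManagementNode:
    def __init__(self, initial_mb: float = 800.0, p: float = 0.5):
        self.acquisition = AcquisitionModule()
        self.generation = GenerationModule(initial_mb, p)
        self.allocation = ResourceAllocationModule()

    def on_heartbeat(self, samples_mb):
        self.acquisition.receive(samples_mb)
        self.generation.update(self.acquisition.drain())

    def on_task_request(self) -> dict:
        return self.allocation.allocate(self.generation.value)


master = ManagementNode()
master.on_heartbeat([500.0])
print(master.on_task_request())  # {'memory_mb': 650.0}
```

Keeping the modules separate mirrors Fig. 4 and lets the update rule be swapped without touching how reports arrive or how allocations are issued.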
Specifically, a plurality of tasks run on a compute node 320; that is, each compute node 320 can run multiple tasks. After a task on a compute node 320 finishes, the resource consumption information corresponding to the task can be sent to the management node 310 in a heartbeat message. In this example, if a compute node 320 runs multiple tasks, the resource consumption information for that compute node 320 is the total resource consumption information of all tasks running on it.
With reference to Fig. 2, the management node 310 is the Master node and scheduler. As shown at (1) in Fig. 2, the Master node and scheduler schedules a particular job and starts a batch of Tasks according to the resource information configured for that job; for example, each Task is allocated 800MB of memory by default.
While Tasks execute on a compute node 320, the compute node 320 collects the resource information consumed by its own Task groups and reports it to the Master node and scheduler in a heartbeat when a Task finishes. In this example, "a Task finishes" means either that all tasks on the compute node 320 have finished processing or that a particular task has completed.
As shown at (2) in Fig. 2, the Master node and scheduler collects the resource consumption information reported by the Tasks on all compute nodes 320 and calculates an average per-Task resource consumption (that is, the resource scheduling value). Each time it receives resource consumption information for completed tasks reported by a compute node 320, that is, each reported Task memory consumption, the management node 310 treats it as a new sample and updates the resource scheduling value.
For example, the management node 310 generates the resource scheduling value by the following formula:
new resource scheduling value = latest sample value × p + current resource scheduling value × (1 − p), where p takes a value in (0, 1). In this example, the management node 310 can configure p flexibly according to the job characteristics of the plurality of compute nodes 320. In other words, new average per-Task resource consumption = latest sample value × p + current average per-Task resource consumption × (1 − p).
As shown at (3) in Fig. 2, in subsequent scheduling the Master node and scheduler no longer uses the default job-configured resource information described above (for example, the default of 800MB of memory per Task), but uses the latest average per-Task resource consumption as the resource scheduling value for Tasks on the compute nodes 320. For example, if the calculated resource scheduling value is 500MB, the management node 310 allocates 500MB of memory to each Task on the compute node 320.
In the above description, Hadoop is the open-source distributed computing platform of the Apache foundation; JobTracker is the Master node (management node 310) of a Hadoop cluster; TaskTracker is an execution computing node 320 of a Hadoop cluster; a Slot is a slot, and one slot can run one Task; and a Task is the execution unit of a Hadoop job.
The management node according to the embodiment of the invention can collect statistics on the resources that each Task group actually consumes at run time. As Tasks keep running and finishing, the management node learns the resources that a single Task group of the job needs to consume, and adjusts the resources occupied by subsequent Tasks according to the resource scheduling value generated from the resource consumption information of completed tasks on the plurality of computing nodes. The scheduling system therefore does not always schedule according to default resources, but according to the actual resource consumption value (the resource scheduling value) of the specific Tasks. In addition, while a TaskTracker executes Tasks, certain techniques are used to prevent a single TaskTracker machine from running out of memory (OOM) because of excessive memory consumption.
The management node of the embodiment of the invention overcomes the scheduling defect caused by configuring a fixed number of slots on each Hadoop TaskTracker machine. For example, the concept of a "slot" is removed, and what is configured on a TaskTracker is directly the usable resources, including but not limited to memory, CPU, and IO. During scheduling, the scheduler (management node) does not always schedule according to the resources set by the user, but dynamically adjusts the resources allocated to subsequently started Tasks according to the resource consumption of Tasks that have actually finished running.
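One way to picture a TaskTracker that is configured with resources rather than slots is an admission check against each tracked resource. The Python sketch below is an assumption-laden illustration: the class, its fields, and the node size are hypothetical, and only memory and CPU are modeled although the text also mentions IO.

```python
from dataclasses import dataclass


@dataclass
class TaskTrackerResources:
    """Hypothetical resource view of one TaskTracker; names are illustrative."""
    total_memory_mb: int
    total_cpu_cores: int
    used_memory_mb: int = 0
    used_cpu_cores: int = 0

    def try_start(self, memory_mb: int, cpu_cores: int = 1) -> bool:
        """Admit a Task only if every tracked resource still has room."""
        fits = (
            self.used_memory_mb + memory_mb <= self.total_memory_mb
            and self.used_cpu_cores + cpu_cores <= self.total_cpu_cores
        )
        if fits:
            self.used_memory_mb += memory_mb
            self.used_cpu_cores += cpu_cores
        return fits

    def finish(self, memory_mb: int, cpu_cores: int = 1) -> None:
        """Release the resources held by a completed Task."""
        self.used_memory_mb -= memory_mb
        self.used_cpu_cores -= cpu_cores


tracker = TaskTrackerResources(total_memory_mb=2000, total_cpu_cores=8)
started = 0
while tracker.try_start(memory_mb=500):
    started += 1
print(started)  # 4: memory is exhausted before the CPU cores are
```

Because admission depends on measured per-Task sizes instead of a slot count, the number of concurrent Tasks changes automatically as the scheduling value changes.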
The management node of the embodiment of the invention can improve the single-machine concurrency of Hadoop computing nodes (TaskTracker), thereby improving the resource utilization of the whole cluster. In general, the number of slots configured on a single Hadoop computing node varies with its memory configuration and is typically between 10 and 20; with the method of the embodiment of the invention, the actual resource utilization within the computing node is improved.
In addition, the management node of the embodiment of the invention can be used in most Hadoop scenarios.
(1) For jobs in which a single Task's resource consumption exactly fills an integer number of slots, the effect is not significant (in the worst case the result is on par with stock Hadoop, for example when a single Task consumes exactly 800MB of memory).
(2) For jobs in which a single Task's resource consumption differs substantially from the resources of N slots, the effect is more significant.
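The two cases can be made concrete with a small calculation. The Python sketch below assumes a node with 20 slots of 800MB each and assumes that, under slot-based scheduling, a Task must occupy a whole number of slots; the function names and the 500MB and 900MB figures are illustrative choices, not values from the specification.

```python
import math


def slot_based_concurrency(slots, slot_memory_mb, task_memory_mb):
    """Tasks per node when each Task must occupy a whole number of slots."""
    slots_per_task = math.ceil(task_memory_mb / slot_memory_mb)
    return slots // slots_per_task


def resource_based_concurrency(node_memory_mb, task_memory_mb):
    """Tasks per node when memory is handed out at the measured per-Task size."""
    return node_memory_mb // task_memory_mb


SLOTS, SLOT_MB = 20, 800       # assumed node: 20 slots of 800MB each
NODE_MB = SLOTS * SLOT_MB      # 16000MB of memory in total

for task_mb in (800, 500, 900):
    print(task_mb,
          slot_based_concurrency(SLOTS, SLOT_MB, task_mb),
          resource_based_concurrency(NODE_MB, task_mb))
# prints:
# 800 20 20
# 500 20 32
# 900 10 17
```

The 800MB row corresponds to case (1), where both approaches run 20 Tasks; the 500MB and 900MB rows correspond to case (2), where scheduling by measured consumption fits more Tasks on the same node.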
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (12)

1. A Hadoop scheduling method, characterized by comprising the following steps:
a management node obtains resource consumption information of completed tasks on a plurality of computing nodes;
the management node generates a resource scheduling value according to the resource consumption information of the completed tasks on the plurality of computing nodes; and
the management node receives an allocation request for a new task and allocates resources to the new task according to the resource scheduling value.
2. The method of claim 1, characterized in that a plurality of tasks run on the computing node.
3. The method of claim 1 or 2, characterized in that, after a task on the computing node finishes, the resource consumption information corresponding to the task is sent to the management node in a heartbeat message.
4. The method of any one of claims 1-3, characterized in that the management node generates the resource scheduling value by the following formula:
latest resource scheduling value = latest sample value × p + current resource scheduling value × (1 - p), where p takes a value in (0, 1).
5. A Hadoop scheduling system, characterized by comprising a management node and a plurality of computing nodes, wherein
the management node is configured to obtain resource consumption information of completed tasks on the plurality of computing nodes, to generate a resource scheduling value according to the resource consumption information of the completed tasks on the plurality of computing nodes, and, after receiving an allocation request for a new task, to allocate resources to the new task according to the resource scheduling value.
6. The system of claim 5, characterized in that a plurality of tasks run on the computing node.
7. The system of claim 5, characterized in that, after a task on the computing node finishes, the resource consumption information corresponding to the task is sent to the management node in a heartbeat message.
8. The system of claim 5, characterized in that the management node generates the resource scheduling value by the following formula:
latest resource scheduling value = latest sample value × p + current resource scheduling value × (1 - p), where p takes a value in (0, 1).
9. A management node, characterized by comprising:
an acquisition module, configured to obtain resource consumption information of completed tasks on a plurality of computing nodes;
a generation module, configured to generate a resource scheduling value according to the resource consumption information of the completed tasks on the plurality of computing nodes; and
a resource allocation module, configured to allocate resources to a new task according to the resource scheduling value after an allocation request for the new task is received.
10. The management node of claim 9, characterized in that a plurality of tasks run on the computing node.
11. The management node of claim 9, characterized in that, after a task on the computing node finishes, the resource consumption information corresponding to the task is sent to the management node in a heartbeat message.
12. The management node of claim 9, characterized in that the management node generates the resource scheduling value by the following formula:
latest resource scheduling value = latest sample value × p + current resource scheduling value × (1 - p), where p takes a value in (0, 1).
CN2013101881806A 2013-05-20 2013-05-20 Hadoop scheduling method and system and management node Pending CN103246570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101881806A CN103246570A (en) 2013-05-20 2013-05-20 Hadoop scheduling method and system and management node

Publications (1)

Publication Number Publication Date
CN103246570A true CN103246570A (en) 2013-08-14

Family

ID=48926101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101881806A Pending CN103246570A (en) 2013-05-20 2013-05-20 Hadoop scheduling method and system and management node

Country Status (1)

Country Link
CN (1) CN103246570A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870340A (en) * 2014-03-06 2014-06-18 华为技术有限公司 Data processing method and control node in stream computation system and stream computation system
CN104199739A (en) * 2014-08-26 2014-12-10 浪潮(北京)电子信息产业有限公司 Speculation type Hadoop scheduling method based on load balancing
CN104780146A (en) * 2014-01-13 2015-07-15 华为技术有限公司 Resource manage method and device
CN105159769A (en) * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method suitable for heterogeneous computational capability cluster
CN103685492B (en) * 2013-12-03 2017-01-25 北京智谷睿拓技术服务有限公司 Dispatching method, dispatching device and application of Hadoop trunking system
CN103605576B (en) * 2013-11-25 2017-02-08 华中科技大学 Multithreading-based MapReduce execution system
WO2017107456A1 (en) * 2015-12-25 2017-06-29 乐视控股(北京)有限公司 Method and apparatus for determining resources consumed by task
CN106982137A (en) * 2017-03-08 2017-07-25 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging methods based on kylin cloud computing platform
CN107179945A (en) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 A kind of resource allocation methods and device
CN107203422A (en) * 2016-08-28 2017-09-26 深圳晶泰科技有限公司 A kind of job scheduling method towards high-performance calculation cloud platform
CN107515786A (en) * 2017-08-04 2017-12-26 北京奇虎科技有限公司 Resource allocation methods, master device, from device and distributed computing system
CN108268316A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 The method and device of job scheduling
CN108827382A (en) * 2018-06-13 2018-11-16 珠海格力电器股份有限公司 Method for diagnosing faults, apparatus and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080263559A1 (en) * 2004-01-30 2008-10-23 Rajarshi Das Method and apparatus for utility-based dynamic resource allocation in a distributed computing system
CN101593134A (en) * 2009-06-29 2009-12-02 北京航空航天大学 Virtual machine cpu resource distribution method and device
CN102866918A (en) * 2012-07-26 2013-01-09 中国科学院信息工程研究所 Resource management system for distributed programming framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIKASH SHARMA et al.: "MROrchestrator: A Fine-Grained Resource", 《2012 IEEE FIFTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING》, 29 June 2012 (2012-06-29), pages 1 - 8, XP032215271, DOI: doi:10.1109/CLOUD.2012.37 *
林伟伟 (Lin Weiwei) et al.: "云计算资源调度研究综述" (A survey of research on cloud computing resource scheduling), 《计算机科学》 (Computer Science), vol. 39, no. 10, 31 October 2012 (2012-10-31), pages 1 - 5 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605576B (en) * 2013-11-25 2017-02-08 华中科技大学 Multithreading-based MapReduce execution system
CN103685492B (en) * 2013-12-03 2017-01-25 北京智谷睿拓技术服务有限公司 Dispatching method, dispatching device and application of Hadoop trunking system
US10503555B2 (en) 2014-01-13 2019-12-10 Huawei Technologies Co., Ltd. Selecting type and quantity of application masters that need to be started in advance
CN104780146A (en) * 2014-01-13 2015-07-15 华为技术有限公司 Resource manage method and device
WO2015103925A1 (en) * 2014-01-13 2015-07-16 华为技术有限公司 Resource management method and apparatus
CN104780146B (en) * 2014-01-13 2018-04-27 华为技术有限公司 Method for managing resource and device
CN103870340B (en) * 2014-03-06 2017-11-07 华为技术有限公司 Data processing method, control node and stream calculation system in stream calculation system
US10097595B2 (en) 2014-03-06 2018-10-09 Huawei Technologies Co., Ltd. Data processing method in stream computing system, control node, and stream computing system
CN107729147B (en) * 2014-03-06 2021-09-21 华为技术有限公司 Data processing method in stream computing system, control node and stream computing system
WO2015131721A1 (en) * 2014-03-06 2015-09-11 华为技术有限公司 Data processing method in stream computing system, control node and stream computing system
CN107729147A (en) * 2014-03-06 2018-02-23 华为技术有限公司 Data processing method, control node and stream calculation system in stream calculation system
CN103870340A (en) * 2014-03-06 2014-06-18 华为技术有限公司 Data processing method and control node in stream computation system and stream computation system
CN104199739A (en) * 2014-08-26 2014-12-10 浪潮(北京)电子信息产业有限公司 Speculation type Hadoop scheduling method based on load balancing
CN104199739B (en) * 2014-08-26 2018-09-25 浪潮(北京)电子信息产业有限公司 A kind of speculating type Hadoop dispatching methods based on load balancing
CN105159769B (en) * 2015-09-11 2018-06-29 国电南瑞科技股份有限公司 A kind of Distributed Job Scheduling method suitable for computing capability isomeric group
CN105159769A (en) * 2015-09-11 2015-12-16 国电南瑞科技股份有限公司 Distributed job scheduling method suitable for heterogeneous computational capability cluster
WO2017107456A1 (en) * 2015-12-25 2017-06-29 乐视控股(北京)有限公司 Method and apparatus for determining resources consumed by task
CN107203422B (en) * 2016-08-28 2020-09-01 深圳晶泰科技有限公司 Job scheduling method for high-performance computing cloud platform
CN107203422A (en) * 2016-08-28 2017-09-26 深圳晶泰科技有限公司 A kind of job scheduling method towards high-performance calculation cloud platform
CN108268316A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 The method and device of job scheduling
CN106982137A (en) * 2017-03-08 2017-07-25 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging methods based on kylin cloud computing platform
CN106982137B (en) * 2017-03-08 2019-09-20 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging method based on kylin cloud computing platform
CN107179945A (en) * 2017-03-31 2017-09-19 北京奇艺世纪科技有限公司 A kind of resource allocation methods and device
CN107515786A (en) * 2017-08-04 2017-12-26 北京奇虎科技有限公司 Resource allocation methods, master device, from device and distributed computing system
CN108827382A (en) * 2018-06-13 2018-11-16 珠海格力电器股份有限公司 Method for diagnosing faults, apparatus and system

Similar Documents

Publication Publication Date Title
CN103246570A (en) Hadoop scheduling method and system and management node
CN111813513B (en) Method, device, equipment and medium for scheduling real-time tasks based on distribution
Ibrahim et al. Governing energy consumption in Hadoop through CPU frequency scaling: An analysis
Chen et al. Green-aware workload scheduling in geographically distributed data centers
Gu et al. Greening cloud data centers in an economical way by energy trading with power grid
CN101968750B (en) Computer system and working method thereof
CN103279390B (en) A kind of parallel processing system (PPS) towards little optimization of job
CN102508709B (en) Distributed-cache-based acquisition task scheduling method in purchase, supply and selling integrated electric energy acquiring and monitoring system
Bhuiyan et al. Energy-efficient parallel real-time scheduling on clustered multi-core
CN106020934A (en) Optimized deploying method based on virtual cluster online migration
US20130198758A1 (en) Task distribution method and apparatus for multi-core system
CN104252390A (en) Resource scheduling method, device and system
CN103761146A (en) Method for dynamically setting quantities of slots for MapReduce
Kao et al. Data-locality-aware mapreduce real-time scheduling framework
CN109840141B (en) Thread control method and device based on cloud monitoring, electronic equipment and storage medium
CN106293947B (en) GPU-CPU (graphics processing Unit-Central processing Unit) mixed resource allocation system and method in virtualized cloud environment
CN115373835A (en) Task resource adjusting method and device for Flink cluster and electronic equipment
CA2738456A1 (en) Calculator device, system management apparatus, calculation method and program
CN114138488A (en) Cloud-native implementation method and system based on elastic high-performance computing
Bhattacharya et al. Software bloat and wasted joules: Is modularity a hurdle to green software?
CN109586970B (en) Resource allocation method, device and system
Peng et al. Energy-efficient management of data centers using a renewable-aware scheduler
CN112948088B (en) Cloud workflow intelligent management and scheduling system in cloud computing platform
CN103036975A (en) Virtual machine control method and control device
CN103049326A (en) Method and system for managing job program of job management and scheduling system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130814
