CN107832153B - Hadoop cluster resource self-adaptive allocation method - Google Patents

Hadoop cluster resource self-adaptive allocation method Download PDF

Info

Publication number
CN107832153B
CN107832153B CN201711120624.7A CN201711120624A CN107832153B CN 107832153 B CN107832153 B CN 107832153B CN 201711120624 A CN201711120624 A CN 201711120624A CN 107832153 B CN107832153 B CN 107832153B
Authority
CN
China
Prior art keywords
job
type
slave node
map
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711120624.7A
Other languages
Chinese (zh)
Other versions
CN107832153A (en
Inventor
李林林
张勇军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN201711120624.7A priority Critical patent/CN107832153B/en
Publication of CN107832153A publication Critical patent/CN107832153A/en
Application granted granted Critical
Publication of CN107832153B publication Critical patent/CN107832153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Abstract

The invention provides a Hadoop cluster resource self-adaptive allocation method, which can enable the cluster to run more efficiently. The method comprises the following steps: determining the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node. The invention relates to the field of big data and cloud computing.

Description

Hadoop cluster resource self-adaptive allocation method
Technical Field
The invention relates to the field of big data and cloud computing, in particular to a Hadoop cluster resource self-adaptive distribution method.
Background
With the popularity of large-scale parallel distributed processing systems, especially the wide application of cluster systems, what effective scheduling strategy is adopted to balance the load of each node, and further improve the utilization rate of the whole system resources, which has become a research focus and a hotspot of people.
In recent years, a novel, tall and efficient load balancing algorithm has become one of the research hotspots of domestic and foreign research institutions. The distributed heterogeneous cluster generally has a load balancing problem, the Hadoop platform does not have the capability of detecting the performance of nodes, and although the resource management system YARN of the Hadoop cluster has a scheduling strategy aiming at load imbalance, the YARN is too simple and is not suitable for the complex heterogeneous cluster in reality, so that the load balancing problem is more prominent in the Hadoop platform built based on the heterogeneous cluster.
In the prior art, the adaptive dispatching method of the Hadoop cluster task based on the node capability does not consider the influence of different operation types on resource dispatching, so that the resource division is not refined enough.
Disclosure of Invention
The invention aims to provide a Hadoop cluster resource self-adaptive allocation method to solve the technical problem that resource division is not fine enough in the prior art.
In order to solve the above technical problem, an embodiment of the present invention provides a method for adaptively allocating Hadoop cluster resources, including:
determining the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation;
if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I;
and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node.
Further, the determining the type of the job submitted by the user according to the preset job type classification rule includes:
judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general;
if the operation is a common operation, judging whether the size of the operation is smaller than a preset size threshold value, and if so, judging that the operation is a small operation;
otherwise, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold value, if so, judging that the operation is a CPU type operation or an I/O type operation according to the resource occupation ratio consumed when the operation is executed;
otherwise, judging whether the operation is Map type operation or Reduce type operation according to different load degrees of the operation Map stage and the Reduce stage.
Further, the determining that the job is a CPU type job or an I/O type job includes:
if the job satisfies a first formula, then it is marked as an I/O type job, wherein the first formula is expressed as:
Figure BDA0001467264860000021
if the job satisfies a second formula, then it is marked as a CPU type job, where the second formula is expressed as:
Figure BDA0001467264860000022
wherein n represents the number of tasks being executed in parallel in the slave node, ρ represents the ratio of the output data volume of the Map terminal to the input data volume of the Map terminal, MID represents the input data volume of the Map terminal, DIOR represents the IO transmission rate of the disk, and MTCT represents the time required by the completion of the Map task.
Further, the determining that the job is a Map-type job or a Reduce-type job:
when the operation satisfies a third formula, it is determined that the operation is a Reduce-type operation, otherwise it is determined that the operation is a Map-type operation, wherein the third formula is represented as:
Figure BDA0001467264860000031
wherein S ismapRepresenting the total amount of data, S, input in the Map phasereduceRepresents the total amount of data input in the Reduce stage, and td represents a preset proportional threshold.
Further, the weight value of the slave node is expressed as:
Wi=A·Y+B·D+C·F
wherein, WiThe weight of the slave node i is shown, Y shows the hardware performance of the slave node i, D shows the operating performance of the slave node i, F shows the node failure rate, and A, B, C is a coefficient Y, D, F respectively.
Further, the hardware performance Y of the slave node i is expressed as:
Figure BDA0001467264860000032
wherein S iscpuRepresenting CPU dominant frequency, SmemIndicating the memory capacity, SnetRepresenting the network bandwidth, SdiskIndicating maximum read-write speed, avg, of the diskcpu、avgmem、avgnet、avgdiskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster1、K2、K3、K4Both represent coefficients.
Further, the operation performance D of the slave node i is represented as:
Figure BDA0001467264860000033
wherein, tcmRun time of CPU type jobs in Map phase, t, expressed as unit sizecrRun time of CPU type jobs in Reduce phase, t, expressed as unit sizeiomRun time of I/O type jobs in Map phase, t, expressed as unit sizeiorRun time, avg, for I/O type jobs in Reduce phase expressed as unit sizecmAverage run time, avg, of CPU type jobs clustered at Map stage, expressed as unit sizecrAverage run time, avg, of CPU type jobs, expressed as unit size, clustered during Reduce phaseiomAverage run time, avg, of I/O type jobs, expressed as unit size, clustered in Map phaseiorAverage run time, G, of I/O type jobs, expressed as unit size, clustered during Reduce phase1、G2、G3、G4Both represent coefficients.
Further, the node failure rate F is expressed as:
Figure BDA0001467264860000034
wherein n isnumRepresenting the number of running tasks of each node read by the log file, nfailIndicating the number of failures in the operation of each node, tnumRepresenting the average number of running tasks, t, of the entire clusterfailIndicating the number of tasks that failed the average run.
Further, if the job is a CPU type job, the coefficient K describing the CPU performance of the slave node is increased1、K2And G1、G3A value of (d);
if the operation is I/O type operation, increasing the coefficient K for describing the I/O performance of the slave node3、K4And G2、G4A value of (d);
and if the operation is important, increasing the value of the coefficient C of the node failure rate F.
Further, if the job is a Map type job, the slave nodes with more data storage are preferentially scheduled to perform calculation;
and if the operation is Reduce type operation, preferentially scheduling the slave nodes with large output data volume of the Map task to calculate.
The technical scheme of the invention has the following beneficial effects:
in the scheme, the types of the jobs submitted by the users are determined according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node, so that each slave node receives the task requests in corresponding proportion parameter. Therefore, the influence of the node performance difference and the job different types on job scheduling of the whole cluster is comprehensively considered, and compared with the original resource scheduling method only considering the node capacity difference, finer scheduling of cluster resources is realized, so that the cluster operation is more efficient.
Drawings
Fig. 1 is a schematic flow chart of a Hadoop cluster resource adaptive allocation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a process for determining a type of a job submitted by a user according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a heterogeneous Hadoop cluster according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the detailed operation of YARN after submitting a job according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the operation of an improved pre-pi estimation process provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of an improved pi estimation process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the operational status of a WordCount program before improvement according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an improved WordCount program operating scenario according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the comparison between the improved pre-and post-pi estimation procedure and the WordCount procedure run time provided by the embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a Hadoop cluster resource self-adaptive allocation method aiming at the problem that the existing resource division is not fine enough.
In order to better understand the method for adaptively allocating the Hadoop cluster resources described in this embodiment, a simple description is first performed on the slave nodes, the master node, the Map task, and the Reduce task:
1. the slave node is equivalent to a computing point, has network connection and can independently process tasks issued by the master node and the resource manager. One server can be provided with one slave node or a plurality of slave nodes;
2. the master node is responsible for job classification, resource and task scheduling, and the execution of the tasks is carried out at the slave nodes;
3. each job can be split into N tasks to realize distributed parallel computation, the N tasks can be completed at one slave node or a plurality of different slave nodes, and N is more than or equal to 2;
4. map tasks represent tasks with large calculation amount in the Map stage;
5. reduce tasks represent tasks that are computationally intensive during the Reduce phase.
As shown in fig. 1, the method for adaptively allocating Hadoop cluster resources provided by the embodiment of the present invention includes:
s101, determining the types of the jobs submitted by a user according to a preset job type classification rule, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer;
s102, if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of a slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I;
s103, distributing task requests with corresponding proportions to each slave node according to the weight proportion parameter of each slave node.
The Hadoop cluster resource self-adaptive allocation method determines the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node, so that each slave node receives the task requests in corresponding proportion parameter. Therefore, the influence of the node performance difference and the job different types on job scheduling of the whole cluster is comprehensively considered, and compared with the original resource scheduling method only considering the node capacity difference, finer scheduling of cluster resources is realized, so that the cluster operation is more efficient.
In a specific embodiment of the foregoing method for adaptively allocating Hadoop cluster resources, further, the determining, according to a preset job type classification rule, the type of a job submitted by a user includes:
judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general;
if the operation is a common operation, judging whether the size of the operation is smaller than a preset size threshold value, and if so, judging that the operation is a small operation;
otherwise, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold value, if so, judging that the operation is a CPU type operation or an I/O type operation according to the resource occupation ratio consumed when the operation is executed;
otherwise, judging whether the operation is Map type operation or Reduce type operation according to different load degrees of the operation Map stage and the Reduce stage.
In this embodiment, as shown in fig. 2, the type of the job submitted by the user may be determined according to a preset job type classification rule, and the specific steps may include:
a11, distinguishing important jobs from general property jobs, wherein the classification is to consider that some jobs have higher requirements on the reliability of the cluster in practice, so that the important jobs can be distributed to the slave nodes with higher reliability for calculation after the reliability of each slave node is quantified; the classification method comprises the following steps: judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general.
A12, if the job is a normal job, continuing to classify the jobs submitted by the user according to the job size. It is necessary to classify according to the job size, because if the large jobs are not distinguished in this way but the large jobs are mixed in a queue, it is likely that a situation occurs that a large job is submitted and then a resource is first obtained to be executed, and if a small job is submitted at this time, because the resource can be applied only after the previous large job is executed in the same queue, the waiting time of the small job becomes very long, and the job execution efficiency and the resource utilization rate of the whole cluster are low; the classification method specifically comprises the following steps: if the operation is a general operation, judging whether the size of the operation is smaller than a preset size threshold value, if so, judging that the operation is a small operation, otherwise, judging that the operation is a large operation.
A13, if the job is a big job, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold, if so, judging that the job is a CPU type job or an I/O type job according to the resource occupation ratio consumed when the job is executed; the CPU type operation is mainly performed in a memory, such as various scientific calculations and large-scale data modeling, and the I/O type operation is performed by frequently reading and writing a hard disk or other storage media, such as various data centers, network storage, and a cloud storage server. The classification is beneficial to realizing the fine scheduling of the operation, and the cluster resources are more efficiently and reasonably utilized.
A14, if the difference between the CPU resource and the I/O resource of the slave node in the cluster is not more than the preset difference threshold, judging that the operation is Map type operation or Reduce type operation according to different load degrees of the operation Map stage and the Reduce stage.
According to the steps A11-A14, the job classification work submitted by the user can be completed, and then corresponding queues, namely queues of CPU type jobs, I/O type jobs, Map type jobs, Reduce type jobs, important jobs, small jobs and the like can be respectively established in the fair scheduler according to the job types, so that different jobs can be conveniently and pertinently scheduled through the corresponding queues. The configuration file of the fair scheduler is located in a fair-scheduler xml file under a class path, which can be modified by a yarn. Each queue may be configured in a configuration file. Different scheduling strategies, such as a first-in first-out scheduling strategy, a weight round training scheduling strategy and the like, can still exist in each queue. And then a targeted resource scheduling strategy is adopted according to different operation types, so that more refined scheduling of resources is realized, the load balancing problem of a heterogeneous cluster (for example, a Hadoop cluster) is solved, the resource utilization rate of the whole heterogeneous cluster is improved, and the comprehensive performance of the heterogeneous cluster is improved.
In an embodiment of the foregoing method for Hadoop cluster resource adaptive allocation, the determining whether the job is a CPU type job or an I/O type job further includes:
if the job satisfies a first formula, then it is marked as an I/O type job, wherein the first formula is expressed as:
Figure BDA0001467264860000081
if the job satisfies a second formula, then it is marked as a CPU type job, where the second formula is expressed as:
Figure BDA0001467264860000082
wherein n represents the number of tasks being executed in parallel in the slave node, ρ represents the ratio of the output data volume of the Map terminal to the input data volume of the Map terminal, MID represents the input data volume of the Map terminal, DIOR represents the IO transmission rate of the disk, and MTCT represents the time required by the completion of the Map task.
In this embodiment, ρ × MID is MOD, where MOD represents the output data amount of the map terminal.
In an embodiment of the foregoing method for adaptively allocating resources in a Hadoop cluster, further, the determining that the job is a Map-type job or a Reduce-type job:
when the operation satisfies a third formula, it is determined that the operation is a Reduce-type operation, otherwise it is determined that the operation is a Map-type operation, wherein the third formula is represented as:
Figure BDA0001467264860000083
wherein S ismapRepresenting the total amount of data, S, input in the Map phasereduceRepresents the total amount of data input in the Reduce stage, and td represents a preset proportional threshold.
In the present embodiment, the first and second electrodes are,
Figure BDA0001467264860000084
where K represents the number of servers in the cluster.
In this embodiment, in the Hadoop cluster, the slave node weights are used to measure the performance of different slave nodes, specifically, three aspects of node hardware performance Y, operation performance D, and node failure rate F are measured. The weight of the slave node can be expressed as:
Wi=A·Y+B·D+C·F
wherein, WiThe weight of the slave node i is shown, Y shows the hardware performance of the slave node i, D shows the operating performance of the slave node i, F shows the node failure rate, and A, B, C is a coefficient Y, D, F respectively.
In this embodiment, the hardware performance of the slave node mainly considers the CPU master frequency, the memory capacity, the network bandwidth, the maximum disk read-write speed, and the like of the slave node, and these hardware parameters represent the resource conditions of the slave node itself and are basic indexes for measuring the node performance. The slave node hardware performance can be expressed as:
Figure BDA0001467264860000091
where Y represents the hardware performance of the slave i, ScpuRepresenting CPU dominant frequency, SmemIndicating the memory capacity, SnetRepresenting the network bandwidth, SdiskIndicating maximum read of diskWrite speed, avgcpu、avgmem、avgnet、avgdiskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster1、K2、K3、K4Both represent coefficients.
In this embodiment, the hardware performance indexes of the cluster are all static indexes, so that each parameter of the hardware performance index can be directly obtained from the node.
In this embodiment, since the performance of a slave node cannot be completely expressed only by a part of the hardware performance indicators, the dynamic performance of the slave node needs to be described more accurately by introducing the runtime performance indicators. The running performance is dynamic performance and thus needs to be acquired by reading a log generated after the job is run.
In this embodiment, the operation performance D of the slave node i is represented as:
Figure BDA0001467264860000092
wherein, tcmRun time of CPU type jobs in Map phase, t, expressed as unit sizecrRun time of CPU type jobs in Reduce phase, t, expressed as unit sizeiomRun time of I/O type jobs in Map phase, t, expressed as unit sizeiorRun time, avg, for I/O type jobs in Reduce phase expressed as unit sizecmAverage run time, avg, of CPU type jobs clustered at Map stage, expressed as unit sizecrAverage run time, avg, of CPU type jobs, expressed as unit size, clustered during Reduce phaseiomAverage run time, avg, of I/O type jobs, expressed as unit size, clustered in Map phaseiorAverage run time, G, of I/O type jobs, expressed as unit size, clustered during Reduce phase1、G2、G3、G4Both represent coefficients.
In this embodiment, the running time of the slave node in the two job types, i.e., the CPU type job and the I/O type job, is used as a description index for measuring the actual running performance of the slave node.
In this embodiment, a CPU type job and an I/O type job each taking 1GB data amount are used and run on each slave node, and at this time, tcmRun time of CPU type jobs in Map phase, t, denoted 1GBcrRun time, t, of CPU type jobs in Reduce phase, denoted 1GBiomRun time of an I/O type job, denoted 1GB, in Map phase, tiorRun time, avg, for 1GB I/O type jobs in Reduce phasecmAverage run time, avg, of CPU type jobs, denoted 1GB, clustered in Map phasecrAverage run time, avg, of CPU type jobs, denoted 1GB, clustered during Reduce phaseiomAverage run time, avg, for 1GB I/O jobs clustered in Map phaseiorThe average run time of a 1GB I/O job clustered during the Reduce phase. In order to avoid jitter as much as possible and ensure the credibility of data when collecting the index, 10 groups of data are respectively collected for each item of data, and the average value is taken as the index after the maximum value and the minimum value are removed.
In this embodiment, the performance index of the node failure rate is a parameter selected for measuring the reliability of the slave node, and considering that some jobs in actual operation have a high requirement on the reliability of the slave node, the performance index has an important meaning on the description of the performance of the whole slave node.
In this embodiment, calculating the performance index of node failure rate requires reading the number n of running tasks of each node through a log filenumAnd the number n of operation failures thereinfailAnd calculating the average running task number t of the whole clusternumAnd the average number of failed tasks tfailThe node failure rate F may be expressed as:
Figure BDA0001467264860000101
in this embodiment, each coefficient in the node weight calculation formula is determined according to different job types, specifically: if the job is a CPU type job, thenIncreasing coefficient K describing slave node CPU performance1、K2And G1、G3A value of (d); if the operation is I/O type operation, increasing the coefficient K for describing the I/O performance of the slave node3、K4And G2、G4A value of (d); and if the operation is important, increasing the value of the coefficient C of the node failure rate F. Therefore, the influence of the node performance difference and the different types of the operation on the operation scheduling of the whole cluster is comprehensively considered, the coefficient in the formula of the node performance is adjusted and measured according to the different types of the operation, and compared with the original scheduling method only considering the node performance difference, the cluster resource is scheduled more finely, so that the cluster operation is more efficient.
In this embodiment, considering that the small jobs consume less cluster resources and the running time is shorter, a queue is generally established separately to run the small jobs. If the operation is a Map type operation, preferentially scheduling the slave nodes with more data storage for calculation; and if the operation is Reduce type operation, the slave nodes with large output data volume of the Map task are preferentially scheduled for calculation, so that the resource scheduling is more refined.
In this embodiment, in order to perform scheduling allocation of cluster resources by using the node weight, the node weight needs to be converted into a shape similar to a weight ratio parameter. The weight ratio parameter is defined in such a way that m slave nodes are arranged in the cluster, and the weight of the slave node i is WiThe weight sum of all the slave nodes in the cluster is WsumThen, the weight ratio parameter P of the slave node i can be expressed as:
Figure BDA0001467264860000111
in this embodiment, if the type of the job submitted by the user is a CPU type job, an I/O type job, or an important job, a weight polling scheduling policy is adopted for scheduling, specifically: and distributing corresponding weight ratio list parameters to each slave node in the cluster according to the performance of the slave nodes, so that the master node can distribute task requests with corresponding proportions to each slave node according to the weight ratio list parameters of each slave node, and each slave node can receive the task requests with the corresponding proportion parameters.
In this embodiment, assuming that there is a group of slave nodes S ═ { S0, S1, …, Sn-1}, the initialized weight ratio column parameter is 0, and according to the performance of the slave nodes, a corresponding weight ratio column parameter is allocated to each slave node, when scheduling for the first time, the slave node with the largest weight ratio column parameter is taken, and by continuously decreasing the weight ratio column parameter, a suitable slave node is found to execute a task until polling is finished, and the weight ratio column parameter returns to 0. If there are 2 slave nodes a and B, a has 2 times the processing capacity of B, then a has 2 times the weight ratio column parameter/weight of B, and a also has 2 times the number of task requests it accepts. That is, the slave nodes with higher weight than the row parameter/weight receive the task requests first, the slave nodes with higher weight than the row parameter/weight process more task requests than the servers with lower weight than the row parameter/weight, and the slave nodes with the same weight than the row parameter/weight process the same number of task requests.
In this embodiment, as shown in fig. 3, fig. 3 is a schematic diagram of a heterogeneous Hadoop cluster, and a server may be divided into two roles, namely a master node (NameNode) and a slave node (DataNode), from two angles; specifically, the method comprises the following steps:
firstly, from the perspective of a distributed file system (HDFS), a server is divided into a master node and a slave node, in the HDFS, management of a directory is important, and the master node is a directory manager;
the NameNode is a master node and stores metadata of files such as file names, file directory structures, file attributes (generation time, copy number, file authority), and block lists of each file and the DataNode where the block is located. The primary node is a central server, which is responsible for managing the namespace (namespace) of the file system and the access of the client to the files, and maintains all the files and directories in each file system tree and the whole tree, and these information are permanently stored on the local disk in two file forms: named control image file (Fsimage) and Edit log (Edit log).
The DataNode stores the block data of the file and the checksum of the block data in the local file system. Files may be created, deleted, moved, or renamed that cannot be modified in their contents after they have been created, written, and closed. A data block is stored on a disk in a file in a DataNode, and comprises two files, wherein one file is data, and the other file is metadata comprising the length of the data block, the checksum of the data block and a time stamp. After the DataNode is started, registering the data to the NameNode, and periodically (1 hour) reporting all block information to the NameNode after the data is passed. The heartbeat is once every 3 seconds, and returns a result with a command from the NameNode to the DataNode, such as copying block data to another machine, or deleting a block. If no heartbeat is received from a DataNode for more than 10 minutes, the node is considered to be unavailable.
File operation, NameNode for operating file metadata, DataNode for processing read-write request of file content, data stream related to file content not passing through NameNode only asking it to contact with that DataNode, otherwise NameNode becoming bottleneck of system
Second, from the YARN perspective, the master node typically deploys a resource manager (ResourceManager) that is globally responsible for monitoring, allocation, and management of all resources, while the slave node deploys a node manager (NodeManager) that is responsible for maintenance of each slave node.
Fig. 4 is a schematic diagram of a work flow of YARN, which can implement weight polling scheduling when applying for resources for a job according to the obtained weight ratio parameter of each slave node, and the specific flow is as follows:
1) the user submits an application program including an application manager ApplicationMaster program, a command for starting the ApplicationMaster, a user program and the like to the resource management platform YARN. Labels can be added to frequently-running jobs and the frequently-running jobs are directly dispatched to corresponding queues, partial tasks of the jobs with undetermined types can be run in advance, and relevant information is collected and divided according to a job type classification method.
2) The resource manager assigns a first Container (computing resource unit) for the application program, and communicates with the corresponding node manager, requesting it to start the ApplicationMaster of the application program in this Container (computing resource unit).
3) The ApplicationMaster firstly registers to the ResourceManager, so that a user can directly check the running state of the application program through the ResourceManager, then the ApplicationMaster applies for resources for each task and monitors the running state of the application program until the running is finished, and the steps 4-7 are repeated.
4) The ApplicationMaster applies for and obtains resources from the Resourcemanager through the RPC protocol by adopting a weight polling mode.
5) Once the ApplicationMaster applies for the resource, it communicates with the corresponding NodeManager, asking it to start the task.
6) After the NodeManager sets an operation environment (comprising environment variables, JAR packages, binary programs and the like) for the task, a task starting command is written into a script, and the task is started by operating the script.
7) Each task reports the state and the progress of the task to the ApplicationMaster through a certain RPC protocol so that the ApplicationMaster can master the running state of each task at any time, and therefore the task can be restarted when the task fails.
8) After the application runs, the ApplicationMaster logs out to the ResourceManager and closes itself.
As can be seen from fig. 5, when a pi value estimation procedure is run by using a Hadoop original policy (before improvement) based on a round-robin scheduling algorithm, since a performance difference between nodes cannot be considered, the ApplicationMaster applies and acquires resources to the ResourceManager through an RPC protocol in a simple round-robin manner. Therefore, as the job is executed, the performance of the S1 node is the worst, the execution efficiency is the slowest, so that the CPU utilization rate is higher than that of other nodes in the later stage of S1, and the performance of the S3 node is the best, so that the CPU utilization rate is gradually lower than that of other nodes as the job is executed, and thus, the cluster resources are obviously not fully utilized, so that the load among the nodes is uneven in the later stage of the job execution, and the execution efficiency of the whole job is influenced to be low. When the improved policy (the weight polling scheduling policy described in this embodiment) is adopted, it can be seen that the load of each node is more balanced when the job runs, and the running efficiency is more efficient than before the improvement, as shown in fig. 6.
In the present embodiment, the principle of the polling scheduling algorithm used before improvement is to alternately allocate task requests from a user to slave nodes each time, starting from 1 until the last slave node, and then start the loop again. The round-robin scheduling algorithm assumes that the processing performance of all slave nodes is the same, and does not care about the current number of connections and response speed of each slave node. When the request service interval time is changed greatly, the polling scheduling algorithm is easy to cause load imbalance among the slave nodes. This scheduling algorithm is suitable for the case where all slave nodes in the slave node group have the same hardware and software configuration and the average service request is relatively balanced.
As with the pi value estimation procedure, the ordinate of the line graph is the CPU utilization, and the abscissa is in units of intervals of every 10 seconds. As can be seen from fig. 7, when a WordCount (word occurrence frequency of a statistical data set) program is run by using a Hadoop original policy based on a round-robin scheduling algorithm, because a difference in performance between nodes cannot be considered, the ApplicationMaster applies and obtains resources to a ResourceManager through an RPC protocol in a simple round-robin manner. Load among the nodes is uneven, the load of the S1 node is low during operation and is not fully utilized by the cluster, and therefore the execution efficiency of the whole operation is low. When the improved policy (the weight polling scheduling policy described in this embodiment) is adopted, it can be seen that the load of each node is more balanced when the job runs, and the running efficiency is more efficient than before the improvement, as shown in fig. 8.
As can be seen from fig. 9, compared with the strategy before improvement, the improved scheduling strategy has a pi value estimation program running time shorter than that before improvement, which is 16.35% shorter than that before improvement. And (4) running a WordCount program, and comparing the running time before and after improvement, finding that the running time is shorter by adopting the improved strategy, and shortening by 14.65% before improvement.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (1)

1. A Hadoop cluster resource self-adaptive allocation method is characterized by comprising the following steps:
determining the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation;
if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; the weight value of the slave node is expressed as:
Wi=A·Y+B·D+C·F
wherein, WiRepresenting the weight of the slave node i, Y representing the hardware performance of the slave node i, D representing the operation performance of the slave node i, F representing the node failure rate, and A, B, C being coefficients of Y, D, F respectively;
distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node;
wherein, the determining the type of the job submitted by the user according to the preset job type classification rule comprises:
judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general;
if the operation is a common operation, judging whether the size of the operation is smaller than a preset size threshold value, and if so, judging that the operation is a small operation;
otherwise, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold value, if so, judging that the operation is a CPU type operation or an I/O type operation according to the resource occupation ratio consumed when the operation is executed; the determining that the job is a CPU type job or an I/O type job includes:
if the job satisfies a first formula, then it is marked as an I/O type job, wherein the first formula is expressed as:
Figure FDA0002746483560000011
if the job satisfies a second formula, then it is marked as a CPU type job, where the second formula is expressed as:
Figure FDA0002746483560000021
wherein n represents the number of tasks being executed in parallel in the slave node, ρ represents the ratio of the output data volume of the Map terminal to the input data volume of the Map terminal, MID represents the input data volume of the Map terminal, DIOR represents the IO transmission rate of a magnetic disk, and MTCT represents the time required by the completion of the Map task;
otherwise, judging whether the operation is Map type operation or Reduce type operation according to different load degrees of the Map stage and the Reduce stage of the operation; the determination that the job is a Map-type job or a Reduce-type job includes:
when the operation satisfies a third formula, it is determined that the operation is a Reduce-type operation, otherwise it is determined that the operation is a Map-type operation, wherein the third formula is represented as:
Figure FDA0002746483560000022
wherein S ismapRepresenting the total amount of data, S, input in the Map phasereduceExpress ReduceThe total data input in the stage td represents a preset proportional threshold;
the hardware performance Y of the slave node i is expressed as:
Figure FDA0002746483560000023
wherein S iscpuRepresenting CPU dominant frequency, SmemIndicating the memory capacity, SnetRepresenting the network bandwidth, SdiskIndicating maximum read-write speed, avg, of the diskcpu、avgmem、avgnet、avgdiskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster1、K2、K3、K4All represent coefficients;
the operating performance D of the slave node i is represented as:
Figure FDA0002746483560000024
wherein, tcmRun time of CPU type jobs in Map phase, t, expressed as unit sizecrRun time of CPU type jobs in Reduce phase, t, expressed as unit sizeiomRun time of I/O type jobs in Map phase, t, expressed as unit sizeiorRun time, avg, for I/O type jobs in Reduce phase expressed as unit sizecmAverage run time, avg, of CPU type jobs clustered at Map stage, expressed as unit sizecrAverage run time, avg, of CPU type jobs, expressed as unit size, clustered during Reduce phaseiomAverage run time, avg, of I/O type jobs, expressed as unit size, clustered in Map phaseiorAverage run time, G, of I/O type jobs, expressed as unit size, clustered during Reduce phase1、G2、G3、G4All represent coefficients;
the node failure rate F is expressed as:
Figure FDA0002746483560000031
wherein n isnumRepresenting the number of running tasks of each node read by the log file, nfailIndicating the number of failures in the operation of each node, tnumRepresenting the average number of running tasks, t, of the entire clusterrailIndicating the number of tasks that failed the average run;
if the job is a CPU type job, increasing a coefficient K describing the performance of the CPU of the slave node1、K2And G1、G3A value of (d);
if the operation is I/O type operation, increasing the coefficient K for describing the I/O performance of the slave node3、K4And G2、G4A value of (d);
if the operation is important, increasing the value of the coefficient C of the node failure rate F;
if the operation is a Map type operation, preferentially scheduling the slave nodes with more data storage for calculation;
and if the operation is Reduce type operation, preferentially scheduling the slave nodes with large output data volume of the Map task to calculate.
CN201711120624.7A 2017-11-14 2017-11-14 Hadoop cluster resource self-adaptive allocation method Active CN107832153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711120624.7A CN107832153B (en) 2017-11-14 2017-11-14 Hadoop cluster resource self-adaptive allocation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711120624.7A CN107832153B (en) 2017-11-14 2017-11-14 Hadoop cluster resource self-adaptive allocation method

Publications (2)

Publication Number Publication Date
CN107832153A CN107832153A (en) 2018-03-23
CN107832153B true CN107832153B (en) 2020-12-29

Family

ID=61655305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711120624.7A Active CN107832153B (en) 2017-11-14 2017-11-14 Hadoop cluster resource self-adaptive allocation method

Country Status (1)

Country Link
CN (1) CN107832153B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108551444A (en) * 2018-03-30 2018-09-18 新华三信息安全技术有限公司 A kind of log processing method, device and equipment
CN108958942A (en) * 2018-07-18 2018-12-07 郑州云海信息技术有限公司 A kind of distributed system distribution multitask method, scheduler and computer equipment
US11003686B2 (en) * 2018-07-26 2021-05-11 Roblox Corporation Addressing data skew using map-reduce
CN110971647B (en) * 2018-09-30 2023-12-05 南京工程学院 Node migration method of big data system
CN109309726A (en) * 2018-10-25 2019-02-05 平安科技(深圳)有限公司 Document generating method and system based on mass data
CN109947532B (en) * 2019-03-01 2023-06-09 中山大学 Big data task scheduling method in education cloud platform
CN110908796B (en) * 2019-11-04 2022-03-18 北京理工大学 Multi-operation merging and optimizing system and method in Gaia system
CN111459677A (en) * 2020-04-01 2020-07-28 北京顺达同行科技有限公司 Request distribution method and device, computer equipment and storage medium
CN111580950A (en) * 2020-06-15 2020-08-25 四川中电启明星信息技术有限公司 Self-adaptive feedback resource scheduling method for improving cloud reliability
CN111831418A (en) * 2020-07-14 2020-10-27 华东师范大学 Big data analysis job performance optimization method based on delay scheduling technology
CN112764906B (en) * 2021-01-26 2024-03-15 浙江工业大学 Cluster resource scheduling method based on user job type and node performance bias
CN113626098A (en) * 2021-07-21 2021-11-09 长沙理工大学 Data node dynamic configuration method based on information interaction
CN116302404B (en) * 2023-02-16 2023-10-03 北京大学 Resource decoupling data center-oriented server non-perception calculation scheduling method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793272A (en) * 2013-12-27 2014-05-14 北京天融信软件有限公司 Periodical task scheduling method and periodical task scheduling system
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
CN107038069A (en) * 2017-03-24 2017-08-11 北京工业大学 Dynamic labels match DLMS dispatching methods under Hadoop platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9594801B2 (en) * 2014-03-28 2017-03-14 Akamai Technologies, Inc. Systems and methods for allocating work for various types of services among nodes in a distributed computing system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
CN103793272A (en) * 2013-12-27 2014-05-14 北京天融信软件有限公司 Periodical task scheduling method and periodical task scheduling system
CN107038069A (en) * 2017-03-24 2017-08-11 北京工业大学 Dynamic labels match DLMS dispatching methods under Hadoop platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
云计算环境下基于MapReduce的资源调度模型和算法研究;陶韬;《中国优秀硕士学位论文全文数据库 信息科技辑》;20121015;第3.1.3节 *
基于节点能力的Hadoop集群任务自适应调度方法;郑晓薇等;《计算机研究与发展》;20140331;第3.1节 *
基于负载均衡的Hadoop平台下作业调度算法研究;胡丹;《中国优秀硕士学位论文全文数据库 信息科技辑》;20131015;第3.1节 *

Also Published As

Publication number Publication date
CN107832153A (en) 2018-03-23

Similar Documents

Publication Publication Date Title
CN107832153B (en) Hadoop cluster resource self-adaptive allocation method
US20200287961A1 (en) Balancing resources in distributed computing environments
US11487771B2 (en) Per-node custom code engine for distributed query processing
US7937493B2 (en) Connection pool use of runtime load balancing service performance advisories
CN112162865B (en) Scheduling method and device of server and server
Chaczko et al. Availability and load balancing in cloud computing
US7516221B2 (en) Hierarchical management of the dynamic allocation of resources in a multi-node system
US9460185B2 (en) Storage device selection for database partition replicas
CN109120715A (en) Dynamic load balancing method under a kind of cloud environment
AU2004266017B2 (en) Hierarchical management of the dynamic allocation of resources in a multi-node system
US9870269B1 (en) Job allocation in a clustered environment
US9438665B1 (en) Scheduling and tracking control plane operations for distributed storage systems
US10356150B1 (en) Automated repartitioning of streaming data
US10158709B1 (en) Identifying data store requests for asynchronous processing
Javadpour et al. Improving load balancing for data-duplication in big data cloud computing networks
US10102230B1 (en) Rate-limiting secondary index creation for an online table
US20160275412A1 (en) System and method for reducing state space in reinforced learning by using decision tree classification
US11816511B1 (en) Virtual partitioning of a shared message bus
CN110825704A (en) Data reading method, data writing method and server
Zacheilas et al. Dynamic load balancing techniques for distributed complex event processing systems
US20200065415A1 (en) System For Optimizing Storage Replication In A Distributed Data Analysis System Using Historical Data Access Patterns
US8819239B2 (en) Distributed resource management systems and methods for resource management thereof
US9898614B1 (en) Implicit prioritization to rate-limit secondary index creation for an online table
US9934268B2 (en) Providing consistent tenant experiences for multi-tenant databases
US11863675B2 (en) Data flow control in distributed computing systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant