CN107832153B

CN107832153B - Hadoop cluster resource self-adaptive allocation method

Info

Publication number: CN107832153B
Application number: CN201711120624.7A
Authority: CN
Inventors: 李林林; 张勇军
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2017-11-14
Filing date: 2017-11-14
Publication date: 2020-12-29
Anticipated expiration: 2037-11-14
Also published as: CN107832153A

Abstract

The invention provides a Hadoop cluster resource self-adaptive allocation method, which can enable the cluster to run more efficiently. The method comprises the following steps: determining the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node. The invention relates to the field of big data and cloud computing.

Description

Hadoop cluster resource self-adaptive allocation method

Technical Field

The invention relates to the field of big data and cloud computing, in particular to a Hadoop cluster resource self-adaptive distribution method.

Background

With the popularity of large-scale parallel distributed processing systems, especially the wide application of cluster systems, what effective scheduling strategy is adopted to balance the load of each node, and further improve the utilization rate of the whole system resources, which has become a research focus and a hotspot of people.

In recent years, a novel, tall and efficient load balancing algorithm has become one of the research hotspots of domestic and foreign research institutions. The distributed heterogeneous cluster generally has a load balancing problem, the Hadoop platform does not have the capability of detecting the performance of nodes, and although the resource management system YARN of the Hadoop cluster has a scheduling strategy aiming at load imbalance, the YARN is too simple and is not suitable for the complex heterogeneous cluster in reality, so that the load balancing problem is more prominent in the Hadoop platform built based on the heterogeneous cluster.

In the prior art, the adaptive dispatching method of the Hadoop cluster task based on the node capability does not consider the influence of different operation types on resource dispatching, so that the resource division is not refined enough.

Disclosure of Invention

The invention aims to provide a Hadoop cluster resource self-adaptive allocation method to solve the technical problem that resource division is not fine enough in the prior art.

In order to solve the above technical problem, an embodiment of the present invention provides a method for adaptively allocating Hadoop cluster resources, including:

determining the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation;

if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I;

and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node.

Further, the determining the type of the job submitted by the user according to the preset job type classification rule includes:

judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general;

if the operation is a common operation, judging whether the size of the operation is smaller than a preset size threshold value, and if so, judging that the operation is a small operation;

otherwise, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold value, if so, judging that the operation is a CPU type operation or an I/O type operation according to the resource occupation ratio consumed when the operation is executed;

otherwise, judging whether the operation is Map type operation or Reduce type operation according to different load degrees of the operation Map stage and the Reduce stage.

Further, the determining that the job is a CPU type job or an I/O type job includes:

if the job satisfies a first formula, then it is marked as an I/O type job, wherein the first formula is expressed as:

if the job satisfies a second formula, then it is marked as a CPU type job, where the second formula is expressed as:

wherein n represents the number of tasks being executed in parallel in the slave node, ρ represents the ratio of the output data volume of the Map terminal to the input data volume of the Map terminal, MID represents the input data volume of the Map terminal, DIOR represents the IO transmission rate of the disk, and MTCT represents the time required by the completion of the Map task.

Further, the determining that the job is a Map-type job or a Reduce-type job:

when the operation satisfies a third formula, it is determined that the operation is a Reduce-type operation, otherwise it is determined that the operation is a Map-type operation, wherein the third formula is represented as:

wherein S is_mapRepresenting the total amount of data, S, input in the Map phase_reduceRepresents the total amount of data input in the Reduce stage, and td represents a preset proportional threshold.

Further, the weight value of the slave node is expressed as:

W_i＝A·Y+B·D+C·F

wherein, W_iThe weight of the slave node i is shown, Y shows the hardware performance of the slave node i, D shows the operating performance of the slave node i, F shows the node failure rate, and A, B, C is a coefficient Y, D, F respectively.

Further, the hardware performance Y of the slave node i is expressed as:

wherein S is_cpuRepresenting CPU dominant frequency, S_memIndicating the memory capacity, S_netRepresenting the network bandwidth, S_diskIndicating maximum read-write speed, avg, of the disk_cpu、avg_mem、avg_net、avg_diskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster₁、K₂、K₃、K₄Both represent coefficients.

Further, the operation performance D of the slave node i is represented as:

wherein, t_cmRun time of CPU type jobs in Map phase, t, expressed as unit size_crRun time of CPU type jobs in Reduce phase, t, expressed as unit size_iomRun time of I/O type jobs in Map phase, t, expressed as unit size_iorRun time, avg, for I/O type jobs in Reduce phase expressed as unit size_cmAverage run time, avg, of CPU type jobs clustered at Map stage, expressed as unit size_crAverage run time, avg, of CPU type jobs, expressed as unit size, clustered during Reduce phase_iomAverage run time, avg, of I/O type jobs, expressed as unit size, clustered in Map phase_iorAverage run time, G, of I/O type jobs, expressed as unit size, clustered during Reduce phase₁、G₂、G₃、G₄Both represent coefficients.

Further, the node failure rate F is expressed as:

wherein n is_numRepresenting the number of running tasks of each node read by the log file, n_failIndicating the number of failures in the operation of each node, t_numRepresenting the average number of running tasks, t, of the entire cluster_failIndicating the number of tasks that failed the average run.

Further, if the job is a CPU type job, the coefficient K describing the CPU performance of the slave node is increased₁、K₂And G₁、G₃A value of (d);

if the operation is I/O type operation, increasing the coefficient K for describing the I/O performance of the slave node₃、K₄And G₂、G₄A value of (d);

and if the operation is important, increasing the value of the coefficient C of the node failure rate F.

Further, if the job is a Map type job, the slave nodes with more data storage are preferentially scheduled to perform calculation;

and if the operation is Reduce type operation, preferentially scheduling the slave nodes with large output data volume of the Map task to calculate.

The technical scheme of the invention has the following beneficial effects:

in the scheme, the types of the jobs submitted by the users are determined according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node, so that each slave node receives the task requests in corresponding proportion parameter. Therefore, the influence of the node performance difference and the job different types on job scheduling of the whole cluster is comprehensively considered, and compared with the original resource scheduling method only considering the node capacity difference, finer scheduling of cluster resources is realized, so that the cluster operation is more efficient.

Drawings

Fig. 1 is a schematic flow chart of a Hadoop cluster resource adaptive allocation method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a process for determining a type of a job submitted by a user according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a heterogeneous Hadoop cluster according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating the detailed operation of YARN after submitting a job according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the operation of an improved pre-pi estimation process provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of an improved pi estimation process according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of the operational status of a WordCount program before improvement according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an improved WordCount program operating scenario according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of the comparison between the improved pre-and post-pi estimation procedure and the WordCount procedure run time provided by the embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.

The invention provides a Hadoop cluster resource self-adaptive allocation method aiming at the problem that the existing resource division is not fine enough.

In order to better understand the method for adaptively allocating the Hadoop cluster resources described in this embodiment, a simple description is first performed on the slave nodes, the master node, the Map task, and the Reduce task:

1. the slave node is equivalent to a computing point, has network connection and can independently process tasks issued by the master node and the resource manager. One server can be provided with one slave node or a plurality of slave nodes;

2. the master node is responsible for job classification, resource and task scheduling, and the execution of the tasks is carried out at the slave nodes;

3. each job can be split into N tasks to realize distributed parallel computation, the N tasks can be completed at one slave node or a plurality of different slave nodes, and N is more than or equal to 2;

4. map tasks represent tasks with large calculation amount in the Map stage;

5. reduce tasks represent tasks that are computationally intensive during the Reduce phase.

As shown in fig. 1, the method for adaptively allocating Hadoop cluster resources provided by the embodiment of the present invention includes:

s101, determining the types of the jobs submitted by a user according to a preset job type classification rule, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer;

s102, if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of a slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I;

s103, distributing task requests with corresponding proportions to each slave node according to the weight proportion parameter of each slave node.

The Hadoop cluster resource self-adaptive allocation method determines the types of the jobs submitted by users according to preset job type classification rules, wherein each job can be split into N tasks to realize distributed parallel computation, and N is a positive integer; if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; and distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node, so that each slave node receives the task requests in corresponding proportion parameter. Therefore, the influence of the node performance difference and the job different types on job scheduling of the whole cluster is comprehensively considered, and compared with the original resource scheduling method only considering the node capacity difference, finer scheduling of cluster resources is realized, so that the cluster operation is more efficient.

In a specific embodiment of the foregoing method for adaptively allocating Hadoop cluster resources, further, the determining, according to a preset job type classification rule, the type of a job submitted by a user includes:

In this embodiment, as shown in fig. 2, the type of the job submitted by the user may be determined according to a preset job type classification rule, and the specific steps may include:

a11, distinguishing important jobs from general property jobs, wherein the classification is to consider that some jobs have higher requirements on the reliability of the cluster in practice, so that the important jobs can be distributed to the slave nodes with higher reliability for calculation after the reliability of each slave node is quantified; the classification method comprises the following steps: judging whether the job submitted by the user is an important job or a general job in a label mode, wherein the label comprises the following components: important and general.

A12, if the job is a normal job, continuing to classify the jobs submitted by the user according to the job size. It is necessary to classify according to the job size, because if the large jobs are not distinguished in this way but the large jobs are mixed in a queue, it is likely that a situation occurs that a large job is submitted and then a resource is first obtained to be executed, and if a small job is submitted at this time, because the resource can be applied only after the previous large job is executed in the same queue, the waiting time of the small job becomes very long, and the job execution efficiency and the resource utilization rate of the whole cluster are low; the classification method specifically comprises the following steps: if the operation is a general operation, judging whether the size of the operation is smaller than a preset size threshold value, if so, judging that the operation is a small operation, otherwise, judging that the operation is a large operation.

A13, if the job is a big job, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold, if so, judging that the job is a CPU type job or an I/O type job according to the resource occupation ratio consumed when the job is executed; the CPU type operation is mainly performed in a memory, such as various scientific calculations and large-scale data modeling, and the I/O type operation is performed by frequently reading and writing a hard disk or other storage media, such as various data centers, network storage, and a cloud storage server. The classification is beneficial to realizing the fine scheduling of the operation, and the cluster resources are more efficiently and reasonably utilized.

A14, if the difference between the CPU resource and the I/O resource of the slave node in the cluster is not more than the preset difference threshold, judging that the operation is Map type operation or Reduce type operation according to different load degrees of the operation Map stage and the Reduce stage.

According to the steps A11-A14, the job classification work submitted by the user can be completed, and then corresponding queues, namely queues of CPU type jobs, I/O type jobs, Map type jobs, Reduce type jobs, important jobs, small jobs and the like can be respectively established in the fair scheduler according to the job types, so that different jobs can be conveniently and pertinently scheduled through the corresponding queues. The configuration file of the fair scheduler is located in a fair-scheduler xml file under a class path, which can be modified by a yarn. Each queue may be configured in a configuration file. Different scheduling strategies, such as a first-in first-out scheduling strategy, a weight round training scheduling strategy and the like, can still exist in each queue. And then a targeted resource scheduling strategy is adopted according to different operation types, so that more refined scheduling of resources is realized, the load balancing problem of a heterogeneous cluster (for example, a Hadoop cluster) is solved, the resource utilization rate of the whole heterogeneous cluster is improved, and the comprehensive performance of the heterogeneous cluster is improved.

In an embodiment of the foregoing method for Hadoop cluster resource adaptive allocation, the determining whether the job is a CPU type job or an I/O type job further includes:

In this embodiment, ρ × MID is MOD, where MOD represents the output data amount of the map terminal.

In an embodiment of the foregoing method for adaptively allocating resources in a Hadoop cluster, further, the determining that the job is a Map-type job or a Reduce-type job:

In the present embodiment, the first and second electrodes are,

where K represents the number of servers in the cluster.

In this embodiment, in the Hadoop cluster, the slave node weights are used to measure the performance of different slave nodes, specifically, three aspects of node hardware performance Y, operation performance D, and node failure rate F are measured. The weight of the slave node can be expressed as:

W_i＝A·Y+B·D+C·F

In this embodiment, the hardware performance of the slave node mainly considers the CPU master frequency, the memory capacity, the network bandwidth, the maximum disk read-write speed, and the like of the slave node, and these hardware parameters represent the resource conditions of the slave node itself and are basic indexes for measuring the node performance. The slave node hardware performance can be expressed as:

where Y represents the hardware performance of the slave i, S_cpuRepresenting CPU dominant frequency, S_memIndicating the memory capacity, S_netRepresenting the network bandwidth, S_diskIndicating maximum read of diskWrite speed, avg_cpu、avg_mem、avg_net、avg_diskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster₁、K₂、K₃、K₄Both represent coefficients.

In this embodiment, the hardware performance indexes of the cluster are all static indexes, so that each parameter of the hardware performance index can be directly obtained from the node.

In this embodiment, since the performance of a slave node cannot be completely expressed only by a part of the hardware performance indicators, the dynamic performance of the slave node needs to be described more accurately by introducing the runtime performance indicators. The running performance is dynamic performance and thus needs to be acquired by reading a log generated after the job is run.

In this embodiment, the operation performance D of the slave node i is represented as:

In this embodiment, the running time of the slave node in the two job types, i.e., the CPU type job and the I/O type job, is used as a description index for measuring the actual running performance of the slave node.

In this embodiment, a CPU type job and an I/O type job each taking 1GB data amount are used and run on each slave node, and at this time, t_cmRun time of CPU type jobs in Map phase, t, denoted 1GB_crRun time, t, of CPU type jobs in Reduce phase, denoted 1GB_iomRun time of an I/O type job, denoted 1GB, in Map phase, t_iorRun time, avg, for 1GB I/O type jobs in Reduce phase_cmAverage run time, avg, of CPU type jobs, denoted 1GB, clustered in Map phase_crAverage run time, avg, of CPU type jobs, denoted 1GB, clustered during Reduce phase_iomAverage run time, avg, for 1GB I/O jobs clustered in Map phase_iorThe average run time of a 1GB I/O job clustered during the Reduce phase. In order to avoid jitter as much as possible and ensure the credibility of data when collecting the index, 10 groups of data are respectively collected for each item of data, and the average value is taken as the index after the maximum value and the minimum value are removed.

In this embodiment, the performance index of the node failure rate is a parameter selected for measuring the reliability of the slave node, and considering that some jobs in actual operation have a high requirement on the reliability of the slave node, the performance index has an important meaning on the description of the performance of the whole slave node.

In this embodiment, calculating the performance index of node failure rate requires reading the number n of running tasks of each node through a log file_numAnd the number n of operation failures therein_failAnd calculating the average running task number t of the whole cluster_numAnd the average number of failed tasks t_failThe node failure rate F may be expressed as:

in this embodiment, each coefficient in the node weight calculation formula is determined according to different job types, specifically: if the job is a CPU type job, thenIncreasing coefficient K describing slave node CPU performance₁、K₂And G₁、G₃A value of (d); if the operation is I/O type operation, increasing the coefficient K for describing the I/O performance of the slave node₃、K₄And G₂、G₄A value of (d); and if the operation is important, increasing the value of the coefficient C of the node failure rate F. Therefore, the influence of the node performance difference and the different types of the operation on the operation scheduling of the whole cluster is comprehensively considered, the coefficient in the formula of the node performance is adjusted and measured according to the different types of the operation, and compared with the original scheduling method only considering the node performance difference, the cluster resource is scheduled more finely, so that the cluster operation is more efficient.

In this embodiment, considering that the small jobs consume less cluster resources and the running time is shorter, a queue is generally established separately to run the small jobs. If the operation is a Map type operation, preferentially scheduling the slave nodes with more data storage for calculation; and if the operation is Reduce type operation, the slave nodes with large output data volume of the Map task are preferentially scheduled for calculation, so that the resource scheduling is more refined.

In this embodiment, in order to perform scheduling allocation of cluster resources by using the node weight, the node weight needs to be converted into a shape similar to a weight ratio parameter. The weight ratio parameter is defined in such a way that m slave nodes are arranged in the cluster, and the weight of the slave node i is W_iThe weight sum of all the slave nodes in the cluster is W_sumThen, the weight ratio parameter P of the slave node i can be expressed as:

in this embodiment, if the type of the job submitted by the user is a CPU type job, an I/O type job, or an important job, a weight polling scheduling policy is adopted for scheduling, specifically: and distributing corresponding weight ratio list parameters to each slave node in the cluster according to the performance of the slave nodes, so that the master node can distribute task requests with corresponding proportions to each slave node according to the weight ratio list parameters of each slave node, and each slave node can receive the task requests with the corresponding proportion parameters.

In this embodiment, assuming that there is a group of slave nodes S ═ { S0, S1, …, Sn-1}, the initialized weight ratio column parameter is 0, and according to the performance of the slave nodes, a corresponding weight ratio column parameter is allocated to each slave node, when scheduling for the first time, the slave node with the largest weight ratio column parameter is taken, and by continuously decreasing the weight ratio column parameter, a suitable slave node is found to execute a task until polling is finished, and the weight ratio column parameter returns to 0. If there are 2 slave nodes a and B, a has 2 times the processing capacity of B, then a has 2 times the weight ratio column parameter/weight of B, and a also has 2 times the number of task requests it accepts. That is, the slave nodes with higher weight than the row parameter/weight receive the task requests first, the slave nodes with higher weight than the row parameter/weight process more task requests than the servers with lower weight than the row parameter/weight, and the slave nodes with the same weight than the row parameter/weight process the same number of task requests.

In this embodiment, as shown in fig. 3, fig. 3 is a schematic diagram of a heterogeneous Hadoop cluster, and a server may be divided into two roles, namely a master node (NameNode) and a slave node (DataNode), from two angles; specifically, the method comprises the following steps:

firstly, from the perspective of a distributed file system (HDFS), a server is divided into a master node and a slave node, in the HDFS, management of a directory is important, and the master node is a directory manager;

the NameNode is a master node and stores metadata of files such as file names, file directory structures, file attributes (generation time, copy number, file authority), and block lists of each file and the DataNode where the block is located. The primary node is a central server, which is responsible for managing the namespace (namespace) of the file system and the access of the client to the files, and maintains all the files and directories in each file system tree and the whole tree, and these information are permanently stored on the local disk in two file forms: named control image file (Fsimage) and Edit log (Edit log).

The DataNode stores the block data of the file and the checksum of the block data in the local file system. Files may be created, deleted, moved, or renamed that cannot be modified in their contents after they have been created, written, and closed. A data block is stored on a disk in a file in a DataNode, and comprises two files, wherein one file is data, and the other file is metadata comprising the length of the data block, the checksum of the data block and a time stamp. After the DataNode is started, registering the data to the NameNode, and periodically (1 hour) reporting all block information to the NameNode after the data is passed. The heartbeat is once every 3 seconds, and returns a result with a command from the NameNode to the DataNode, such as copying block data to another machine, or deleting a block. If no heartbeat is received from a DataNode for more than 10 minutes, the node is considered to be unavailable.

File operation, NameNode for operating file metadata, DataNode for processing read-write request of file content, data stream related to file content not passing through NameNode only asking it to contact with that DataNode, otherwise NameNode becoming bottleneck of system

Second, from the YARN perspective, the master node typically deploys a resource manager (ResourceManager) that is globally responsible for monitoring, allocation, and management of all resources, while the slave node deploys a node manager (NodeManager) that is responsible for maintenance of each slave node.

Fig. 4 is a schematic diagram of a work flow of YARN, which can implement weight polling scheduling when applying for resources for a job according to the obtained weight ratio parameter of each slave node, and the specific flow is as follows:

1) the user submits an application program including an application manager ApplicationMaster program, a command for starting the ApplicationMaster, a user program and the like to the resource management platform YARN. Labels can be added to frequently-running jobs and the frequently-running jobs are directly dispatched to corresponding queues, partial tasks of the jobs with undetermined types can be run in advance, and relevant information is collected and divided according to a job type classification method.

2) The resource manager assigns a first Container (computing resource unit) for the application program, and communicates with the corresponding node manager, requesting it to start the ApplicationMaster of the application program in this Container (computing resource unit).

3) The ApplicationMaster firstly registers to the ResourceManager, so that a user can directly check the running state of the application program through the ResourceManager, then the ApplicationMaster applies for resources for each task and monitors the running state of the application program until the running is finished, and the steps 4-7 are repeated.

4) The ApplicationMaster applies for and obtains resources from the Resourcemanager through the RPC protocol by adopting a weight polling mode.

5) Once the ApplicationMaster applies for the resource, it communicates with the corresponding NodeManager, asking it to start the task.

6) After the NodeManager sets an operation environment (comprising environment variables, JAR packages, binary programs and the like) for the task, a task starting command is written into a script, and the task is started by operating the script.

7) Each task reports the state and the progress of the task to the ApplicationMaster through a certain RPC protocol so that the ApplicationMaster can master the running state of each task at any time, and therefore the task can be restarted when the task fails.

8) After the application runs, the ApplicationMaster logs out to the ResourceManager and closes itself.

As can be seen from fig. 5, when a pi value estimation procedure is run by using a Hadoop original policy (before improvement) based on a round-robin scheduling algorithm, since a performance difference between nodes cannot be considered, the ApplicationMaster applies and acquires resources to the ResourceManager through an RPC protocol in a simple round-robin manner. Therefore, as the job is executed, the performance of the S1 node is the worst, the execution efficiency is the slowest, so that the CPU utilization rate is higher than that of other nodes in the later stage of S1, and the performance of the S3 node is the best, so that the CPU utilization rate is gradually lower than that of other nodes as the job is executed, and thus, the cluster resources are obviously not fully utilized, so that the load among the nodes is uneven in the later stage of the job execution, and the execution efficiency of the whole job is influenced to be low. When the improved policy (the weight polling scheduling policy described in this embodiment) is adopted, it can be seen that the load of each node is more balanced when the job runs, and the running efficiency is more efficient than before the improvement, as shown in fig. 6.

In the present embodiment, the principle of the polling scheduling algorithm used before improvement is to alternately allocate task requests from a user to slave nodes each time, starting from 1 until the last slave node, and then start the loop again. The round-robin scheduling algorithm assumes that the processing performance of all slave nodes is the same, and does not care about the current number of connections and response speed of each slave node. When the request service interval time is changed greatly, the polling scheduling algorithm is easy to cause load imbalance among the slave nodes. This scheduling algorithm is suitable for the case where all slave nodes in the slave node group have the same hardware and software configuration and the average service request is relatively balanced.

As with the pi value estimation procedure, the ordinate of the line graph is the CPU utilization, and the abscissa is in units of intervals of every 10 seconds. As can be seen from fig. 7, when a WordCount (word occurrence frequency of a statistical data set) program is run by using a Hadoop original policy based on a round-robin scheduling algorithm, because a difference in performance between nodes cannot be considered, the ApplicationMaster applies and obtains resources to a ResourceManager through an RPC protocol in a simple round-robin manner. Load among the nodes is uneven, the load of the S1 node is low during operation and is not fully utilized by the cluster, and therefore the execution efficiency of the whole operation is low. When the improved policy (the weight polling scheduling policy described in this embodiment) is adopted, it can be seen that the load of each node is more balanced when the job runs, and the running efficiency is more efficient than before the improvement, as shown in fig. 8.

As can be seen from fig. 9, compared with the strategy before improvement, the improved scheduling strategy has a pi value estimation program running time shorter than that before improvement, which is 16.35% shorter than that before improvement. And (4) running a WordCount program, and comparing the running time before and after improvement, finding that the running time is shorter by adopting the improved strategy, and shortening by 14.65% before improvement.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A Hadoop cluster resource self-adaptive allocation method is characterized by comprising the following steps:

if the type of the job submitted by the user is a CPU type job, an I/O type job or an important job, determining a weight ratio parameter of the slave node according to the type of the job submitted by the user, wherein the weight ratio parameter of the slave node I is equal to the ratio of the weight of the slave node I to the sum of the weights of all the slave nodes in the cluster, and the weight of the slave node I is used for measuring the performance of the slave node I; the weight value of the slave node is expressed as:

W_i＝A·Y+B·D+C·F

wherein, W_iRepresenting the weight of the slave node i, Y representing the hardware performance of the slave node i, D representing the operation performance of the slave node i, F representing the node failure rate, and A, B, C being coefficients of Y, D, F respectively;

distributing task requests in corresponding proportion to each slave node according to the weight ratio parameter of each slave node;

wherein, the determining the type of the job submitted by the user according to the preset job type classification rule comprises:

otherwise, judging whether the difference between the CPU resource and the I/O resource of the slave node in the cluster exceeds a preset difference threshold value, if so, judging that the operation is a CPU type operation or an I/O type operation according to the resource occupation ratio consumed when the operation is executed; the determining that the job is a CPU type job or an I/O type job includes:

wherein n represents the number of tasks being executed in parallel in the slave node, ρ represents the ratio of the output data volume of the Map terminal to the input data volume of the Map terminal, MID represents the input data volume of the Map terminal, DIOR represents the IO transmission rate of a magnetic disk, and MTCT represents the time required by the completion of the Map task;

otherwise, judging whether the operation is Map type operation or Reduce type operation according to different load degrees of the Map stage and the Reduce stage of the operation; the determination that the job is a Map-type job or a Reduce-type job includes:

wherein S is_mapRepresenting the total amount of data, S, input in the Map phase_reduceExpress ReduceThe total data input in the stage td represents a preset proportional threshold;

the hardware performance Y of the slave node i is expressed as:

wherein S is_cpuRepresenting CPU dominant frequency, S_memIndicating the memory capacity, S_netRepresenting the network bandwidth, S_diskIndicating maximum read-write speed, avg, of the disk_cpu、avg_mem、avg_net、avg_diskRespectively representing the average CPU main frequency, memory capacity, network bandwidth, maximum disk read-write speed, K of the cluster₁、K₂、K₃、K₄All represent coefficients;

the operating performance D of the slave node i is represented as:

wherein, t_cmRun time of CPU type jobs in Map phase, t, expressed as unit size_crRun time of CPU type jobs in Reduce phase, t, expressed as unit size_iomRun time of I/O type jobs in Map phase, t, expressed as unit size_iorRun time, avg, for I/O type jobs in Reduce phase expressed as unit size_cmAverage run time, avg, of CPU type jobs clustered at Map stage, expressed as unit size_crAverage run time, avg, of CPU type jobs, expressed as unit size, clustered during Reduce phase_iomAverage run time, avg, of I/O type jobs, expressed as unit size, clustered in Map phase_iorAverage run time, G, of I/O type jobs, expressed as unit size, clustered during Reduce phase₁、G₂、G₃、G₄All represent coefficients;

the node failure rate F is expressed as:

wherein n is_numRepresenting the number of running tasks of each node read by the log file, n_failIndicating the number of failures in the operation of each node, t_numRepresenting the average number of running tasks, t, of the entire cluster_railIndicating the number of tasks that failed the average run;

if the job is a CPU type job, increasing a coefficient K describing the performance of the CPU of the slave node₁、K₂And G₁、G₃A value of (d);

if the operation is important, increasing the value of the coefficient C of the node failure rate F;

if the operation is a Map type operation, preferentially scheduling the slave nodes with more data storage for calculation;