CN112764906B

CN112764906B - Cluster resource scheduling method based on user job type and node performance bias

Info

Publication number: CN112764906B
Application number: CN202110100907.5A
Authority: CN
Inventors: 胡亚红 (Hu Yahong); 吴寅超 (Wu Yinchao)
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2024-03-15
Anticipated expiration: 2041-01-26
Also published as: CN112764906A

Abstract

The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias, comprising the following steps: 1) the Master node collects the static index values of all nodes in the cluster and calculates the static performance of each node; 2) the Master node determines the type of the user job currently waiting to run in the job queue; 3) the Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node; 4) from the dynamic and static performance of each node, the Master node calculates the performance bias of each node; 5) the Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes, and when the job completes, the Master node returns its execution result; 6) if all user jobs have been executed, the system stops; otherwise, it returns to step 2). The beneficial effects of the invention are that cluster performance is effectively improved and the execution time of user jobs is shortened.

Description

Cluster resource scheduling method based on user job type and node performance bias

Technical Field

The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias.

Background

With the advent of the big data era, user-generated data has grown exponentially, and single nodes and traditional computing models can no longer meet the performance and efficiency requirements of large-scale data processing. Apache Spark is the most popular big data processing platform owing to its excellent performance and rich application support. In Spark Standalone mode, multi-job resource scheduling uses one of two default algorithms: the spreadOut algorithm and the non-spreadOut algorithm. Both use very simple logic: they schedule by comparing a job's resource requirements with the resources available on each node. Neither takes into account the type of the job or the processing-capability characteristics of the node (the node performance bias) when allocating cluster resources to user jobs. The user job type UAT is determined by the memory and the number of CPU cores the job requires, both of which the user specifies when submitting the job. When UAT ≤ 1.1 the job is compute-intensive; otherwise it is memory-intensive. The node performance bias NPT is determined by the static and dynamic factors of the node. Because the default algorithms ignore both, when multiple jobs run concurrently some nodes end up running types of jobs they are poorly suited to, which lowers the execution efficiency of the cluster and lengthens job execution time.
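For contrast, the two default strategies can be illustrated with a deliberately simplified model that distributes only CPU cores. This is a sketch under stated assumptions, not Spark's actual implementation: the function name `assign_cores` is hypothetical, and real Standalone scheduling also checks memory and hands out cores in per-executor units. What it shows is that both strategies look only at free capacity, never at job type or node bias.

```python
from typing import List


def assign_cores(free_cores: List[int], cores_needed: int,
                 spread_out: bool) -> List[int]:
    """Simplified model of spreadOut vs. non-spreadOut core assignment.

    Returns the number of cores taken from each worker. Neither branch
    considers what kind of job is being placed or how suited a node is.
    """
    assigned = [0] * len(free_cores)
    remaining = cores_needed
    if spread_out:
        # Round-robin: one core per worker per pass, spreading the job out.
        while remaining > 0:
            progressed = False
            for i, free in enumerate(free_cores):
                if remaining > 0 and assigned[i] < free:
                    assigned[i] += 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                break  # no free cores left anywhere in the cluster
    else:
        # Packing: exhaust each worker before moving on to the next one.
        for i, free in enumerate(free_cores):
            take = min(free, remaining)
            assigned[i] = take
            remaining -= take
            if remaining == 0:
                break
    return assigned
```

For a 6-core request on three idle 4-core workers, spreading yields [2, 2, 2] and packing yields [4, 2, 0]; in neither case does a compute-intensive job prefer a node with faster CPUs.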

A number of studies have addressed the relationship between node performance bias and user job type. For example, the patent document with publication number 107038069 provides a scheduling method that dynamically matches node performance labels with job type labels. In that method, each node in the cluster runs a given task, and nodes are classified as CPU-type, disk-I/O-type or ordinary nodes according to how the time a node takes to run a single task compares with the average running time across all nodes in the cluster. When a node label is updated, only the node's CPU and I/O utilization are considered; factors such as the number of CPU cores, memory size, disk capacity and real-time disk read/write speed are not. Moreover, because determining the job type label requires running part of the job in advance and then applying a naive Bayes algorithm, the method is suited only to jobs with large data files or jobs that run repeatedly. How to determine the types of nodes and user jobs in a cluster quickly and accurately, and thereby assign each job to the most suitable nodes, therefore remains a problem in urgent need of a solution.

Disclosure of Invention

The object of the present invention is to provide a cluster resource scheduling method based on user job type and node performance bias (ATNPA) that can quickly and accurately allocate the most suitable nodes to user jobs.

The present invention achieves the above object through the following technical solution: a cluster resource scheduling method based on user job type and node performance bias, comprising the following steps:

1) The Master node collects static index values of all nodes in the cluster and calculates the static performance of all nodes;

2) The Master node determines the type of the user job currently waiting to run in the job queue;

3) The Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node;

4) Based on the dynamic and static performance of each node, the Master node calculates the performance bias of each node;

5) The Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes; when the job completes, the Master node returns its execution result;

6) If all user jobs have been executed, the system stops; otherwise, return to step 2).

Preferably, when a new node joins the cluster, the Master node calculates the performance bias of the new node and then executes step 5).
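The control flow of steps 1) to 6) can be sketched as a loop. This is a minimal sketch under assumptions: `run_scheduler` and its callback parameters are hypothetical names, jobs run synchronously one at a time, and the per-step computations are passed in as functions. Because the node list is re-scored for every job, a node that joins between iterations is picked up before step 5), in the spirit of the preferred handling of new nodes.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_scheduler(
    job_queue: Deque[str],
    nodes: List[str],
    compute_npt: Callable[[str, str], float],
    allocate: Callable[[str, List[str]], List[str]],
    execute: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Hypothetical driver for steps 1) to 6); all callbacks are placeholders.

    compute_npt(node, job) stands in for steps 1) to 4), allocate(job, ranked)
    for the node selection in step 5), and execute(job, chosen) for running
    the job and returning its result. Jobs run synchronously, one at a time.
    """
    results: Dict[str, str] = {}
    while job_queue:  # step 6): stop once every queued job has run
        job = job_queue.popleft()  # step 2): next job in the queue
        # Steps 3) and 4): score every node for this job. The list is read on
        # every iteration, so a node appended since the last job is included.
        ranked = sorted(nodes, key=lambda n: compute_npt(n, job), reverse=True)
        chosen = allocate(job, ranked)  # step 5): pick nodes from the top
        results[job] = execute(job, chosen)  # Master returns the result
    return results
```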

Preferably, step 1) specifically comprises the following steps:

1.1 The Master node collects the static performance indexes of all nodes in the cluster, including the number of CPU cores, CPU speed, disk capacity and memory size;

1.2 The Master node calculates the static performance of each node in the cluster:

$$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed} \tag{1}$$

where α₁, α₂, α₃ and α₄ are the weights of the static indexes CPU core count, memory capacity, disk capacity and CPU speed, respectively, with α₁ + α₂ + α₃ + α₄ = 1. The values of α₁, α₂, α₃ and α₄ are calculated using the analytic hierarchy process (AHP).
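AHP derives such weights from a matrix of pairwise importance judgements. The patent does not publish its comparison matrices, so the sketch below only shows the standard eigenvector method (the priority vector, approximated by power iteration) under a hypothetical function name, `ahp_weights`. A full AHP treatment would also check the consistency ratio, which is omitted here.

```python
from typing import List


def ahp_weights(pairwise: List[List[float]], iterations: int = 100) -> List[float]:
    """Priority vector of an AHP pairwise-comparison matrix.

    pairwise[i][j] says how much more important criterion i is than
    criterion j (Saaty's 1-9 scale), with pairwise[j][i] == 1 / pairwise[i][j].
    The weights are the principal eigenvector, approximated by power
    iteration and normalised so that they sum to 1.
    """
    n = len(pairwise)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        # Multiply the matrix by the current estimate...
        v = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        # ...and rescale so the weights again sum to 1.
        total = sum(v)
        w = [x / total for x in v]
    return w
```

For example, `ahp_weights([[1, 1/5, 2, 1/2], [5, 1, 7, 4], [1/2, 1/7, 1, 1/3], [2, 1/4, 3, 1]])` returns weights for (Cores, Memory, Store, CpuSpeed) in that order; this matrix is invented purely for illustration and is not the authors'.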

Preferably, step 2) specifically comprises the following steps:

2.1 The Master node obtains the number of CPU cores and the amount of memory, requiredMemory, required to complete the user job currently waiting to run;

2.2 The Master node determines the user job type UAT:

Preferably, step 3) specifically comprises the following steps:

3.1 Each node of the cluster collects its own dynamic performance indexes, including the remaining CPU rate, remaining memory rate, remaining disk capacity rate and current disk read/write speed;

3.2 The Master node collects the dynamic performance indexes of all nodes via heartbeat messages;

3.3 The Master node calculates the dynamic performance of each node in the cluster:

$$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd} \tag{3}$$

where β₁, β₂, β₃ and β₄ are the weights of the remaining CPU rate, remaining memory rate, current disk read/write speed and remaining disk capacity rate, respectively, with β₁ + β₂ + β₃ + β₄ = 1. The initial values of β₁, β₂, β₃ and β₄ are calculated using AHP, and the values of β₁ and β₂ are adjusted according to the user job type.

Preferably, the performance bias NPT of a node in step 4) is calculated as follows:

$$\mathrm{NPT} = \alpha\,\mathrm{StaticResource} + \beta\,\mathrm{DynamicResource} \tag{4}$$

where α and β are the weights of StaticResource and DynamicResource, respectively, and are calculated using AHP.

Preferably, step 5) specifically comprises: sorting the nodes by performance bias value and, starting from the node with the highest priority, allocating suitable nodes to the user job until the job's memory and CPU core requirements are met.

The beneficial effects of the invention are as follows: by analysing the type of each user job and calculating the performance bias of the cluster nodes in real time, the invention finds the nodes best suited to the characteristics of each user job. The algorithm performs resource scheduling according to the characteristics of the user job and the real-time performance bias of the nodes; by fully exploiting each node's performance strengths, it effectively improves cluster performance and shortens the execution time of user jobs.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of an ATNPA algorithm implementation of the present invention;

FIG. 3 compares the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes under the WordCount workload;

FIG. 4 compares the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes under the Sort workload;

FIG. 5 compares the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different types with the same data volume in parallel.

Detailed Description

The invention is further described below with reference to an embodiment, but the scope of protection of the invention is not limited thereto:

Example: as shown in FIG. 1 and FIG. 2, a cluster resource scheduling method based on user job type and node performance bias comprises the following steps:

(1) When the cluster is idle, the Master node collects the static index values of all nodes in the cluster and calculates their static performance:

(1.1) the Master node collects the static performance indexes of all nodes in the cluster, including the number of CPU cores, CPU speed, disk capacity and memory size;

(1.2) the Master node calculates the static performance of each node in the cluster:

$$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed}$$

where α₁, α₂, α₃ and α₄ are the weights of the static factors CPU core count, memory capacity, disk capacity and CPU speed, respectively, with α₁ + α₂ + α₃ + α₄ = 1. Their values, calculated using AHP, are 0.113, 0.641, 0.073 and 0.173, respectively;
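Formula (1) with these weights can be sketched as follows. The sketch assumes, because the text does not say, that each raw index is divided by the largest value of that index in the cluster, so that core counts, GB and GHz become comparable numbers in [0, 1]. `StaticIndex` and `static_resource` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

# alpha_1..alpha_4 from the embodiment: cores, memory, disk capacity, CPU speed.
ALPHA = {"cores": 0.113, "memory": 0.641, "store": 0.073, "cpu_speed": 0.173}


@dataclass
class StaticIndex:
    cores: float      # number of CPU cores
    memory: float     # memory size, e.g. GB
    store: float      # disk capacity, e.g. GB
    cpu_speed: float  # CPU clock speed, e.g. GHz


def static_resource(nodes: Dict[str, StaticIndex]) -> Dict[str, float]:
    """StaticResource of every node per formula (1).

    ASSUMPTION: each raw index is divided by its cluster-wide maximum so the
    four terms share a [0, 1] scale; the method text does not specify this.
    """
    maxima = {f: max(getattr(idx, f) for idx in nodes.values()) for f in ALPHA}
    return {
        name: sum(w * getattr(idx, f) / maxima[f] for f, w in ALPHA.items())
        for name, idx in nodes.items()
    }
```

Under this normalisation a node that holds the cluster maximum on every index scores 1, and memory dominates the score because α₂ = 0.641.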

(2) The Master node determines the type of the user job currently waiting to run in the job queue:

(2.1) the Master node obtains the number of CPU cores and the amount of memory, requiredMemory, required to complete the user job;

(2.2) the Master node determines the user job type UAT (User Application Type):

when UAT ≤ 1.1, the job is compute-intensive; otherwise, it is memory-intensive;
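The classification rule can be expressed directly. The formula that produces UAT from the required memory and core count is not reproduced in this text, so the sketch takes an already computed UAT value; `JobType` and `classify_job` are hypothetical names.

```python
from enum import Enum


class JobType(Enum):
    COMPUTE_INTENSIVE = "compute-intensive"
    MEMORY_INTENSIVE = "memory-intensive"


UAT_THRESHOLD = 1.1  # threshold stated in the description


def classify_job(uat: float) -> JobType:
    """UAT <= 1.1 means compute-intensive; anything larger, memory-intensive.

    The UAT value itself must be computed elsewhere with the method's formula,
    which is not reproduced in this text.
    """
    if uat <= UAT_THRESHOLD:
        return JobType.COMPUTE_INTENSIVE
    return JobType.MEMORY_INTENSIVE
```

The resulting job type then selects which β weight set is used in step (3).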

(3) The Master node collects the dynamic state data of all nodes in the cluster and calculates their dynamic performance:

(3.1) each node of the cluster collects its own dynamic performance indexes, including the remaining CPU rate, remaining memory rate, remaining disk capacity rate and current disk read/write speed;

(3.2) the Master node collects the dynamic performance indexes of all nodes via heartbeat messages;

(3.3) the Master node calculates the dynamic performance of each node in the cluster:

$$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd}$$

where β₁, β₂, β₃ and β₄ are the weights of the remaining CPU rate, remaining memory rate, current disk read/write speed and remaining disk capacity rate, respectively, with β₁ + β₂ + β₃ + β₄ = 1. The initial values of β₁, β₂, β₃ and β₄ are calculated using AHP, and the values of β₁ and β₂ are adjusted according to the user job type. For compute-intensive jobs, β₁, β₂, β₃ and β₄ are 0.442, 0.344, 0.156 and 0.078, respectively; for memory-intensive jobs, they are 0.344, 0.442, 0.156 and 0.078, respectively;
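Formula (3) with the type-dependent weights can be sketched as below. Assumptions: the four dynamic indexes are supplied as fractions in [0, 1] (for AvaiSSdSpd that means dividing the current read/write speed by some reference speed, which the text does not specify), and the dictionary keys and the function name `dynamic_resource` are hypothetical.

```python
from typing import Dict

# beta_1..beta_4 from the embodiment, keyed by job type. Only the CPU and
# memory weights swap between the two types; the disk weights are unchanged.
# As published, each set sums to 1.02 rather than exactly 1; the values are
# used here unchanged, without renormalising.
BETA = {
    "compute": {"cores": 0.442, "memory": 0.344, "ssd_speed": 0.156, "ssd": 0.078},
    "memory": {"cores": 0.344, "memory": 0.442, "ssd_speed": 0.156, "ssd": 0.078},
}


def dynamic_resource(avail: Dict[str, float], job_type: str) -> float:
    """DynamicResource of one node per formula (3).

    ASSUMPTION: avail maps "cores", "memory", "ssd_speed" and "ssd"
    (AvaiCores, AvaiMemory, AvaiSSdSpd, AvaiSSd) to fractions in [0, 1].
    """
    weights = BETA[job_type]
    return sum(weights[k] * avail[k] for k in weights)
```

A node with plenty of idle CPU but little free memory therefore scores higher for a compute-intensive job than for a memory-intensive one.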

(4) Based on the dynamic and static performance of each node, the Master node calculates the node's performance bias: using the static performance StaticResource and dynamic performance DynamicResource of each node obtained in steps (1) and (3), the performance bias of each node is calculated as NPT = α·StaticResource + β·DynamicResource, where α and β are the weights of StaticResource and DynamicResource, respectively; calculated using AHP, they are 0.5 and 0.5.
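With α = β = 0.5, formula (4) and the node ranking used in step (5) can be sketched as follows, assuming StaticResource and DynamicResource have already been computed on comparable scales; `npt` and `rank_nodes` are hypothetical names.

```python
from typing import Dict, List

ALPHA_STATIC = 0.5  # alpha, weight of StaticResource in the embodiment
BETA_DYNAMIC = 0.5  # beta, weight of DynamicResource in the embodiment


def npt(static_resource: float, dynamic_resource: float) -> float:
    """Node performance bias per formula (4)."""
    return ALPHA_STATIC * static_resource + BETA_DYNAMIC * dynamic_resource


def rank_nodes(static: Dict[str, float], dynamic: Dict[str, float]) -> List[str]:
    """Node names ordered by NPT, highest (most preferred) first."""
    scores = {name: npt(static[name], dynamic[name]) for name in static}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_nodes({"a": 0.9, "b": 0.5}, {"a": 0.1, "b": 0.6})` returns `["b", "a"]`: node `a` has the stronger hardware but little spare capacity, so the lightly loaded node `b` ranks first (NPT 0.55 vs. 0.5).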

(5) The Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes: the nodes are sorted by performance bias value and, starting from the node with the highest priority, suitable nodes are allocated to the user job until the job's memory and CPU core requirements are met.
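One reading of step (5) is a greedy walk down the NPT ranking that accumulates free cores and memory until the job's request is covered. The text does not say whether a request may span several nodes or what happens when it cannot be met, so this sketch assumes it may span nodes, skips nodes with no free cores or no free memory, and returns None so the job can wait. `Node` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    npt: float        # performance bias from formula (4)
    free_cores: int
    free_memory: int  # free memory, e.g. in MB


def allocate(nodes: List[Node], cores_needed: int,
             memory_needed: int) -> Optional[List[str]]:
    """Greedily pick nodes in descending NPT order until the request is met.

    ASSUMPTIONS (not stated in the method text): a job may span several
    nodes, nodes lacking free cores or free memory are skipped, and None
    means the request cannot currently be satisfied.
    """
    chosen: List[str] = []
    cores = memory = 0
    for node in sorted(nodes, key=lambda n: n.npt, reverse=True):
        if cores >= cores_needed and memory >= memory_needed:
            break  # request already covered by higher-ranked nodes
        if node.free_cores == 0 or node.free_memory == 0:
            continue  # node cannot host any part of the job
        chosen.append(node.name)
        cores += node.free_cores
        memory += node.free_memory
    if cores >= cores_needed and memory >= memory_needed:
        return chosen
    return None
```

Because the β weights inside NPT already depend on the job type, the same greedy routine serves both compute-intensive and memory-intensive jobs.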

(6) When the job completes, the Master node returns its execution result;

(7) When a new node joins the cluster, the Master node calculates the performance bias of that node;

(8) If all user jobs have been executed, the system stops; otherwise, return to step (2).

In summary, the invention analyses the type of each user job and calculates the performance bias of each node in real time from its running state in the cluster, so as to allocate the most suitable nodes to each user job. As shown in FIGS. 3 to 5, experiments show that, compared with Spark's default scheduling algorithm, the proposed algorithm effectively improves the performance of the cluster system: when the same job is executed on different data volumes, ATNPA improves cluster performance by 8.56% on average; when different jobs are executed in parallel, it improves cluster performance by 8.33%.

The foregoing illustrates the principles of the present invention and has been described with reference to the accompanying drawings; the invention is not limited to the specific embodiments shown.

Claims

1. A cluster resource scheduling method based on user job type and node performance bias, characterised by comprising the following steps:

1) The Master node collects static index values of all nodes in the cluster and calculates the static performance of all nodes; the method specifically comprises the following steps:

1.2 The Master node calculates the static performance of each node in the cluster:

$$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed} \tag{1}$$

wherein α₁, α₂, α₃ and α₄ are the weights of the static indexes CPU core count, memory capacity, disk capacity and CPU speed, respectively, with α₁ + α₂ + α₃ + α₄ = 1, and the values of α₁, α₂, α₃ and α₄ are calculated using the analytic hierarchy process;

2) The Master node determines the type of the user job currently waiting to run in the job queue, specifically comprising: 2.1 the Master node obtains the number of CPU cores and the amount of memory, requiredMemory, required to complete the user job currently waiting to run;

2.2 The Master node determines the user job type UAT:

3) The Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node, specifically comprising: 3.1 each node of the cluster collects its own dynamic performance indexes, including the remaining CPU rate, remaining memory rate, remaining disk capacity rate and current disk read/write speed;

3.3 The Master node calculates the dynamic performance of each node in the cluster:

$$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd} \tag{3}$$

wherein β₁, β₂, β₃ and β₄ are the weights of the remaining CPU rate, remaining memory rate, current disk read/write speed and remaining disk capacity rate, respectively, with β₁ + β₂ + β₃ + β₄ = 1; the initial values of β₁, β₂, β₃ and β₄ are calculated using the analytic hierarchy process, and the values of β₁ and β₂ are adjusted according to the user job type;

4) Based on the dynamic and static performance of each node, the Master node calculates the performance bias of each node; the performance bias NPT of a node is calculated as follows:

$$\mathrm{NPT} = \alpha\,\mathrm{StaticResource} + \beta\,\mathrm{DynamicResource} \tag{4}$$

wherein α and β are the weights of StaticResource and DynamicResource, respectively, and are calculated using the analytic hierarchy process;

5) The Master node sorts all nodes by performance bias value, allocates suitable nodes to the user job starting from the node with the highest priority until the job's memory and CPU core requirements are met, and, when the job completes, returns its execution result.

2. The cluster resource scheduling method based on user job type and node performance bias according to claim 1, wherein when a new node joins the cluster, the Master node calculates the performance bias of the new node and then executes step 5).