CN112764906B - Cluster resource scheduling method based on user job type and node performance bias - Google Patents

Info

Publication number
CN112764906B
Authority
CN
China
Prior art keywords
node
performance
cluster
master node
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110100907.5A
Other languages
Chinese (zh)
Other versions
CN112764906A (en)
Inventor
胡亚红
吴寅超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110100907.5A
Publication of CN112764906A
Application granted
Publication of CN112764906B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5055 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering software capabilities, i.e. software resources associated or available to the machine

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias, which comprises the following steps: 1) the Master node collects the static index values of all nodes in the cluster and calculates the static performance of each node; 2) the Master node determines the type of the user job that is next to run in the job queue; 3) the Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node; 4) from the static and dynamic performance of each node, the Master node calculates the performance bias of each node; 5) the Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes, and when the job completes, the Master node returns its execution result; 6) if all user jobs have been executed, the system stops; otherwise, the method returns to step 2). The beneficial effects of the invention are that cluster performance is effectively improved and the execution time of user jobs is shortened.

Description

Cluster resource scheduling method based on user job type and node performance bias
Technical Field
The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias.
Background
With the advent of the big data era, user-generated data has grown exponentially. Single nodes and traditional computing models can no longer meet the performance and efficiency requirements of big data processing. Apache Spark is the most popular big data processing platform owing to its excellent performance and rich application support. In Spark Standalone mode, multi-job resource scheduling uses one of two algorithms by default: the spreadOut algorithm and the non-spreadOut algorithm. Both use very simple logic, namely comparing the resource requirements of the job with the available resources of each node. When allocating cluster resources to user jobs, these algorithms consider neither the type of the job nor the processing characteristics of the node (the node performance bias). The user job type UAT is determined from the memory and the number of CPU cores that the job requires; the user supplies both parameters when submitting the job. When UAT is less than or equal to 1.1, the job is compute-intensive; otherwise it is memory-intensive. The node performance bias NPT is determined by static and dynamic factors of the node. When multiple jobs are running, the default algorithms can therefore place jobs on nodes that are poorly suited to that job type, which lowers the execution efficiency of the cluster and lengthens job execution time.
A series of studies has addressed the relationship between node performance bias and user job types. For example, the patent document with publication number 107038069 provides a scheduling method that dynamically matches node performance labels with job type labels. In that method, each node in the cluster runs a given task, and nodes are classified as CPU-type, disk-IO-type, or ordinary nodes according to how the time a node takes to run a single task compares with the mean running time over all nodes in the cluster. When a node's label is updated, only the node's CPU and IO utilization are considered; factors such as the number of CPU cores, memory size, disk capacity, and real-time disk read/write speed are not. Because determining the job type label requires pre-running part of the job and applying a naive Bayes algorithm, the method suits only jobs with large data files or jobs that are run repeatedly. How to determine the types of nodes and user jobs in a cluster quickly and accurately, and thereby assign each job to the most suitable node, therefore remains an urgent problem.
Disclosure of Invention
The object of the present invention is to provide a cluster resource scheduling method (ATNPA) based on user job type and node performance bias, which can quickly and accurately allocate the most suitable nodes to user jobs.
The present invention achieves the above object through the following technical scheme: a cluster resource scheduling method based on user job type and node performance bias, comprising the following steps:
1) The Master node collects the static index values of all nodes in the cluster and calculates the static performance of each node;
2) The Master node determines the type of the user job that is next to run in the job queue;
3) The Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node;
4) From the static and dynamic performance of each node, the Master node calculates the performance bias of each node;
5) The Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes; when the job completes, the Master node returns its execution result;
6) If all user jobs have been executed, the system stops; otherwise, the method returns to step 2).
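The six steps above can be read as a single control loop on the Master node. The following is a minimal sketch of that loop, not the patented implementation: the function name `run_scheduler`, the callback parameters, and the job dictionary layout are all hypothetical, and the default weights of 0.5 are the values stated later in the embodiment.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def run_scheduler(
    jobs: Deque[dict],
    nodes: List[str],
    static_perf: Callable[[str], float],        # step 1 (hypothetical callback)
    job_type: Callable[[dict], str],            # step 2 (hypothetical callback)
    dynamic_perf: Callable[[str, str], float],  # step 3 (hypothetical callback)
    allocate: Callable[[dict, List[str]], str], # step 5 (hypothetical callback)
    alpha: float = 0.5,
    beta: float = 0.5,
) -> List[Tuple[str, str, str]]:
    # Step 1: static performance is computed once per node.
    static = {n: static_perf(n) for n in nodes}
    results = []
    # Step 6: keep going until the job queue is empty.
    while jobs:
        job = jobs.popleft()
        kind = job_type(job)                                   # step 2
        dynamic = {n: dynamic_perf(n, kind) for n in nodes}    # step 3
        npt = {n: alpha * static[n] + beta * dynamic[n] for n in nodes}  # step 4
        ranked = sorted(nodes, key=lambda n: npt[n], reverse=True)
        chosen = allocate(job, ranked)                         # step 5
        results.append((job["name"], kind, chosen))
    return results
```

The callbacks keep the loop independent of how each quantity is measured, so the same skeleton works whether the inputs come from real heartbeats or from test fixtures.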
Preferably, when a new node joins the cluster, the Master node calculates the performance bias of the new node, and step 5) is then executed.
Preferably, step 1) specifically includes the following steps:
1.1) The Master node collects the static performance indexes of all nodes in the cluster, namely the number of CPU cores, CPU speed, disk capacity, and memory size;
1.2) The Master node calculates the static performance of each node in the cluster:
$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed}$ (1)
where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the weights of the static indexes CPU core count, memory capacity, disk capacity, and CPU speed, respectively, and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$. The values of $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are calculated using the analytic hierarchy process.
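The text states that the weights come from the analytic hierarchy process (AHP) but does not give the pairwise comparison matrix. The sketch below shows one common way to derive AHP weights, the row geometric-mean approximation of the priority vector; whether the inventors used this variant or the principal-eigenvector variant is not stated. The function name `ahp_weights` and the judgment matrix `example` are hypothetical and will not reproduce the patent's values.

```python
import math
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    if any(len(row) != n for row in pairwise):
        raise ValueError("pairwise comparison matrix must be square")
    # Geometric mean of each row, then normalize so the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments, in the order Cores, Memory, Store, CpuSpeed.
# pairwise[i][j] states how much more important index i is than index j.
example = [
    [1.0, 1 / 5, 2.0, 1 / 2],
    [5.0, 1.0, 7.0, 4.0],
    [1 / 2, 1 / 7, 1.0, 1 / 3],
    [2.0, 1 / 4, 3.0, 1.0],
]
weights = ahp_weights(example)
```

For a perfectly consistent matrix the geometric-mean method returns the exact weights; for inconsistent judgments it is an approximation, and a consistency check would normally follow.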
Preferably, step 2) specifically includes the following steps:
2.1) The Master node obtains the number of CPU cores and the amount of memory (requiredMemory) required to complete the user job that is next to run;
2.2) The Master node determines the user job type UAT:
Preferably, step 3) specifically includes the following steps:
3.1) Each node of the cluster collects its own dynamic performance indexes, namely the CPU remaining rate, memory remaining rate, disk capacity remaining rate, and current disk read/write speed;
3.2) The Master node collects the dynamic performance indexes of all nodes through heartbeat messages;
3.3) The Master node calculates the dynamic performance of each node in the cluster:
$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd}$ (3)
where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the weights of the CPU remaining rate, memory remaining rate, current disk read/write speed, and disk remaining rate, respectively, and $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$. The initial values of $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are calculated using the analytic hierarchy process, and the values of $\beta_1$ and $\beta_2$ are adjusted according to the type of the user job.
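Steps 3.1 to 3.3 can be pictured as a small registry on the Master node that keeps the latest heartbeat from each node and evaluates formula (3) on demand. This is only a sketch under assumptions: the class names `Heartbeat` and `MasterRegistry`, the field names, and the idea that every index is already scaled to the range 0..1 are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Heartbeat:
    node_id: str
    avai_cores: float    # CPU remaining rate
    avai_memory: float   # memory remaining rate
    avai_ssd_spd: float  # current disk read/write speed (assumed normalized)
    avai_ssd: float      # disk capacity remaining rate


class MasterRegistry:
    """Keeps only the most recent heartbeat per node."""

    def __init__(self) -> None:
        self._latest: Dict[str, Heartbeat] = {}

    def on_heartbeat(self, hb: Heartbeat) -> None:
        # A newer heartbeat simply replaces the previous one for that node.
        self._latest[hb.node_id] = hb

    def dynamic_resource(
        self, node_id: str, betas: Tuple[float, float, float, float]
    ) -> float:
        # Formula (3): weighted sum of the four dynamic indexes.
        hb = self._latest[node_id]
        b1, b2, b3, b4 = betas
        return (
            b1 * hb.avai_cores
            + b2 * hb.avai_memory
            + b3 * hb.avai_ssd_spd
            + b4 * hb.avai_ssd
        )
```

Passing the weights in at call time, rather than storing them, lets the caller switch the β values per job type without touching the stored heartbeat data.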
Preferably, the performance bias NPT of a node in step 4) is calculated as follows:
$\mathrm{NPT} = \alpha\,\mathrm{StaticResource} + \beta\,\mathrm{DynamicResource}$ (4)
where $\alpha$ and $\beta$ are the weights of StaticResource and DynamicResource, respectively, and are calculated using the analytic hierarchy process.
Preferably, step 5) specifically includes the following: the nodes are sorted by their performance bias values, and, starting from the node with the highest priority, suitable nodes are allocated to the user job so that the job's memory and CPU core requirements are met.
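The allocation rule can be illustrated with a first-fit pass over the nodes in descending NPT order. The text does not say whether a job may be split across several nodes, so this sketch makes the simplifying assumption that one node must satisfy the whole request; the `Node` class, its fields, and the function `allocate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    npt: float           # performance bias value of the node
    free_cores: int
    free_memory_gb: int


def allocate(nodes: List[Node], need_cores: int, need_memory_gb: int) -> Optional[str]:
    """Return the highest-NPT node that can host the job, or None."""
    for node in sorted(nodes, key=lambda n: n.npt, reverse=True):
        if node.free_cores >= need_cores and node.free_memory_gb >= need_memory_gb:
            # Reserve the resources so later jobs see the reduced capacity.
            node.free_cores -= need_cores
            node.free_memory_gb -= need_memory_gb
            return node.name
    return None
```

A multi-node variant would walk the same ranking and accumulate capacity until the request is covered; which of the two behaviors the method intends is not specified here.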
The beneficial effects of the invention are as follows: by analyzing the type of each user job and calculating the performance bias of the cluster nodes in real time, the invention finds the node best suited to the characteristics of the job. The algorithm schedules resources according to the characteristics of the user job and the real-time performance bias of the nodes; by making full use of the performance strengths of each node, it effectively improves cluster performance and shortens the execution time of user jobs.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an ATNPA algorithm implementation of the present invention;
FIG. 3 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes under the Wordcount load;
FIG. 4 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes under the Sort load;
FIG. 5 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing, in parallel, user jobs of different types with the same data volume.
Detailed Description
The invention is further described below with reference to a specific embodiment, but the scope of the invention is not limited thereto:
Example: as shown in FIG. 1 and FIG. 2, a cluster resource scheduling method based on user job type and node performance bias includes the following steps:
(1) When the cluster is idle, the Master node collects the static index values of all nodes in the cluster and calculates the static performance of the nodes:
(1.1) The Master node collects the static performance indexes of all nodes in the cluster, namely the number of CPU cores, CPU speed, disk capacity, and memory size;
(1.2) The Master node calculates the static performance of each node in the cluster:
$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed}$
where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the weights of the static factors CPU core count, memory capacity, disk capacity, and CPU speed, respectively, and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$. The values of $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are calculated using the analytic hierarchy process and are 0.113, 0.641, 0.073, and 0.173, respectively;
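With the embodiment's weights, the static performance reduces to a short weighted sum. The four indexes have different units, and the text does not say how they are brought onto a common scale; the sketch below assumes each index is divided by the largest value in the cluster, which is an assumption of this example only. The node specifications in `cluster` are made up for illustration.

```python
from typing import Dict

# Weights stated in the embodiment for Cores, Memory, Store, CpuSpeed.
ALPHAS = {"Cores": 0.113, "Memory": 0.641, "Store": 0.073, "CpuSpeed": 0.173}


def static_resource(nodes: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weighted sum of static indexes, each scaled by the cluster maximum."""
    # Assumed normalization: divide every index by its largest value in the cluster.
    maxima = {k: max(spec[k] for spec in nodes.values()) for k in ALPHAS}
    return {
        name: sum(ALPHAS[k] * spec[k] / maxima[k] for k in ALPHAS)
        for name, spec in nodes.items()
    }


# Hypothetical two-node cluster: cores, memory in GB, disk in GB, clock in GHz.
cluster = {
    "node-a": {"Cores": 8, "Memory": 16, "Store": 500, "CpuSpeed": 2.4},
    "node-b": {"Cores": 16, "Memory": 64, "Store": 1000, "CpuSpeed": 3.0},
}
scores = static_resource(cluster)
```

Because memory carries a weight of 0.641, the memory gap between the two nodes dominates the resulting scores under this normalization.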
(2) The Master node determines the type of the user job that is next to run in the job queue:
(2.1) The Master node obtains the number of CPU cores and the amount of memory (requiredMemory) required to complete the user job;
(2.2) The Master node determines the user job type UAT (User Application Type):
When UAT is less than or equal to 1.1, the job is compute-intensive; otherwise, it is memory-intensive;
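The threshold rule in step (2.2) is easy to express in code. Since the formula that produces UAT from the job's memory and core requirements is not reproduced in this text, the sketch below takes the UAT value as an input rather than guessing how it is computed; the function name `classify_job` and the label strings are hypothetical.

```python
UAT_THRESHOLD = 1.1  # threshold stated in the text


def classify_job(uat: float) -> str:
    """Map a UAT value to the job type used for weight selection."""
    # The boundary value 1.1 itself counts as compute-intensive.
    return "compute-intensive" if uat <= UAT_THRESHOLD else "memory-intensive"
```

For example, `classify_job(0.8)` yields the compute-intensive label and `classify_job(2.0)` yields the memory-intensive label.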
(3) The Master node collects dynamic state data from all nodes in the cluster and calculates the dynamic performance of the nodes:
(3.1) Each node of the cluster collects its own dynamic performance indexes, namely the CPU remaining rate, memory remaining rate, disk capacity remaining rate, and current disk read/write speed;
(3.2) The Master node collects the dynamic performance indexes of all nodes through heartbeat messages;
(3.3) The Master node calculates the dynamic performance of each node in the cluster:
$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd}$
where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the weights of the CPU remaining rate, memory remaining rate, current disk read/write speed, and disk remaining rate, respectively, and $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$. The initial values of $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are calculated using the analytic hierarchy process, and the values of $\beta_1$ and $\beta_2$ are adjusted according to the type of the user job. For CPU-intensive jobs, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are 0.442, 0.344, 0.156, and 0.078, respectively; for memory-intensive jobs, they are 0.344, 0.442, 0.156, and 0.078, respectively;
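The effect of swapping β1 and β2 by job type can be seen with a small calculation. The sketch below uses the weight values as stated in the embodiment; the function name `dynamic_resource`, the label strings, and the sample node state are hypothetical, and every index is assumed to be a rate between 0 and 1.

```python
# Weights as stated in the embodiment, in the order
# (CPU remaining, memory remaining, disk read/write speed, disk remaining).
BETAS = {
    "compute-intensive": (0.442, 0.344, 0.156, 0.078),
    "memory-intensive": (0.344, 0.442, 0.156, 0.078),
}


def dynamic_resource(
    job_type: str,
    avai_cores: float,
    avai_memory: float,
    avai_ssd_spd: float,
    avai_ssd: float,
) -> float:
    """Weighted sum of the dynamic indexes with job-type-specific weights."""
    b1, b2, b3, b4 = BETAS[job_type]
    return b1 * avai_cores + b2 * avai_memory + b3 * avai_ssd_spd + b4 * avai_ssd


# Hypothetical node: plenty of idle CPU, little free memory.
cpu_rich = dict(avai_cores=0.9, avai_memory=0.3, avai_ssd_spd=0.5, avai_ssd=0.5)
for_compute = dynamic_resource("compute-intensive", **cpu_rich)
for_memory = dynamic_resource("memory-intensive", **cpu_rich)
```

The same node scores higher for a compute-intensive job than for a memory-intensive one, which is how the job type steers the ranking toward nodes whose spare capacity matches the job.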
(4) From the static and dynamic performance of each node, the Master node calculates the performance bias of the node: using the static performance StaticResource and the dynamic performance DynamicResource of each node obtained in steps (1) and (3), it calculates the performance bias $\mathrm{NPT} = \alpha\,\mathrm{StaticResource} + \beta\,\mathrm{DynamicResource}$ of each node, where $\alpha$ and $\beta$ are the weights of StaticResource and DynamicResource, respectively; they are calculated using the analytic hierarchy process and are 0.5 and 0.5, respectively.
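With equal weights, the performance bias is the mean of the static and dynamic scores, so a node with modest hardware but a lot of spare capacity can outrank a stronger but busier node. A minimal sketch, with the function names `npt` and `rank_nodes` and the sample scores chosen purely for illustration:

```python
from typing import Dict, List, Tuple


def npt(static_resource: float, dynamic_resource: float,
        alpha: float = 0.5, beta: float = 0.5) -> float:
    """Performance bias as the weighted sum of static and dynamic scores."""
    return alpha * static_resource + beta * dynamic_resource


def rank_nodes(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order node names from highest to lowest performance bias."""
    return sorted(scores, key=lambda name: npt(*scores[name]), reverse=True)


# Hypothetical (StaticResource, DynamicResource) pairs.
sample = {"big-but-busy": (0.8, 0.3), "small-but-idle": (0.5, 0.7)}
order = rank_nodes(sample)
```

Here the busy node scores 0.55 and the idle node 0.60, so the idle node is ranked first.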
(5) The Master node allocates suitable node resources to the user job according to the job type and the performance bias of the nodes: the nodes are sorted by their performance bias values, and, starting from the node with the highest priority, suitable nodes are allocated to the user job so that the job's memory and CPU core requirements are met;
(6) When the job completes, the Master node returns the execution result of the job;
(7) When a new node joins the cluster, the Master node calculates the performance bias of that node;
(8) If all user jobs have been executed, the system stops; otherwise, the method returns to step (2).
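Step (7) implies that a node joining mid-run has to be scored and placed into the existing ranking before the next allocation. One way to do that without re-sorting everything is a sorted insert, sketched below; the function name `add_node`, the tuple layout of the ranking, and the sample values are hypothetical.

```python
import bisect
from typing import List, Tuple


def add_node(
    ranking: List[Tuple[float, str]],
    name: str,
    static_resource: float,
    dynamic_resource: float,
    alpha: float = 0.5,
    beta: float = 0.5,
) -> float:
    """Score a new node and insert it into a ranking sorted ascending by NPT."""
    score = alpha * static_resource + beta * dynamic_resource
    # insort keeps the list ordered; iterate it in reverse for priority order.
    bisect.insort(ranking, (score, name))
    return score


ranking = [(0.4, "node-a"), (0.7, "node-b")]
add_node(ranking, "node-c", static_resource=0.6, dynamic_resource=0.6)
priority_order = [name for _, name in reversed(ranking)]
```

Since dynamic performance changes with every heartbeat, a real Master would still refresh the scores of existing nodes before each allocation; the sorted insert only covers the arrival of the new node.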
In summary, the invention analyzes the type of each user job and calculates the performance bias of each node in real time from the running state of the nodes in the cluster, so as to allocate the most suitable node to each user job. As shown in FIG. 3 to FIG. 5, experiments show that, compared with the default scheduling algorithm of Spark, the proposed algorithm effectively improves the performance of the cluster system. When the same task is executed with different data volumes, the ATNPA algorithm improves cluster performance by 8.56% on average; when different tasks are executed in parallel, the ATNPA algorithm improves cluster performance by 8.33%.
The foregoing describes the principles of the present invention with reference to a specific embodiment and the accompanying drawings; the invention is not limited to the specific embodiment shown.

Claims (2)

1. A cluster resource scheduling method based on user job type and node performance bias, characterized by comprising the following steps:
1) The Master node collects the static index values of all nodes in the cluster and calculates the static performance of each node; this specifically comprises the following steps:
1.1) The Master node collects the static performance indexes of all nodes in the cluster, namely the number of CPU cores, CPU speed, disk capacity, and memory size;
1.2) The Master node calculates the static performance of each node in the cluster:
$\mathrm{StaticResource} = \alpha_1\,\mathrm{Cores} + \alpha_2\,\mathrm{Memory} + \alpha_3\,\mathrm{Store} + \alpha_4\,\mathrm{CpuSpeed}$ (1)
where $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are the weights of the static indexes CPU core count, memory capacity, disk capacity, and CPU speed, respectively, and $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$; the values of $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are calculated using the analytic hierarchy process;
2) The Master node determines the type of the user job that is next to run in the job queue; this specifically comprises the following steps:
2.1) The Master node obtains the number of CPU cores and the amount of memory (requiredMemory) required to complete the user job that is next to run;
2.2) The Master node determines the user job type UAT:
3) The Master node collects dynamic state data from each node in the cluster and calculates the dynamic performance of each node; this specifically comprises the following steps:
3.1) Each node of the cluster collects its own dynamic performance indexes, namely the CPU remaining rate, memory remaining rate, disk capacity remaining rate, and current disk read/write speed;
3.2) The Master node collects the dynamic performance indexes of all nodes through heartbeat messages;
3.3) The Master node calculates the dynamic performance of each node in the cluster:
$\mathrm{DynamicResource} = \beta_1\,\mathrm{AvaiCores} + \beta_2\,\mathrm{AvaiMemory} + \beta_3\,\mathrm{AvaiSSdSpd} + \beta_4\,\mathrm{AvaiSSd}$ (3)
where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the weights of the CPU remaining rate, memory remaining rate, current disk read/write speed, and disk remaining rate, respectively, and $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$; the initial values of $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are calculated using the analytic hierarchy process, and the values of $\beta_1$ and $\beta_2$ are adjusted according to the type of the user job;
4) From the static and dynamic performance of each node, the Master node calculates the performance bias of each node; the performance bias NPT of a node is calculated as follows:
$\mathrm{NPT} = \alpha\,\mathrm{StaticResource} + \beta\,\mathrm{DynamicResource}$ (4)
where $\alpha$ and $\beta$ are the weights of StaticResource and DynamicResource, respectively, and are calculated using the analytic hierarchy process;
5) The Master node sorts the nodes by their performance bias values and, starting from the node with the highest priority, allocates suitable nodes to the user job so that the job's memory and CPU core requirements are met; when the job completes, the Master node returns the execution result of the job;
6) If all user jobs have been executed, the system stops; otherwise, the method returns to step 2).
2. The cluster resource scheduling method based on user job type and node performance bias according to claim 1, characterized in that, when a new node joins the cluster, the Master node calculates the performance bias of the new node, and step 5) is then executed.
CN202110100907.5A 2021-01-26 2021-01-26 Cluster resource scheduling method based on user job type and node performance bias Active CN112764906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110100907.5A CN112764906B (en) 2021-01-26 2021-01-26 Cluster resource scheduling method based on user job type and node performance bias

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110100907.5A CN112764906B (en) 2021-01-26 2021-01-26 Cluster resource scheduling method based on user job type and node performance bias

Publications (2)

Publication Number Publication Date
CN112764906A CN112764906A (en) 2021-05-07
CN112764906B true CN112764906B (en) 2024-03-15

Family

ID=75707360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110100907.5A Active CN112764906B (en) 2021-01-26 2021-01-26 Cluster resource scheduling method based on user job type and node performance bias

Country Status (1)

Country Link
CN (1) CN112764906B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832153A (en) * 2017-11-14 2018-03-23 北京科技大学 A kind of Hadoop cluster resources self-adapting distribution method
CN108563497A (en) * 2018-04-11 2018-09-21 中译语通科技股份有限公司 A kind of efficient various dimensions algorithmic dispatching method, task server
CN109960585A (en) * 2019-02-02 2019-07-02 浙江工业大学 A kind of resource regulating method based on kubernetes
CN110413389A (en) * 2019-07-24 2019-11-05 浙江工业大学 A kind of task schedule optimization method under the unbalanced Spark environment of resource

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9672064B2 (en) * 2015-07-13 2017-06-06 Palo Alto Research Center Incorporated Dynamically adaptive, resource aware system and method for scheduling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
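Research on a task scheduling optimization algorithm for resource-unbalanced Spark environments; 胡亚红; 盛夏; 毛家发; Computer Engineering & Science; 2020-02-15 (No. 02); full text *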

Also Published As

Publication number Publication date
CN112764906A (en) 2021-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant