CN112764906B - Cluster resource scheduling method based on user job type and node performance bias - Google Patents
Cluster resource scheduling method based on user job type and node performance bias
- Publication number
- CN112764906B (application CN202110100907.5A)
- Authority
- CN
- China
- Prior art keywords
- node
- performance
- cluster
- master node
- static
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5055—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering software capabilities, i.e. software resources associated or available to the machine
Abstract
The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias, comprising the following steps: 1) The Master node collects the static index values of each node in the cluster and calculates the static performance of each node; 2) The Master node determines the type of the user job currently to be run in the job queue; 3) The Master node collects the dynamic state data of each node in the cluster and calculates the dynamic performance of each node; 4) From the dynamic and static performance of each node, the Master node calculates each node's performance bias; 5) The Master node allocates suitable node resources to the user job according to the job type and the nodes' performance bias; when the job completes, the Master node returns its execution result; 6) If all user jobs have been executed, system operation ends; otherwise, return to step 2). The beneficial effects of the invention are that cluster performance is effectively improved and the execution time of user jobs is shortened.
Description
Technical Field
The invention relates to the technical field of cluster scheduling, and in particular to a cluster resource scheduling method based on user job type and node performance bias.
Background
With the advent of the big data age, user-generated data has grown exponentially. Single nodes and traditional computing models can no longer meet the performance and efficiency requirements of big data processing. Apache Spark has become the most popular big data processing platform thanks to its excellent performance and rich application support. In Spark Standalone mode, multi-job resource scheduling uses two algorithms by default, the spreadOut scheduling algorithm and the non-spreadOut scheduling algorithm. Both follow very simple logic: they schedule by comparing a job's resource requirements with a node's available resources. Such resource scheduling algorithms do not take into account the job type or the node's processing-capability characteristics (node performance bias) when allocating cluster resources to user jobs. The user job type UAT is determined from the memory resources and the number of CPU cores required by the job, both given by the user when the job is submitted; when UAT is less than or equal to 1.1, the job is compute-intensive, otherwise it is memory-intensive. The node performance bias NPT is determined by the node's static and dynamic factors. When multiple jobs run, some nodes end up running jobs of a type they are not good at, so the cluster executes less efficiently and jobs take longer to complete.
A series of studies has addressed the relationship between node performance bias and user job type. For example, the patent document with publication number 107038069 proposes a scheduling method that dynamically matches node performance labels with job type labels. In that method, each node in the cluster runs a given task and is classified as a CPU-type node, a disk-IO-type node or a common-type node according to the relationship between the node's single-task running time and the average running time over all nodes in the cluster. When a node label is updated, only the node's CPU and IO utilization are considered; factors such as the node's CPU core count, memory size, disk capacity and real-time disk read-write speed are not. Since determining the job type label requires pre-running part of the job and computing with a naive Bayes algorithm, that method suits repeated jobs with large data files or jobs that must run frequently. How to quickly and accurately determine the types of the nodes and of the user jobs in a cluster, and thereby assign each job to the most suitable node, therefore remains an urgent problem.
Disclosure of Invention
The present invention is directed to a cluster resource scheduling method (ATNPA) based on user job type and node performance bias, which can rapidly and accurately allocate the most suitable nodes to a user job.
The present invention achieves the above objective through the following technical scheme: a cluster resource scheduling method based on user job type and node performance bias, comprising the following steps:
1) The Master node collects static index values of all nodes in the cluster and calculates the static performance of all nodes;
2) The Master node calculates the type of the user job to be run currently in the job queue;
3) The Master node collects dynamic state data of each node in the cluster and calculates dynamic performance of each node;
4) According to the dynamic and static performances of each node, the Master node calculates the performance bias of each node;
5) The Master node allocates suitable node resources to the user job according to the job type and the nodes' performance bias; when the job completes, the Master node returns the execution result of the job;
6) If all user jobs are executed, ending the system operation; otherwise, returning to the step 2).
Preferably, when a new node is added to the cluster, the Master node calculates the performance bias of the new node, and step 5) is executed.
Preferably, the step 1) specifically includes the following steps:
1.1) The Master node collects the static performance indexes of each node in the cluster, including CPU core count, CPU speed, disk capacity and memory size;
1.2) The Master node calculates the static performance of each node in the cluster:
StaticResource = α1·Cores + α2·Memory + α3·Store + α4·CpuSpeed (1)
where α1, α2, α3 and α4 are the weights of the static indexes CPU core number, memory capacity, disk capacity and CPU speed, respectively, and α1 + α2 + α3 + α4 = 1. The values of α1, α2, α3 and α4 are calculated using the analytic hierarchy process.
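As an illustrative, non-authoritative sketch of step 1), the Python code below computes the StaticResource score of formula (1) for each node in a cluster. The NodeStatic container, the helper names and the min-max normalization of the raw indexes to a common scale are assumptions of this sketch and are not specified in the patent; the weights shown are the example values given in the embodiment.

```python
from dataclasses import dataclass

# Assumed container for the static indexes the Master node collects in step 1.1).
@dataclass
class NodeStatic:
    name: str
    cores: int            # CPU core count
    memory_gb: float      # memory size
    store_gb: float       # disk capacity
    cpu_speed_ghz: float  # CPU speed

# AHP-derived weights for cores, memory, disk capacity and CPU speed
# (example values from the embodiment); alpha1 + alpha2 + alpha3 + alpha4 = 1.
ALPHA = (0.113, 0.641, 0.073, 0.173)

def static_resource(nodes: list[NodeStatic]) -> dict[str, float]:
    """Formula (1): StaticResource = a1*Cores + a2*Memory + a3*Store + a4*CpuSpeed.

    The raw indexes are min-max normalized across the cluster first; the patent
    does not describe this step, it is an assumption to make the units comparable.
    """
    def normalize(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in values]

    cols = list(zip(*[(n.cores, n.memory_gb, n.store_gb, n.cpu_speed_ghz) for n in nodes]))
    norm = [normalize(c) for c in cols]
    return {
        n.name: sum(a * norm[i][j] for i, a in enumerate(ALPHA))
        for j, n in enumerate(nodes)
    }
```

A call such as static_resource([...]) then yields a per-node score that step 4) combines with the node's dynamic performance.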
Preferably, the step 2) specifically includes the following steps:
2.1) The Master node obtains the number of CPU cores (requiredCores) and the amount of memory (requiredMemory) required to complete the user job currently to be run;
2.2) The Master node determines the user job type UAT:
preferably, the step 3) specifically includes the following steps:
3.1) Each node of the cluster collects its own dynamic performance indexes, including CPU residual rate, memory residual rate, disk capacity residual rate and current disk read-write speed;
3.2) The Master node collects the dynamic performance indexes of all nodes through heartbeat information;
3.3) The Master node calculates the DynamicResource of each node in the cluster:
DynamicResource = β1·AvaiCores + β2·AvaiMemory + β3·AvaiSSdSpd + β4·AvaiSSd (3)
where β1, β2, β3 and β4 are the weights of the CPU residual rate, memory residual rate, current disk read-write speed and disk residual rate, respectively, and β1 + β2 + β3 + β4 = 1. The initial values of β1, β2, β3 and β4 are calculated using the analytic hierarchy process, and the values of β1 and β2 are adjusted according to the type of the user job.
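A minimal sketch of the DynamicResource computation of formula (3), assuming the four indexes are already expressed on a common 0–1 scale; the function name and signature are illustrative only, and the two weight vectors are the example values given later in the embodiment for CPU-intensive and memory-intensive jobs.

```python
# AHP-derived initial weights for CPU residual rate, memory residual rate,
# current disk read-write speed and disk residual rate; beta1 and beta2 are
# swapped according to the job type (values taken from the embodiment).
BETA_CPU_INTENSIVE = (0.442, 0.344, 0.156, 0.078)
BETA_MEM_INTENSIVE = (0.344, 0.442, 0.156, 0.078)

def dynamic_resource(avai_cores: float, avai_memory: float,
                     avai_ssd_spd: float, avai_ssd: float,
                     cpu_intensive: bool) -> float:
    """Formula (3): DynamicResource = b1*AvaiCores + b2*AvaiMemory
                                    + b3*AvaiSSdSpd + b4*AvaiSSd.

    Inputs are assumed to be residual rates (and a normalized disk read-write
    speed) in the range 0..1.
    """
    beta = BETA_CPU_INTENSIVE if cpu_intensive else BETA_MEM_INTENSIVE
    return (beta[0] * avai_cores + beta[1] * avai_memory
            + beta[2] * avai_ssd_spd + beta[3] * avai_ssd)
```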
Preferably, the calculation formula of the performance bias NPT of the node in the step 4) is as follows:
NPT = α·StaticResource + β·DynamicResource (4)
where α and β are the weights of StaticResource and DynamicResource, respectively, and are calculated using the analytic hierarchy process.
Preferably, the step 5) specifically includes the following steps: the performance bias values of the nodes are sorted, and suitable nodes are allocated to the user job in descending order of priority until the memory and CPU core number requirements of the job are met.
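The following sketch illustrates one possible reading of steps 4) and 5): nodes are ranked by NPT (formula (4)) and allocated greedily until the job's CPU core and memory requirements are satisfied. The node dictionary layout, field names and greedy policy details are assumptions of this sketch, not the patent's data model.

```python
def schedule_job(nodes: list[dict], required_cores: int, required_memory_gb: float,
                 alpha: float = 0.5, beta: float = 0.5) -> list[str]:
    # Rank nodes by NPT = alpha*StaticResource + beta*DynamicResource (formula (4)).
    ranked = sorted(nodes, key=lambda n: alpha * n["static"] + beta * n["dynamic"],
                    reverse=True)
    chosen, cores_left, mem_left = [], required_cores, required_memory_gb
    for node in ranked:
        if cores_left <= 0 and mem_left <= 0:
            break  # the job's CPU and memory demands are already covered
        if node["free_cores"] > 0 and node["free_memory_gb"] > 0:
            chosen.append(node["name"])
            cores_left -= node["free_cores"]
            mem_left -= node["free_memory_gb"]
    # Return the selected nodes only if the job's full requirements were met.
    return chosen if cores_left <= 0 and mem_left <= 0 else []
```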
The beneficial effects of the invention are as follows: by analyzing the type of each user job and calculating the performance bias of the cluster nodes in real time, the invention finds the nodes best suited to the characteristics of the job. The algorithm completes resource scheduling according to the characteristics of the user job and the real-time performance bias of the nodes, and by fully exploiting each node's performance advantages it effectively improves cluster performance and shortens the execution time of user jobs.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an ATNPA algorithm implementation of the present invention;
FIG. 3 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes on the Wordcount load;
FIG. 4 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing user jobs of different data volumes on the Sort load;
FIG. 5 is a schematic diagram comparing the completion times of the ATNPA algorithm of the present invention and the Spark default algorithm when executing different types of user jobs of the same data volume in parallel.
Detailed Description
The invention will be further described with reference to the following examples of embodiments, but the scope of the invention is not limited thereto:
examples: as shown in fig. 1 and fig. 2, a cluster resource scheduling method based on user job type and node performance bias includes the following steps:
(1) When the cluster is in an idle state, the Master node collects static index values of all nodes in the cluster, and calculates the static performance of the nodes:
(1.1) the Master node collects static performance indexes of all nodes in the cluster, including CPU core number, CPU speed, disk capacity and memory size;
(1.2) The Master node calculates the static performance of each node in the cluster:
StaticResource = α1·Cores + α2·Memory + α3·Store + α4·CpuSpeed
where α1, α2, α3 and α4 are the weights of the static factors CPU core number, memory capacity, disk capacity and CPU speed, respectively, and α1 + α2 + α3 + α4 = 1. The values of α1, α2, α3 and α4 are calculated using the analytic hierarchy process and are 0.113, 0.641, 0.073 and 0.173, respectively;
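Since these weights are said to come from the analytic hierarchy process, the sketch below shows a standard AHP weight derivation (normalized principal eigenvector of a pairwise-comparison matrix). The pairwise judgment matrix is hypothetical; the patent reports only the resulting weights, so the matrix here is chosen merely so that the derived weights fall in a comparable order.

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """Standard AHP weight derivation: normalized principal eigenvector of the
    pairwise-comparison matrix."""
    eigvals, eigvecs = np.linalg.eig(pairwise)
    principal = eigvecs[:, np.argmax(eigvals.real)].real
    return principal / principal.sum()

# Hypothetical pairwise judgments over (cores, memory, disk capacity, CPU speed);
# the patent reports only the resulting weights (0.113, 0.641, 0.073, 0.173).
pairwise = np.array([
    [1.0,   1 / 5, 2.0, 1 / 2],
    [5.0,   1.0,   7.0, 4.0],
    [1 / 2, 1 / 7, 1.0, 1 / 3],
    [2.0,   1 / 4, 3.0, 1.0],
])
print(ahp_weights(pairwise))  # approximately [0.12, 0.61, 0.07, 0.20]
```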
(2) The Master node calculates the type of user job currently to be run in the job queue:
(2.1) The Master node obtains the number of CPU cores (requiredCores) and the amount of memory (requiredMemory) required to complete the user job;
(2.2) The Master node determines the user job type UAT (User Application Type):
when UAT is less than or equal to 1.1, the job is compute-intensive; otherwise it is memory-intensive;
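For illustration, the threshold rule of step (2.2) can be expressed as the small function below. Formula (2), by which UAT is computed from the job's required CPU cores and memory, is not reproduced in this text, so the sketch takes the UAT value as an input; only the 1.1 threshold is taken from the description, and the function name is an assumption.

```python
def classify_job(uat: float) -> str:
    """Threshold rule from step (2.2); the function name is an assumption."""
    return "compute-intensive" if uat <= 1.1 else "memory-intensive"

# Example: a hypothetical job whose formula-(2) UAT value is 0.9 is compute-intensive.
print(classify_job(0.9))   # -> compute-intensive
print(classify_job(2.4))   # -> memory-intensive
```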
(3) The Master node collects dynamic state data of all nodes in the cluster and calculates dynamic performance of the nodes:
(3.1) Each node of the cluster collects its own dynamic performance indexes, including CPU residual rate, memory residual rate, disk capacity residual rate and current disk read-write speed;
(3.2) the Master node collects dynamic performance indexes of all nodes through heartbeat information;
(3.3) The Master node calculates the dynamic performance of each node in the cluster:
DynamicResource = β1·AvaiCores + β2·AvaiMemory + β3·AvaiSSdSpd + β4·AvaiSSd
where β1, β2, β3 and β4 are the weights of the CPU residual rate, memory residual rate, current disk read-write speed and disk residual rate, respectively, and β1 + β2 + β3 + β4 = 1. The initial values of β1, β2, β3 and β4 are calculated using the analytic hierarchy process, and the values of β1 and β2 are adjusted according to the type of the user job: for CPU-intensive jobs, β1, β2, β3 and β4 take the values 0.442, 0.344, 0.156 and 0.078, respectively; for memory-intensive jobs, β1, β2, β3 and β4 take the values 0.344, 0.442, 0.156 and 0.078, respectively;
(4) According to the dynamic and static performance of each node, the Master node calculates the node's performance bias: using the static performance StaticResource and the dynamic performance DynamicResource of each node in the cluster obtained in steps (1) and (3), the performance bias of each node is calculated as NPT = α·StaticResource + β·DynamicResource, where α and β are the weights of StaticResource and DynamicResource, respectively, calculated using the analytic hierarchy process and both equal to 0.5.
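For illustration only (these numbers are not taken from the patent), a node with StaticResource = 0.72 and DynamicResource = 0.60 would receive NPT = 0.5 × 0.72 + 0.5 × 0.60 = 0.66, and the nodes are then ranked by these NPT values in step (5).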
(5) The Master node allocates suitable node resources to the user job according to the job type and the nodes' performance bias: the nodes are sorted by their performance bias values, and suitable nodes are allocated to the user job in descending order of priority until the memory and CPU core number requirements of the job are met.
(6) When the job is completed, the Master node returns an execution result of the job;
(7) When a new node is added into the cluster, the Master node calculates the performance bias of the node;
(8) If all user jobs are executed, ending the system operation; otherwise, returning to the step (2).
In summary, the invention analyzes the type of each user job and calculates node performance bias in real time from the running state of the nodes in the cluster, so as to allocate the most suitable nodes to each user job. As shown in fig. 3 to 5, experiments show that, compared with the default Spark scheduling algorithm, the proposed algorithm effectively improves the performance of the cluster system: when executing the same task with different data volumes, the ATNPA algorithm improves cluster performance by 8.56% on average; when executing different tasks in parallel, the ATNPA algorithm improves cluster performance by 8.33%.
The foregoing illustrates the principles of the present invention with reference to the accompanying drawings; the invention is not limited to the specific embodiments shown.
Claims (2)
1. A cluster resource scheduling method based on user operation type and node performance bias is characterized by comprising the following steps:
1) The Master node collects static index values of all nodes in the cluster and calculates the static performance of all nodes; the method specifically comprises the following steps:
1.1) The Master node collects the static performance indexes of each node in the cluster, wherein the static performance indexes comprise CPU core number, CPU speed, disk capacity and memory size;
1.2) The Master node calculates the static performance of each node in the cluster:
StaticResource = α1·Cores + α2·Memory + α3·Store + α4·CpuSpeed (1)
wherein α1, α2, α3 and α4 are the weights of the static indexes CPU core number, memory capacity, disk capacity and CPU speed, respectively, and α1 + α2 + α3 + α4 = 1; the values of α1, α2, α3 and α4 are calculated using the analytic hierarchy process;
2) The Master node calculates the type of the user job currently to be run in the job queue; specifically comprising the following steps: 2.1) The Master node obtains the number of CPU cores (requiredCores) and the amount of memory (requiredMemory) required to complete the user job currently to be run;
2.2) The Master node determines the user job type UAT:
3) The Master node collects the dynamic state data of each node in the cluster and calculates the dynamic performance of each node; specifically comprising the following steps: 3.1) Each node of the cluster collects its own dynamic performance indexes, including CPU residual rate, memory residual rate, disk capacity residual rate and current disk read-write speed;
3.2) The Master node collects the dynamic performance indexes of all nodes through heartbeat information;
3.3) The Master node calculates the DynamicResource of each node in the cluster:
DynamicResource = β1·AvaiCores + β2·AvaiMemory + β3·AvaiSSdSpd + β4·AvaiSSd (3)
wherein β1, β2, β3 and β4 are the weights of the CPU residual rate, memory residual rate, current disk read-write speed and disk residual rate, respectively, and β1 + β2 + β3 + β4 = 1; the initial values of β1, β2, β3 and β4 are calculated using the analytic hierarchy process, and the values of β1 and β2 are adjusted according to the type of the user job;
4) According to the dynamic and static performances of each node, the Master node calculates the performance bias of each node; the calculation formula of the performance bias NPT of the above node is as follows:
NPT = α·StaticResource + β·DynamicResource (4)
wherein α and β are the weights of StaticResource and DynamicResource, respectively, and are calculated using the analytic hierarchy process;
5) The Master node sorts the performance bias values of all nodes, allocates suitable nodes to the user job starting from the highest-priority node until the memory and CPU core number requirements of the job are met, completes the job, and returns the execution result of the job;
6) If all user jobs are executed, ending the system operation; otherwise, returning to the step 2).
2. The method for scheduling cluster resources based on user job types and node performance bias according to claim 1, wherein when a new node is added to the cluster, a Master node calculates the performance bias of the new node, and step 5 is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110100907.5A CN112764906B (en) | 2021-01-26 | 2021-01-26 | Cluster resource scheduling method based on user job type and node performance bias |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110100907.5A CN112764906B (en) | 2021-01-26 | 2021-01-26 | Cluster resource scheduling method based on user job type and node performance bias |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112764906A (en) | 2021-05-07 |
CN112764906B (en) | 2024-03-15 |
Family
ID=75707360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110100907.5A Active CN112764906B (en) | 2021-01-26 | 2021-01-26 | Cluster resource scheduling method based on user job type and node performance bias |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112764906B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9672064B2 (en) * | 2015-07-13 | 2017-06-06 | Palo Alto Research Center Incorporated | Dynamically adaptive, resource aware system and method for scheduling |
-
2021
- 2021-01-26 CN CN202110100907.5A patent/CN112764906B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832153A (en) * | 2017-11-14 | 2018-03-23 | 北京科技大学 | A kind of Hadoop cluster resources self-adapting distribution method |
CN108563497A (en) * | 2018-04-11 | 2018-09-21 | 中译语通科技股份有限公司 | A kind of efficient various dimensions algorithmic dispatching method, task server |
CN109960585A (en) * | 2019-02-02 | 2019-07-02 | 浙江工业大学 | A kind of resource regulating method based on kubernetes |
CN110413389A (en) * | 2019-07-24 | 2019-11-05 | 浙江工业大学 | A kind of task schedule optimization method under the unbalanced Spark environment of resource |
Non-Patent Citations (1)
Title |
---|
Research on a task scheduling optimization algorithm for resource-unbalanced Spark environments; Hu Yahong; Sheng Xia; Mao Jiafa; Computer Engineering & Science; 2020-02-15 (Issue 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112764906A (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096349B (en) | Job scheduling method based on cluster node load state prediction | |
Ekanayake et al. | Twister: a runtime for iterative mapreduce | |
US20070016558A1 (en) | Method and apparatus for dynamically associating different query execution strategies with selective portions of a database table | |
US20060074874A1 (en) | Method and apparatus for re-evaluating execution strategy for a database query | |
US11893014B2 (en) | Method and database system for initiating execution of a query and methods for use therein | |
WO2023179415A1 (en) | Machine learning computation optimization method and platform | |
US20070250517A1 (en) | Method and Apparatus for Autonomically Maintaining Latent Auxiliary Database Structures for Use in Executing Database Queries | |
CN113157421B (en) | Distributed cluster resource scheduling method based on user operation flow | |
CN110347515B (en) | Resource optimization allocation method suitable for edge computing environment | |
CN116089414B (en) | Time sequence database writing performance optimization method and device based on mass data scene | |
US20060074875A1 (en) | Method and apparatus for predicting relative selectivity of database query conditions using respective cardinalities associated with different subsets of database records | |
CN116302574B (en) | Concurrent processing method based on MapReduce | |
CN108984298A (en) | A kind of resource regulating method and system of cloud computing platform | |
Liu et al. | Failure prediction of tasks in the cloud at an earlier stage: a solution based on domain information mining | |
US20230325235A1 (en) | Training task queuing cause analysis method and system, device and medium | |
CN112764906B (en) | Cluster resource scheduling method based on user job type and node performance bias | |
CN112800020A (en) | Data processing method and device and computer readable storage medium | |
CN111988412A (en) | Intelligent prediction system and method for multi-tenant service resource demand | |
Piao et al. | Computing resource prediction for mapreduce applications using decision tree | |
CN115391047A (en) | Resource scheduling method and device | |
Ismaeel et al. | A systematic cloud workload clustering technique in large scale data centers | |
CN111813512B (en) | High-energy-efficiency Spark task scheduling method based on dynamic partition | |
CN113343040A (en) | Automatic incremental method, device, equipment and storage medium for graph algorithm | |
CN113886289A (en) | Abnormal data cleaning method based on k-means clustering under Spark platform | |
CN109947530B (en) | Multi-dimensional virtual machine mapping method for cloud platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |