WO2022028059A1 - Method and system for selecting and preempting scheduling nodes based on a server cluster - Google Patents

Method and system for selecting and preempting scheduling nodes based on a server cluster

Info

Publication number
WO2022028059A1
Authority
WO
WIPO (PCT)
Prior art keywords
preemption
task
node
combination
task combination
Prior art date
Application number
PCT/CN2021/096403
Inventor
陈天石
王小珂
刘黎
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2022028059A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5061 Partitioning or combining of resources
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G06F2209/508 Monitor

Definitions

  • the invention belongs to the technical field of server resource allocation, and in particular relates to a method and system for selecting and preempting scheduling nodes based on server clusters.
  • the system will preempt the resources of the task with lower priority by default, but if the task combination with lower priority contains far more tasks than the other combinations and is preempted repeatedly, the overall task operation of the cluster easily becomes unstable.
  • the present invention provides a method for selecting and preempting scheduling nodes based on a server cluster, aiming to solve the problem in the prior art that, when an urgent task is submitted to the cluster, the system by default preempts the resources of lower-priority tasks, which easily makes the overall task operation of the cluster unstable.
  • the technical solution provided by the present invention is: a method for selecting and preempting a scheduling node based on a server cluster, the method comprising the following steps:
  • the most suitable preemptive task combination corresponding to each selected node is optimally sorted, the final most suitable preemptive task combination is obtained, and the obtained final most suitable preemptive task combination is fed back to the scheduler.
  • the step of performing resource monitoring and analysis on all nodes in the cluster, and acquiring all preselected nodes in the cluster specifically includes the following steps:
  • the current node is determined to be a preselected node; otherwise it is determined to be a non-preselected node;
  • the step of calculating the arithmetic mean of the priorities of all preemptible combinations of each of the preselected nodes, and combining the number of tasks in the combination to determine the most suitable task combination for preemption in the node specifically includes the following steps:
  • all preemptible combinations in the currently preselected nodes are compared pairwise, and a task combination that is most suitable for preemption is selected.
  • the step of comparing all preemptible combinations in the current pre-selected node pairwise according to the determined first judgment threshold, and selecting the most suitable task combination for preemption specifically includes the following steps:
  • the arithmetic mean value of the two task combinations is calculated to obtain the distance between the priorities of the two task combinations;
  • a task combination with a larger arithmetic mean value is selected as a task combination that is more suitable for preemption
  • a task combination with a lower number of tasks is selected as a task combination that is more suitable for preemption
  • the task combination obtained is the most suitable task combination for preemption.
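The pairwise rule listed above can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the source: \bar{p}_X is the arithmetic mean of the task priorities in combination X, n_X is the number of tasks in X, and T_1 is the first judgment threshold. The source does not say what happens when the distance is below the threshold and both combinations contain the same number of tasks, so that case is left open.

```latex
d = \lvert \bar{p}_A - \bar{p}_B \rvert, \qquad
\operatorname{choose}(A, B) =
\begin{cases}
\arg\max_{X \in \{A, B\}} \bar{p}_X, & d \ge T_1,\\[4pt]
\arg\min_{X \in \{A, B\}} n_X, & d < T_1.
\end{cases}
```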
  • the step of optimally sorting the selected task combinations most suitable for preemption corresponding to each node, obtaining the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler specifically includes the following steps:
  • the second judgment threshold is determined, and the most suitable task combination for preemption corresponding to each node in the cluster is compared pair-wise, and the final most suitable task combination in the cluster is selected.
  • the obtained task combination that is most suitable for preemption is fed back to the scheduler.
  • the step of determining the second judgment threshold according to the task combination most suitable for preemption of each node, comparing pairwise the task combinations most suitable for preemption corresponding to each node in the cluster, and selecting the final task combination most suitable for preemption in the cluster specifically includes the following steps:
  • the arithmetic mean of the two most suitable task combinations for preemption is calculated to obtain the distance between the priorities of the two most suitable task combinations;
  • the task combination that is most suitable for preemption with a larger arithmetic mean value is selected as the final most suitable task combination to be preempted;
  • the task combination that is most suitable for preemption with a lower number of tasks is selected as the final most suitable task combination to be preempted;
  • the task combination that is most suitable for preemption of all preselected nodes is compared, the task combination obtained after completion is the final most suitable task combination for preemption.
  • the preselected node acquisition module is used to monitor and analyze the resources of all nodes in the cluster, and obtain all the preselected nodes in the cluster;
  • the most suitable preemptive task combination determination module in the node is used to calculate the arithmetic average of all preemptible combination priorities of each of the preselected nodes, and combine the number of tasks in the combination to determine the most suitable preemptible task combination in the node;
  • the final most suitable preemption task combination determination module is used to optimally sort the most suitable preemptive task combinations corresponding to each selected node, obtain the final most suitable preemptive task combination, and feed the obtained final most suitable preemptive task combination back to the scheduler.
  • the preselected node acquisition module specifically includes:
  • an attribute index collection module configured to collect attribute indexes of all nodes in the cluster, where the attribute indexes of the nodes include the total amount of resources owned by the nodes and the total amount of unoccupied resources on the nodes;
  • a total resource comparison module configured to compare the total resources owned by the nodes of all nodes in the cluster with the total resources requested by the emergency task, and determine whether the node is a preselected node;
  • the node determination module is used to determine that the current node is a preselected node when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the emergency task, and otherwise to determine that it is a non-preselected node;
  • the preselected node import module is used to import the acquired preselected nodes into the resource manager.
  • the most suitable preemption task combination determination module in the node specifically includes:
  • the first arithmetic mean calculation module is used to obtain the priority data of all tasks from the preselected nodes, and calculate the arithmetic mean of all preemptible combined priorities;
  • a first judgment threshold determination module configured to determine a first judgment threshold according to the arithmetic mean value obtained by calculation, and the first judgment threshold is used for judging the most suitable task combination for preemption in the current node;
  • a first pairwise comparison module configured to perform pairwise comparison of all preemptible combinations in the current preselected node according to the determined first judgment threshold, and select a task combination that is most suitable for preemption;
  • the first pairwise comparison module specifically includes:
  • the first difference operation module is used to perform a difference operation on the arithmetic mean value of the two task combinations in the current preselected node to obtain the distance between the priorities of the two task combinations;
  • a first judgment threshold comparison module configured to compare the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judge whether the distance between the priorities is greater than or equal to the first judgment threshold ;
  • a first selection module configured to select a task combination with a larger arithmetic mean value as a task combination more suitable for preemption when the distance between the judgment priorities is greater than or equal to the first judgment threshold;
  • a second selection module configured to select a task combination with a lower number of tasks as a task combination more suitable for preemption when the distance between the judgment priorities is less than the first judgment threshold;
  • the task combination obtained is the most suitable task combination for preemption.
  • the final most suitable preemption task combination determination module specifically includes:
  • the task combination acquisition module is used to obtain the most suitable task combination in the node determined by each node;
  • the comparison and screening module is used to determine the second judgment threshold according to the task combination most suitable for preemption of each node, compare pairwise the task combinations most suitable for preemption corresponding to each node in the cluster, and select the final task combination most suitable for preemption in the cluster;
  • a task combination feedback module used to feed back the obtained task combination that is most suitable for preemption to the scheduler
  • the comparison and screening module specifically includes:
  • the second arithmetic mean calculation module is used to obtain the priority data of all tasks of the task combination most suitable for preemption of each node, and calculate the arithmetic mean of the priorities of all preemptible combinations;
  • the second judgment threshold determination module is configured to determine a second judgment threshold according to the arithmetic mean value obtained by calculation, and the second judgment threshold is used for judging the final task combination that is most suitable for preemption in the cluster;
  • the second difference operation module is used to perform a difference operation on the arithmetic mean value of the two most suitable task combinations for preemption in the cluster to obtain the distance between the priorities of the two most suitable task combinations;
  • the second judgment threshold comparison module is configured to compare the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judge whether the distance between the priorities is greater than or equal to the second judgment threshold;
  • the third selection module is configured to select the most suitable task combination for preemption with a larger arithmetic mean value as the final most suitable task combination to be preempted when the distance between the judgment priorities is greater than or equal to the second judgment threshold;
  • the fourth selection module is used to select the most suitable task combination for preemption with a lower number of tasks as the final most suitable task combination to be preempted when the distance between the judgment priorities is less than the second judgment threshold;
  • the task combination that is most suitable for preemption of all preselected nodes is compared, the task combination obtained after completion is the final most suitable task combination for preemption.
  • resource monitoring and analysis are performed on all nodes in the cluster, and all preselected nodes in the cluster are obtained;
  • the arithmetic mean of the priorities of all preemptible combinations of each preselected node is calculated and, combined with the number of tasks in each combination, the most suitable task combination for preemption in the node is determined;
  • the most suitable task combinations for preemption corresponding to each selected node are optimally sorted to obtain the final most suitable task combination for preemption, which is fed back to the scheduler, so that urgent tasks submitted in the cluster can reasonably preempt the resources of running tasks according to the number and priority of the tasks, which enhances the stability of cluster operation.
  • Fig. 1 is the implementation flow chart of the method for selecting and preempting scheduling nodes based on a server cluster provided by the present invention;
  • Fig. 2 is the implementation flow chart, provided by the present invention, of performing resource monitoring and analysis on all nodes in the cluster and obtaining all preselected nodes in the cluster;
  • Fig. 3 is the implementation flow chart, provided by the present invention, of calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node and determining, in combination with the number of tasks in each combination, the most suitable task combination for preemption in the node;
  • Fig. 4 is the implementation flow chart, provided by the present invention, of comparing all preemptible combinations in the current preselected node pairwise according to the determined first judgment threshold and selecting the most suitable task combination for preemption;
  • Fig. 5 is the implementation flow chart, provided by the present invention, of optimally sorting the most suitable preemptive task combinations corresponding to each selected node, obtaining the final most suitable preemptive task combination, and feeding it back to the scheduler;
  • Fig. 6 is the implementation flow chart, provided by the present invention, of determining the second judgment threshold according to the task combination most suitable for preemption of each node, comparing pairwise the task combinations most suitable for preemption corresponding to each node in the cluster, and selecting the final task combination most suitable for preemption in the cluster;
  • FIG. 7 is a structural block diagram of a server cluster-based scheduling node selection preemption system provided by the present invention.
  • FIG. 8 is a structural block diagram of a preselected node acquisition module provided by the present invention.
  • Fig. 9 is the structural block diagram of the most suitable preemption task combination determination module in the node provided by the present invention.
  • FIG. 10 is a structural block diagram of a first pairwise comparison module provided by the present invention.
  • Fig. 11 is the structural block diagram of the final most suitable preemption task combination determination module provided by the present invention.
  • FIG. 12 is a structural block diagram of the comparison screening module provided by the present invention.
  • Fig. 1 is the realization flow chart of the method for selecting and preempting the scheduling node based on the server cluster provided by the present invention, which specifically includes the following steps:
  • step S101 resource monitoring and analysis are performed on all nodes in the cluster, and all preselected nodes in the cluster are obtained;
  • step S102 calculate the arithmetic mean of all preemptible combination priorities of each of the preselected nodes, and determine the most suitable task combination for preemption in the node in combination with the number of tasks in the combination;
  • step S103 optimally sort the selected task combinations most suitable for preemption corresponding to each node, obtain the final task combination most suitable for preemption, and feed the obtained final task combination back to the scheduler.
  • the embodiment of the present invention mainly improves the preemption method of the YARN scheduler.
  • the number and priority of preempted tasks are considered to ensure that urgent tasks (tasks with higher priority) submitted in the cluster can preempt the resources of other running tasks more reasonably, avoiding the excessive task loss caused by a single preemption mechanism.
  • the scheduler builds a decision model based on two factors, priority and the number of tasks, to decide which node's resources to preempt in the end, so that when the task priorities of different nodes do not differ much the scheduler preempts the smaller number of tasks first, which improves the overall task running stability of the cluster.
  • the steps of monitoring and analyzing the resources of all nodes in the cluster, and obtaining all the pre-selected nodes in the cluster specifically include the following steps:
  • step S201 the attribute indexes of all nodes in the cluster are collected, and the attribute indexes of the nodes include the total amount of resources owned by the nodes and the total amount of unoccupied resources on the nodes;
  • the attribute indexes are, for example, collected in real time: the total resources owned by cluster node 1 (total memory of 64G, 6 CPU cores in total), the idle rate of node 1's resources (the percentage of the node's total resources that is unused), etc.
  • step S202 the total amount of resources owned by the nodes of all nodes in the cluster is compared with the total amount of resources requested by the emergency task, and it is judged whether the node is a pre-selected node;
  • for example, the submitted task is a network service and the requested task resource is 10G of memory. If the total resource of node 1 of the cluster is 20G (greater than the requested resource), the resource requirement of the emergency task is met and node 1 can be considered a preselected node; if the total resource of node 1 is 7G of memory, the node is judged to be a non-preselected node.
  • step S203 when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the emergency task, it is determined that the current node is a preselected node, otherwise it is determined to be a non-preselected node;
  • step S204 the acquired preselected nodes are imported into the resource manager, that is, the list of preselected nodes including node 1 is directly loaded into the resource manager (RM).
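Steps S201 to S204 can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Node` class, its field names, and `preselect_nodes` are hypothetical, and only memory is modelled because the worked example in the text only mentions memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Attribute indexes collected in S201 (hypothetical field names)."""
    name: str
    total_memory_gb: float  # total amount of resources owned by the node
    free_memory_gb: float   # unoccupied resources; collected but not used by this filter


def preselect_nodes(nodes, requested_memory_gb):
    """S202-S203: keep nodes whose total resources cover the urgent task's request."""
    return [n for n in nodes if n.total_memory_gb >= requested_memory_gb]


# Values follow the example in the text: a 10G request against a 20G node and a 7G node.
nodes = [Node("node1", 20, 4), Node("node2", 7, 7)]
preselected = preselect_nodes(nodes, requested_memory_gb=10)
print([n.name for n in preselected])  # ['node1']
```

The resulting list corresponds to what S204 loads into the resource manager. Note that the comparison uses total resources rather than free resources, as the text describes, because occupied resources may be released by preemption.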
  • the steps of calculating the arithmetic mean of all preemptible combination priorities of each of the preselected nodes, and determining the most suitable task combination for preemption in the node in combination with the number of tasks in the combination specifically include the following steps:
  • step S301 the priority data of all tasks are obtained from the preselected nodes, and the arithmetic mean of all preemptible combined priorities is calculated;
  • step S302 a first judgment threshold is determined according to the arithmetic mean value obtained by calculation, and the first judgment threshold is used to judge the task combination most suitable for preemption in the current node;
  • for example, combination A includes tasks 1 and 2, and combination B includes tasks 3, 4, and 5, and preempting either combination alone satisfies the resource requirement of the emergency task;
  • the average priority of the two tasks of combination A is 5, and the average priority of B is 3, and 70% of the larger average priority is selected as the first judgment threshold, that is, 3.5 (5*0.7).
  • step S303 according to the determined first judgment threshold, all preemptible combinations in the currently preselected node are compared pair-wise, and a task combination that is most suitable for preemption is selected.
  • the difference between the average priorities of A and B is 2, which is less than the threshold of 3.5, so the number of tasks decides: A contains fewer tasks (2) than B (3), and A is therefore determined to be more suitable for preemption.
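The arithmetic in this example can be reproduced with a short sketch. The per-task priorities below are hypothetical values chosen only so that the means come out as 5 and 3; the 0.7 ratio is taken from the example and is not stated in the text as a fixed rule, and the function names are illustrative.

```python
from statistics import mean

THRESHOLD_RATIO = 0.7  # from the worked example; assumed, not a rule stated by the source


def first_threshold(mean_a, mean_b, ratio=THRESHOLD_RATIO):
    """First judgment threshold: a fraction of the larger average priority."""
    return ratio * max(mean_a, mean_b)


# Hypothetical task priorities giving the averages used in the text.
combo_a = [4, 6]      # tasks 1 and 2, average 5
combo_b = [2, 3, 4]   # tasks 3, 4 and 5, average 3

mean_a = mean(combo_a)
mean_b = mean(combo_b)
threshold = first_threshold(mean_a, mean_b)  # 5 * 0.7, about 3.5
distance = abs(mean_a - mean_b)              # 2

# The distance is below the threshold, so the task count decides (S404).
print(distance < threshold)  # True
```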
  • the step of comparing all preemptible combinations in the current preselected node pairwise according to the determined first judgment threshold and selecting the most suitable task combination for preemption specifically includes the following steps:
  • step S401 in the current preselected node, a difference operation is performed on the arithmetic mean values of the two task combinations to obtain the distance between the priorities of the two task combinations;
  • step S402 the calculated distance between the priorities of the two task combinations is compared with the first judgment threshold, and it is judged whether the distance between the priorities is greater than or equal to the first judgment threshold;
  • step S403 when it is determined that the distance between the priorities is greater than or equal to the first judgment threshold, a task combination with a larger arithmetic mean value is selected as a task combination that is more suitable for preemption;
  • step S404 when it is determined that the distance between the priorities is less than the first judgment threshold, a task combination with a lower number of tasks is selected as a task combination that is more suitable for preemption;
  • the task combination obtained is the most suitable task combination for preemption.
  • combination A has 2 tasks
  • combination B has 3 tasks
  • the difference between the average priorities of combination A and combination B is 2, which is less than the threshold of 3.5, so the combination with fewer tasks is selected and combination A is determined to be more suitable for preemption.
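Steps S401 to S404 amount to a pairwise comparison that is folded over all preemptible combinations of a node. The sketch below is one possible reading: the `Combination` class and function names are hypothetical, the threshold is passed in as a fixed value, and keeping the first combination when task counts tie is an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Combination:
    name: str
    mean_priority: float  # arithmetic mean of the task priorities
    task_count: int       # number of tasks in the combination


def more_suitable(a, b, threshold):
    """Apply S401-S404 to a single pair of combinations."""
    distance = abs(a.mean_priority - b.mean_priority)  # S401
    if distance >= threshold:                          # S402-S403
        return a if a.mean_priority >= b.mean_priority else b
    # S404: priorities are close, so prefer fewer tasks (a tie keeps `a`: assumption).
    return a if a.task_count <= b.task_count else b


def most_suitable(combinations, threshold):
    """Compare the combinations pairwise, carrying the winner forward."""
    return reduce(lambda best, c: more_suitable(best, c, threshold), combinations)


a = Combination("A", mean_priority=5, task_count=2)
b = Combination("B", mean_priority=3, task_count=3)
print(most_suitable([a, b], threshold=3.5).name)  # A
```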
  • the step of optimally sorting the most suitable task combinations for preemption corresponding to each selected node, obtaining the final most suitable task combination for preemption, and feeding the obtained final task combination back to the scheduler specifically includes the following steps:
  • step S501 obtain the task combination most suitable for preemption in the node determined by each node
  • step S502 the second judgment threshold is determined according to the task combination most suitable for preemption of each node, and the most suitable task combination for preemption corresponding to each node in the cluster is compared pair-wise, and the final most suitable task combination in the cluster is selected.
  • step S503 the obtained task combination that is most suitable for preemption is fed back to the scheduler.
  • for example, the cluster has three preselected nodes, node1, node2, and node3: the average priority of the optimal combination of node1 is 5 and its number of tasks is 3; the average priority of node3 is 7 and its number of tasks is 2; the average priority of node2 is 8 and its number of tasks is 3.
  • compare the node3 node with the node1 node and determine that the node3 node is the optimal node of the cluster.
  • the step of determining the second judgment threshold, comparing pairwise the task combinations most suitable for preemption corresponding to each node in the cluster, and selecting the final task combination most suitable for preemption in the cluster specifically includes the following steps:
  • step S601 the priority data of all tasks of the task combination that is most suitable for preemption of each node is obtained, and the arithmetic average of the priorities of all preemptible combinations is calculated;
  • step S602 a second judgment threshold is determined according to the calculated arithmetic mean value, and the second judgment threshold is used for judging the final task combination that is most suitable for preemption in the cluster;
  • step S603 in the cluster, a difference operation is performed on the arithmetic mean values of the two most suitable task combinations for preemption to obtain the distance between the priorities of the two most suitable task combinations;
  • step S604 the calculated distance between the priorities of the two most suitable task combinations for preemption is compared with the second judgment threshold, and it is judged whether the distance between the priorities is greater than or equal to the second judgment threshold;
  • step S605 when it is determined that the distance between the priorities is greater than or equal to the second judgment threshold, the most suitable task combination for preemption with a larger arithmetic mean value is selected as the final most suitable task combination to be preempted;
  • step S606 when it is determined that the distance between the priorities is less than the second judgment threshold, the task combination that is most suitable for preemption with a lower number of tasks is selected as the final most suitable task combination to be preempted;
  • the task combination obtained after the comparison of the most suitable preemptive task combinations of all preselected nodes is the final most suitable preemptive task combination.
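The cluster-level selection in S601 to S606 can be sketched with the three-node example given earlier. The source does not state how the second judgment threshold is derived, so this sketch assumes the same rule as the node-level example (70% of the larger of the two averages, recomputed for each pair). That rule, the `NodeBest` class, and `pick_final` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBest:
    """The most suitable combination already chosen inside one preselected node."""
    node: str
    mean_priority: float
    task_count: int


def pick_final(candidates, ratio=0.7):
    """Fold the pairwise comparison (S603-S606) over the per-node winners."""
    best = candidates[0]
    for cand in candidates[1:]:
        # Assumed second threshold: a fraction of the larger average of the pair.
        threshold = ratio * max(best.mean_priority, cand.mean_priority)
        distance = abs(best.mean_priority - cand.mean_priority)
        if distance >= threshold:
            # S605: priorities differ clearly, take the larger arithmetic mean.
            best = max(best, cand, key=lambda c: c.mean_priority)
        elif cand.task_count < best.task_count:
            # S606: priorities are close, take the combination with fewer tasks.
            best = cand
    return best


candidates = [
    NodeBest("node1", mean_priority=5, task_count=3),
    NodeBest("node3", mean_priority=7, task_count=2),
    NodeBest("node2", mean_priority=8, task_count=3),
]
print(pick_final(candidates).node)  # node3
```

Under this assumed threshold every pair in the example falls below the threshold, so the task count decides each comparison and node3, with 2 tasks, is kept, which agrees with the outcome stated in the text.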
  • the monitoring system collects information such as node resource status and identifies preselected nodes suitable for preemption; all preemptible task combinations are sorted according to their priority and number of tasks, so as to select the combination most suitable for preemption and send it to the YARN scheduler; according to the priority and number of tasks of the optimal combinations of all preselected nodes, the most suitable scheduling node is selected and the optimal combination of that node is sent to the YARN scheduler, so that as few tasks as possible are preempted on the cluster, which enhances the overall task running stability of the cluster.
  • FIG. 7 shows a structural block diagram of a system for selecting and preempting a scheduling node based on a server cluster provided by the present invention. For convenience of description, only the parts related to the embodiments of the present invention are shown in the figure.
  • the preemption system for selecting scheduling nodes based on server clusters includes:
  • the preselected node acquisition module 11 is used to monitor and analyze the resources of all the nodes in the cluster, and acquire all the preselected nodes in the cluster;
  • the most suitable preemptive task combination determination module 12 in the node is used to calculate the arithmetic mean value of all preemptible combination priorities of each of the preselected nodes, and determine the most suitable preemptible task combination in the node in combination with the number of tasks in the combination;
  • the final most suitable preemption task combination determination module 13 is used to optimally sort the most suitable preemptive task combinations corresponding to each selected node, obtain the final most suitable preemptive task combination, and feed the obtained final most suitable preemptive task combination back to the scheduler.
  • the preselected node acquisition module 11 specifically includes:
  • the attribute index collection module 14 is configured to collect attribute indexes of all nodes in the cluster, where the attribute indexes of the nodes include the total amount of resources owned by the nodes and the total amount of unoccupied resources on the nodes;
  • the total resource comparison module 15 is configured to compare the total resources owned by the nodes of all nodes in the cluster with the total resources requested by the emergency task, and determine whether the node is a preselected node;
  • the node determination module 16 is configured to determine that the current node is a preselected node when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the emergency task, and otherwise to determine that it is a non-preselected node;
  • the preselected node import module 17 is used for importing the acquired preselected node into the resource manager.
  • the in-node most suitable preemption task combination determination module 12 specifically includes:
  • the first arithmetic mean calculation module 18 is configured to obtain the priority data of all tasks from the preselected node and calculate the arithmetic mean of the priorities of all preemptible combinations;
  • the first judgment threshold determination module 19 is configured to determine a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
  • the first pairwise comparison module 20 is configured to perform, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and select the task combination most suitable for preemption;
  • the first pairwise comparison module 20 specifically includes:
  • the first difference operation module 21 is configured to subtract the arithmetic means of two task combinations in the current preselected node to obtain the distance between the priorities of the two task combinations;
  • the first judgment threshold comparison module 22 is configured to compare the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judge whether the distance is greater than or equal to the first judgment threshold;
  • the first selection module 23 is configured to select the task combination with the larger arithmetic mean as the task combination more suitable for preemption when the distance between the priorities is greater than or equal to the first judgment threshold;
  • the second selection module 24 is configured to select the task combination with fewer tasks as the task combination more suitable for preemption when the distance between the priorities is less than the first judgment threshold;
  • the task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.
  • the final most suitable preemption task combination determination module 13 specifically includes:
  • the task combination acquisition module 25 is configured to obtain the task combination most suitable for preemption determined for each node;
  • the comparison and screening module 26 is configured to determine a second judgment threshold from the most suitable task combination of each node, perform pairwise comparison of the most suitable task combinations of the nodes in the cluster, and select the final task combination most suitable for preemption in the cluster;
  • the task combination feedback module 27 is configured to feed the obtained final task combination most suitable for preemption back to the scheduler;
  • the comparison and screening module 26 specifically includes:
  • the second arithmetic mean calculation module 28 is configured to obtain the priority data of all tasks in the most suitable task combination of each node and calculate the arithmetic mean of the priorities of all preemptible combinations;
  • the second judgment threshold determination module 29 is configured to determine a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
  • the second difference operation module 30 is configured to subtract the arithmetic means of two most suitable task combinations in the cluster to obtain the distance between the priorities of the two most suitable task combinations;
  • the second judgment threshold comparison module 31 is configured to compare the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judge whether the distance is greater than or equal to the second judgment threshold;
  • the third selection module 32 is configured to select the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption when the distance between the priorities is greater than or equal to the second judgment threshold;
  • the fourth selection module 33 is configured to select the most suitable task combination with fewer tasks as the final task combination most suitable for preemption when the distance between the priorities is less than the second judgment threshold;
  • the task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.
  • in the embodiments of the present invention, resource monitoring and analysis are performed on all nodes in the cluster to obtain all preselected nodes in the cluster;
  • the arithmetic mean of the priorities of all preemptible combinations of each preselected node is calculated and combined with the number of tasks in each combination to determine the task combination most suitable for preemption in the node;
  • the most suitable task combinations of the selected nodes are ranked to obtain the final task combination most suitable for preemption, and the obtained final task combination is fed back to the scheduler, so that urgent tasks submitted to the cluster can reasonably preempt the resources of running tasks according to both task count and priority, which enhances the stability of cluster operation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

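A server-cluster-based method and system for selecting and preempting scheduling nodes, relating to the technical field of server resource allocation. The method includes: performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster (S101); calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination (S102); and ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler (S103). In this way, urgent tasks submitted to the cluster can reasonably preempt the resources of running tasks according to both task count and priority, which enhances the stability of cluster operation.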

Description

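A server-cluster-based method and system for selecting and preempting scheduling nodes
This application claims priority to Chinese patent application No. 202010788342.X, entitled "A server-cluster-based method and system for selecting and preempting scheduling nodes", filed with the China National Intellectual Property Administration on August 7, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention belongs to the technical field of server resource allocation, and in particular relates to a server-cluster-based method and system for selecting and preempting scheduling nodes.
Background
Yet Another Resource Negotiator (YARN) is the resource management system in Hadoop (a distributed system infrastructure developed by the Apache Foundation). It is a general-purpose resource management module that provides resource management and scheduling for all kinds of applications, and it mainly comprises the ResourceManager, the ApplicationMaster, and the NodeManager. YARN is not limited to the MapReduce framework; it can also be used by other frameworks such as Spark (a general-purpose parallel framework). Within YARN, the ResourceManager is responsible for the unified management and allocation of all resources in the cluster: it receives resource reports from each node and allocates the reported resources to the applications (in practice, to each ApplicationMaster) according to a given policy.
During normal server operation, when the cluster receives a submitted urgent task, the system by default preempts the resources of lower-priority tasks. However, if the lower-priority task combination contains far more tasks than the other combinations, preempting that combination tends to destabilize task execution across the cluster.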
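Summary of the Invention
In view of the defects in the prior art, the present invention provides a server-cluster-based method for selecting and preempting scheduling nodes, intended to solve the problem in the prior art that, when the cluster receives a submitted urgent task, the system by default preempts the resources of lower-priority tasks, which tends to destabilize task execution across the cluster.
The technical solution provided by the present invention is a server-cluster-based method for selecting and preempting scheduling nodes, the method comprising the following steps:
performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler.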
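As an improved solution, the step of performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster specifically comprises the following steps:
collecting the attribute indexes of all nodes in the cluster, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
comparing the total amount of resources owned by each node in the cluster with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, determining that the current node is a preselected node, and otherwise determining that it is a non-preselected node;
importing the acquired preselected nodes into the resource manager.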
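As an improved solution, the step of calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination, specifically comprises the following steps:
obtaining the priority data of all tasks from the preselected node, and calculating the arithmetic mean of the priorities of all preemptible combinations;
determining a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption.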
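As an improved solution, the step of performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption, specifically comprises the following steps:
in the current preselected node, subtracting the arithmetic means of two task combinations to obtain the distance between the priorities of the two task combinations;
comparing the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judging whether the distance is greater than or equal to the first judgment threshold;
when the distance between the priorities is greater than or equal to the first judgment threshold, selecting the task combination with the larger arithmetic mean as the task combination more suitable for preemption;
when the distance between the priorities is less than the first judgment threshold, selecting the task combination with fewer tasks as the task combination more suitable for preemption;
wherein the task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.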
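As an improved solution, the step of ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler, specifically comprises the following steps:
obtaining the task combination most suitable for preemption determined for each node;
determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster;
feeding the obtained final task combination most suitable for preemption back to the scheduler.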
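As an improved solution, the step of determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster, specifically comprises the following steps:
obtaining the priority data of all tasks in the most suitable task combination of each node, and calculating the arithmetic mean of the priorities of all preemptible combinations;
determining a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
in the cluster, subtracting the arithmetic means of two most suitable task combinations to obtain the distance between the priorities of the two most suitable task combinations;
comparing the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judging whether the distance is greater than or equal to the second judgment threshold;
when the distance between the priorities is greater than or equal to the second judgment threshold, selecting the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption;
when the distance between the priorities is less than the second judgment threshold, selecting the most suitable task combination with fewer tasks as the final task combination most suitable for preemption;
wherein the task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.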
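Another object of the present invention is to provide a server-cluster-based system for selecting and preempting scheduling nodes, the system comprising:
a preselected node acquisition module, configured to perform resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
an in-node most suitable preemption task combination determination module, configured to calculate the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determine the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
a final most suitable preemption task combination determination module, configured to rank the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feed the obtained final task combination back to the scheduler.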
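As an improved solution, the preselected node acquisition module specifically includes:
an attribute index collection module, configured to collect the attribute indexes of all nodes in the cluster, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
a total resource comparison module, configured to compare the total amount of resources owned by each node in the cluster with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
a node determination module, configured to determine that the current node is a preselected node when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, and otherwise determine that it is a non-preselected node;
a preselected node import module, configured to import the acquired preselected nodes into the resource manager.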
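As an improved solution, the in-node most suitable preemption task combination determination module specifically includes:
a first arithmetic mean calculation module, configured to obtain the priority data of all tasks from the preselected node and calculate the arithmetic mean of the priorities of all preemptible combinations;
a first judgment threshold determination module, configured to determine a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
a first pairwise comparison module, configured to perform, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and select the task combination most suitable for preemption;
the first pairwise comparison module specifically includes:
a first difference operation module, configured to subtract, in the current preselected node, the arithmetic means of two task combinations to obtain the distance between the priorities of the two task combinations;
a first judgment threshold comparison module, configured to compare the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judge whether the distance is greater than or equal to the first judgment threshold;
a first selection module, configured to select the task combination with the larger arithmetic mean as the task combination more suitable for preemption when the distance between the priorities is greater than or equal to the first judgment threshold;
a second selection module, configured to select the task combination with fewer tasks as the task combination more suitable for preemption when the distance between the priorities is less than the first judgment threshold;
wherein the task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.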
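As an improved solution, the final most suitable preemption task combination determination module specifically includes:
a task combination acquisition module, configured to obtain the task combination most suitable for preemption determined for each node;
a comparison and screening module, configured to determine a second judgment threshold from the most suitable task combination of each node, perform pairwise comparison of the most suitable task combinations of the nodes in the cluster, and select the final task combination most suitable for preemption in the cluster;
a task combination feedback module, configured to feed the obtained final task combination most suitable for preemption back to the scheduler;
the comparison and screening module specifically includes:
a second arithmetic mean calculation module, configured to obtain the priority data of all tasks in the most suitable task combination of each node and calculate the arithmetic mean of the priorities of all preemptible combinations;
a second judgment threshold determination module, configured to determine a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
a second difference operation module, configured to subtract, in the cluster, the arithmetic means of two most suitable task combinations to obtain the distance between the priorities of the two most suitable task combinations;
a second judgment threshold comparison module, configured to compare the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judge whether the distance is greater than or equal to the second judgment threshold;
a third selection module, configured to select the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption when the distance between the priorities is greater than or equal to the second judgment threshold;
a fourth selection module, configured to select the most suitable task combination with fewer tasks as the final task combination most suitable for preemption when the distance between the priorities is less than the second judgment threshold;
wherein the task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.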
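In the embodiments of the present invention, resource monitoring and analysis are performed on all nodes in the cluster to obtain all preselected nodes in the cluster; the arithmetic mean of the priorities of all preemptible combinations of each preselected node is calculated and combined with the number of tasks in each combination to determine the task combination most suitable for preemption in the node; and the most suitable task combinations of the selected nodes are ranked to obtain the final task combination most suitable for preemption, which is fed back to the scheduler. In this way, urgent tasks submitted to the cluster can reasonably preempt the resources of running tasks according to both task count and priority, which enhances the stability of cluster operation.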
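Brief Description of the Drawings
In order to explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, elements or parts are not necessarily drawn to scale.
Fig. 1 is a flowchart of the server-cluster-based method for selecting and preempting scheduling nodes provided by the present invention;
Fig. 2 is a flowchart, provided by the present invention, of performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
Fig. 3 is a flowchart, provided by the present invention, of calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
Fig. 4 is a flowchart, provided by the present invention, of performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption;
Fig. 5 is a flowchart, provided by the present invention, of ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler;
Fig. 6 is a flowchart, provided by the present invention, of determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster;
Fig. 7 is a structural block diagram of the server-cluster-based system for selecting and preempting scheduling nodes provided by the present invention;
Fig. 8 is a structural block diagram of the preselected node acquisition module provided by the present invention;
Fig. 9 is a structural block diagram of the in-node most suitable preemption task combination determination module provided by the present invention;
Fig. 10 is a structural block diagram of the first pairwise comparison module provided by the present invention;
Fig. 11 is a structural block diagram of the final most suitable preemption task combination determination module provided by the present invention;
Fig. 12 is a structural block diagram of the comparison and screening module provided by the present invention.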
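Detailed Description
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments are only used to illustrate the technical solution of the present invention more clearly; they therefore serve only as examples and cannot be used to limit the scope of protection of the present invention.
Fig. 1 is a flowchart of the server-cluster-based method for selecting and preempting scheduling nodes provided by the present invention, which specifically comprises the following steps:
In step S101, resource monitoring and analysis are performed on all nodes in the cluster to obtain all preselected nodes in the cluster;
In step S102, the arithmetic mean of the priorities of all preemptible combinations of each preselected node is calculated, and the task combination most suitable for preemption in the node is determined in combination with the number of tasks in each combination;
In step S103, the most suitable task combinations of the selected nodes are ranked to obtain the final task combination most suitable for preemption, and the obtained final task combination is fed back to the scheduler.
The embodiments of the present invention mainly improve the preemption behavior of the YARN scheduler: when evaluating the loss caused by preemption, both the number and the priority of the tasks to be preempted are taken into account, so that urgent tasks (tasks with higher priority) submitted to the cluster can preempt the resources of other running tasks more reasonably, avoiding the excessive task loss caused by a single-criterion preemption mechanism. The scheduler builds a decision model from the two factors of priority and task count to decide which node's resources to preempt, ensuring that when the task priorities on different nodes differ little, the scheduler preferentially preempts the smaller number of tasks, thereby enhancing the stability of task execution across the cluster.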
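As shown in Fig. 2, the step of performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster specifically comprises the following steps:
In step S201, the attribute indexes of all nodes in the cluster are collected, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
In this step, the attribute indexes are, for example, the total resources owned by cluster node 1 (64 GB of total memory and 6 CPU cores in total) and the idle-resource rate of node 1 (unused resources as a percentage of the node's total resources), collected in real time.
In step S202, the total amount of resources owned by each node in the cluster is compared with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
In this step, suppose the submitted task is a network service that requests 10 GB of memory. If node 1 of the cluster has 20 GB of total resources (more than requested), the resource requirement of the urgent task is met and node 1 can be identified as a preselected node; if node 1 has only 7 GB of total memory, it is judged to be a non-preselected node.
In step S203, when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, the current node is determined to be a preselected node, and otherwise it is determined to be a non-preselected node;
In step S204, the acquired preselected nodes are imported into the resource manager, that is, the list of preselected nodes containing node 1 is loaded directly into the resource manager (RM).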
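The preselection rule of steps S201 to S204 can be illustrated with a minimal sketch. The `Node` class, its field names, and the `preselect_nodes` helper are hypothetical names introduced only for this illustration; the source specifies only the comparison of a node's total resources against the request, and memory is used here as the single resource dimension for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical record of the attribute indexes collected in step S201.
    name: str
    total_memory_gb: float  # total amount of resources owned by the node
    free_memory_gb: float   # amount of resources currently unoccupied


def preselect_nodes(nodes, requested_memory_gb):
    """Keep the nodes whose total resources are >= the urgent task's request (S202/S203)."""
    return [n for n in nodes if n.total_memory_gb >= requested_memory_gb]


# Figures follow the example in the text: a 10 GB request, a 20 GB node, a 7 GB node.
nodes = [Node("node1", 20, 4), Node("node2", 7, 7)]
preselected = preselect_nodes(nodes, requested_memory_gb=10)
print([n.name for n in preselected])
```

Note that the filter uses the total resources of the node rather than its free resources, as stated in step S203: a node qualifies if preempting running tasks could in principle free enough capacity.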
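As shown in Fig. 3, the step of calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination, specifically comprises the following steps:
In step S301, the priority data of all tasks is obtained from the preselected node, and the arithmetic mean of the priorities of all preemptible combinations is calculated;
In step S302, a first judgment threshold is determined from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
In this step, suppose node 1 has two preemptible combinations: combination A comprises tasks 1 and 2, and combination B comprises tasks 3, 4, and 5; preempting either combination alone would satisfy the resource requirement of the urgent task;
The calculated average priority of the two tasks in combination A is 5 and that of combination B is 3; 70% of the larger average priority is taken as the first judgment threshold, that is, 3.5 (5 × 0.7).
In step S303, according to the determined first judgment threshold, pairwise comparison is performed on all preemptible combinations in the current preselected node, and the task combination most suitable for preemption is selected.
In this step, continuing the example above, the difference between the average priorities of combination A and combination B is 2, which is less than the threshold of 3.5, so combination A, which contains fewer tasks, is judged to be more suitable for preemption.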
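The arithmetic in steps S301 and S302 can be reproduced with a short sketch. The per-task priorities below are hypothetical values chosen only so that the means match the example (5 and 3); the text gives the means but not the individual priorities. The function name `first_judgment_threshold` and the `ratio` parameter are likewise illustrative.

```python
from statistics import mean

# Hypothetical per-task priorities; only the resulting means (5 and 3) come from the text.
combos = {
    "A": [4, 6],     # tasks 1 and 2
    "B": [2, 3, 4],  # tasks 3, 4 and 5
}


def first_judgment_threshold(combos, ratio=0.7):
    """Return the mean priority of each combination and ratio * the largest mean."""
    means = {name: mean(priorities) for name, priorities in combos.items()}
    return means, ratio * max(means.values())


means, threshold = first_judgment_threshold(combos)
distance = abs(means["A"] - means["B"])  # distance between the two mean priorities
print(means, threshold, distance)
```

With these inputs the distance between the means is 2 and the threshold is about 3.5, so the comparison falls into the "less than the threshold" branch, where task count decides.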
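In this embodiment, as shown in Fig. 4, the step of performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption, specifically comprises the following steps:
In step S401, in the current preselected node, the arithmetic means of two task combinations are subtracted to obtain the distance between the priorities of the two task combinations;
In step S402, the calculated distance between the priorities of the two task combinations is compared with the first judgment threshold, and it is judged whether the distance is greater than or equal to the first judgment threshold;
In step S403, when the distance between the priorities is greater than or equal to the first judgment threshold, the task combination with the larger arithmetic mean is selected as the task combination more suitable for preemption;
In step S404, when the distance between the priorities is less than the first judgment threshold, the task combination with fewer tasks is selected as the task combination more suitable for preemption;
The task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.
Here, combination A has 2 tasks and combination B has 3 tasks; the difference between their average priorities is 2, which is less than the threshold of 3.5, so combination A, which has fewer tasks, is judged to be more suitable for preemption.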
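The pairwise rule of steps S401 to S404 can be sketched as a comparison function folded over all combinations of a node. The `Combo` class and the function names are hypothetical; the tie-breaking choice (keeping the first combination when task counts are equal) is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Combo:
    # Hypothetical summary of one preemptible task combination.
    name: str
    mean_priority: float
    task_count: int


def more_preemptible(a, b, threshold):
    """Apply S401-S404 to two combinations and return the one more suitable for preemption."""
    distance = abs(a.mean_priority - b.mean_priority)  # S401
    if distance >= threshold:                          # S402/S403
        return a if a.mean_priority > b.mean_priority else b
    # S404: priorities are close, so prefer the combination with fewer tasks.
    # Assumption: on equal task counts the first argument is kept.
    return a if a.task_count <= b.task_count else b


def best_in_node(combos, threshold):
    """Compare combinations pairwise, carrying the winner forward."""
    return reduce(lambda best, c: more_preemptible(best, c, threshold), combos)


combos = [Combo("A", 5, 2), Combo("B", 3, 3)]
print(best_in_node(combos, threshold=3.5).name)
```

Carrying the winner forward needs only n - 1 comparisons for n combinations, which matches the statement that the combination left after all comparisons is the most suitable one.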
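As shown in Fig. 5, the step of ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler, specifically comprises the following steps:
In step S501, the task combination most suitable for preemption determined for each node is obtained;
In step S502, a second judgment threshold is determined from the most suitable task combination of each node, pairwise comparison is performed on the most suitable task combinations of the nodes in the cluster, and the final task combination most suitable for preemption in the cluster is selected;
In step S503, the obtained final task combination most suitable for preemption is fed back to the scheduler.
In this embodiment, suppose the cluster has three preselected nodes, node1, node2, and node3. The optimal combination on node1 has an average priority of 5 and 3 tasks; that on node3 has an average priority of 7 and 2 tasks; that on node2 has an average priority of 8 and 3 tasks. node2 is first compared with node3, and node3 is judged to be better. node3 is then compared with node1, and node3 is judged to be the optimal node in the cluster. At this point, the 15 GB of memory occupied by the hdfs and mysql services on node3 is released and then allocated to the urgent task (for example, a network service).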
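As shown in Fig. 6, the step of determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster, specifically comprises the following steps:
In step S601, the priority data of all tasks in the most suitable task combination of each node is obtained, and the arithmetic mean of the priorities of all preemptible combinations is calculated;
In step S602, a second judgment threshold is determined from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
In step S603, in the cluster, the arithmetic means of two most suitable task combinations are subtracted to obtain the distance between the priorities of the two most suitable task combinations;
In step S604, the calculated distance between the priorities of the two most suitable task combinations is compared with the second judgment threshold, and it is judged whether the distance is greater than or equal to the second judgment threshold;
In step S605, when the distance between the priorities is greater than or equal to the second judgment threshold, the most suitable task combination with the larger arithmetic mean is selected as the final task combination most suitable for preemption;
In step S606, when the distance between the priorities is less than the second judgment threshold, the most suitable task combination with fewer tasks is selected as the final task combination most suitable for preemption;
The task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.
The implementation of this embodiment is similar to the scheme shown in Fig. 4 above and is not repeated here.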
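The cluster-level selection of steps S601 to S606 can be sketched on the three-node example of this embodiment (node1: mean 5, 3 tasks; node2: mean 8, 3 tasks; node3: mean 7, 2 tasks). The text does not state how the second judgment threshold is derived from the means; this sketch assumes the same rule as the first threshold's example, namely 70% of the largest mean. The `Candidate` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical summary of one node's most suitable task combination.
    node: str
    mean_priority: float
    task_count: int


def pick(a, b, threshold):
    """S603-S606: the larger mean wins if the means are far apart, else fewer tasks wins."""
    if abs(a.mean_priority - b.mean_priority) >= threshold:
        return a if a.mean_priority > b.mean_priority else b
    return a if a.task_count <= b.task_count else b


def final_choice(candidates, ratio=0.7):
    # Assumption: second threshold = ratio * largest mean, by analogy with the first threshold.
    threshold = ratio * max(c.mean_priority for c in candidates)
    best = candidates[0]
    for c in candidates[1:]:
        best = pick(best, c, threshold)
    return best


# Same comparison order as in the text: node2 vs node3 first, then the winner vs node1.
candidates = [Candidate("node2", 8, 3), Candidate("node3", 7, 2), Candidate("node1", 5, 3)]
print(final_choice(candidates).node)
```

Under this assumed threshold (about 5.6), both comparisons fall below the threshold, so task count decides each time and node3 is selected, which agrees with the outcome described for this embodiment.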
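In the embodiments of the present invention, a monitoring system collects information such as node resource status and identifies the preselected nodes suitable for preemption; all combinations are ranked according to the priority and task count of the preemptible task combinations, so that the combination most suitable for preemption is selected and sent to the YARN scheduler; and, according to the priority and task count of the optimal combinations of all preselected nodes, the scheduling node most suitable for preemption is selected and its optimal combination is sent to the YARN scheduler, so that as few tasks as possible are preempted on the cluster and the stability of task execution across the cluster is enhanced.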
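Fig. 7 shows a structural block diagram of the server-cluster-based system for selecting and preempting scheduling nodes provided by the present invention; for ease of description, only the parts related to the embodiments of the present invention are shown.
The server-cluster-based system for selecting and preempting scheduling nodes includes:
a preselected node acquisition module 11, configured to perform resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
an in-node most suitable preemption task combination determination module 12, configured to calculate the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determine the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
a final most suitable preemption task combination determination module 13, configured to rank the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feed the obtained final task combination back to the scheduler.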
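As shown in Fig. 8, the preselected node acquisition module 11 specifically includes:
an attribute index collection module 14, configured to collect the attribute indexes of all nodes in the cluster, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
a total resource comparison module 15, configured to compare the total amount of resources owned by each node in the cluster with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
a node determination module 16, configured to determine that the current node is a preselected node when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, and otherwise determine that it is a non-preselected node;
a preselected node import module 17, configured to import the acquired preselected nodes into the resource manager.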
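As shown in Fig. 9, the in-node most suitable preemption task combination determination module 12 specifically includes:
a first arithmetic mean calculation module 18, configured to obtain the priority data of all tasks from the preselected node and calculate the arithmetic mean of the priorities of all preemptible combinations;
a first judgment threshold determination module 19, configured to determine a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
a first pairwise comparison module 20, configured to perform, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and select the task combination most suitable for preemption;
As shown in Fig. 10, the first pairwise comparison module 20 specifically includes:
a first difference operation module 21, configured to subtract, in the current preselected node, the arithmetic means of two task combinations to obtain the distance between the priorities of the two task combinations;
a first judgment threshold comparison module 22, configured to compare the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judge whether the distance is greater than or equal to the first judgment threshold;
a first selection module 23, configured to select the task combination with the larger arithmetic mean as the task combination more suitable for preemption when the distance between the priorities is greater than or equal to the first judgment threshold;
a second selection module 24, configured to select the task combination with fewer tasks as the task combination more suitable for preemption when the distance between the priorities is less than the first judgment threshold;
The task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.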
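As shown in Fig. 11, the final most suitable preemption task combination determination module 13 specifically includes:
a task combination acquisition module 25, configured to obtain the task combination most suitable for preemption determined for each node;
a comparison and screening module 26, configured to determine a second judgment threshold from the most suitable task combination of each node, perform pairwise comparison of the most suitable task combinations of the nodes in the cluster, and select the final task combination most suitable for preemption in the cluster;
a task combination feedback module 27, configured to feed the obtained final task combination most suitable for preemption back to the scheduler;
As shown in Fig. 12, the comparison and screening module 26 specifically includes:
a second arithmetic mean calculation module 28, configured to obtain the priority data of all tasks in the most suitable task combination of each node and calculate the arithmetic mean of the priorities of all preemptible combinations;
a second judgment threshold determination module 29, configured to determine a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
a second difference operation module 30, configured to subtract, in the cluster, the arithmetic means of two most suitable task combinations to obtain the distance between the priorities of the two most suitable task combinations;
a second judgment threshold comparison module 31, configured to compare the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judge whether the distance is greater than or equal to the second judgment threshold;
a third selection module 32, configured to select the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption when the distance between the priorities is greater than or equal to the second judgment threshold;
a fourth selection module 33, configured to select the most suitable task combination with fewer tasks as the final task combination most suitable for preemption when the distance between the priorities is less than the second judgment threshold;
The task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.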
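The functions of the above modules are as described in the method embodiments above and are not repeated here.
In the embodiments of the present invention, resource monitoring and analysis are performed on all nodes in the cluster to obtain all preselected nodes in the cluster; the arithmetic mean of the priorities of all preemptible combinations of each preselected node is calculated and combined with the number of tasks in each combination to determine the task combination most suitable for preemption in the node; and the most suitable task combinations of the selected nodes are ranked to obtain the final task combination most suitable for preemption, which is fed back to the scheduler. In this way, urgent tasks submitted to the cluster can reasonably preempt the resources of running tasks according to both task count and priority, which enhances the stability of cluster operation.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the scope of the claims and description of the present invention.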

Claims (10)

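  1. A server-cluster-based method for selecting and preempting scheduling nodes, characterized in that the method comprises the following steps:
    performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
    calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
    ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler.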
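  2. The server-cluster-based method for selecting and preempting scheduling nodes according to claim 1, characterized in that the step of performing resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster specifically comprises the following steps:
    collecting the attribute indexes of all nodes in the cluster, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
    comparing the total amount of resources owned by each node in the cluster with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
    when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, determining that the current node is a preselected node, and otherwise determining that it is a non-preselected node;
    importing the acquired preselected nodes into the resource manager.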
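  3. The server-cluster-based method for selecting and preempting scheduling nodes according to claim 2, characterized in that the step of calculating the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determining the task combination most suitable for preemption in the node in combination with the number of tasks in each combination, specifically comprises the following steps:
    obtaining the priority data of all tasks from the preselected node, and calculating the arithmetic mean of the priorities of all preemptible combinations;
    determining a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
    performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption.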
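  4. The server-cluster-based method for selecting and preempting scheduling nodes according to claim 3, characterized in that the step of performing, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and selecting the task combination most suitable for preemption, specifically comprises the following steps:
    in the current preselected node, subtracting the arithmetic means of two task combinations to obtain the distance between the priorities of the two task combinations;
    comparing the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judging whether the distance is greater than or equal to the first judgment threshold;
    when the distance between the priorities is greater than or equal to the first judgment threshold, selecting the task combination with the larger arithmetic mean as the task combination more suitable for preemption;
    when the distance between the priorities is less than the first judgment threshold, selecting the task combination with fewer tasks as the task combination more suitable for preemption;
    wherein the task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.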
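  5. The server-cluster-based method for selecting and preempting scheduling nodes according to claim 4, characterized in that the step of ranking the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feeding the obtained final task combination back to the scheduler, specifically comprises the following steps:
    obtaining the task combination most suitable for preemption determined for each node;
    determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster;
    feeding the obtained final task combination most suitable for preemption back to the scheduler.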
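  6. The server-cluster-based method for selecting and preempting scheduling nodes according to claim 5, characterized in that the step of determining a second judgment threshold from the most suitable task combination of each node, performing pairwise comparison of the most suitable task combinations of the nodes in the cluster, and selecting the final task combination most suitable for preemption in the cluster, specifically comprises the following steps:
    obtaining the priority data of all tasks in the most suitable task combination of each node, and calculating the arithmetic mean of the priorities of all preemptible combinations;
    determining a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
    in the cluster, subtracting the arithmetic means of two most suitable task combinations to obtain the distance between the priorities of the two most suitable task combinations;
    comparing the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judging whether the distance is greater than or equal to the second judgment threshold;
    when the distance between the priorities is greater than or equal to the second judgment threshold, selecting the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption;
    when the distance between the priorities is less than the second judgment threshold, selecting the most suitable task combination with fewer tasks as the final task combination most suitable for preemption;
    wherein the task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.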
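  7. A server-cluster-based system for selecting and preempting scheduling nodes, characterized in that the system comprises:
    a preselected node acquisition module, configured to perform resource monitoring and analysis on all nodes in the cluster to obtain all preselected nodes in the cluster;
    an in-node most suitable preemption task combination determination module, configured to calculate the arithmetic mean of the priorities of all preemptible combinations of each preselected node, and determine the task combination most suitable for preemption in the node in combination with the number of tasks in each combination;
    a final most suitable preemption task combination determination module, configured to rank the most suitable task combinations of the selected nodes to obtain the final task combination most suitable for preemption, and feed the obtained final task combination back to the scheduler.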
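  8. The server-cluster-based system for selecting and preempting scheduling nodes according to claim 7, characterized in that the preselected node acquisition module specifically includes:
    an attribute index collection module, configured to collect the attribute indexes of all nodes in the cluster, where the attribute indexes of a node include the total amount of resources owned by the node and the total amount of unoccupied resources on the node;
    a total resource comparison module, configured to compare the total amount of resources owned by each node in the cluster with the total amount of resources requested by the urgent task to determine whether the node is a preselected node;
    a node determination module, configured to determine that the current node is a preselected node when the total amount of resources owned by the node is greater than or equal to the total amount of resources requested by the urgent task, and otherwise determine that it is a non-preselected node;
    a preselected node import module, configured to import the acquired preselected nodes into the resource manager.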
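  9. The server-cluster-based system for selecting and preempting scheduling nodes according to claim 8, characterized in that the in-node most suitable preemption task combination determination module specifically includes:
    a first arithmetic mean calculation module, configured to obtain the priority data of all tasks from the preselected node and calculate the arithmetic mean of the priorities of all preemptible combinations;
    a first judgment threshold determination module, configured to determine a first judgment threshold from the calculated arithmetic means, the first judgment threshold being used to judge the task combination most suitable for preemption in the current node;
    a first pairwise comparison module, configured to perform, according to the determined first judgment threshold, pairwise comparison of all preemptible combinations in the current preselected node, and select the task combination most suitable for preemption;
    the first pairwise comparison module specifically includes:
    a first difference operation module, configured to subtract, in the current preselected node, the arithmetic means of two task combinations to obtain the distance between the priorities of the two task combinations;
    a first judgment threshold comparison module, configured to compare the calculated distance between the priorities of the two task combinations with the first judgment threshold, and judge whether the distance is greater than or equal to the first judgment threshold;
    a first selection module, configured to select the task combination with the larger arithmetic mean as the task combination more suitable for preemption when the distance between the priorities is greater than or equal to the first judgment threshold;
    a second selection module, configured to select the task combination with fewer tasks as the task combination more suitable for preemption when the distance between the priorities is less than the first judgment threshold;
    wherein the task combination remaining after all task combinations have been compared is the task combination most suitable for preemption.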
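  10. The server-cluster-based system for selecting and preempting scheduling nodes according to claim 7, characterized in that the final most suitable preemption task combination determination module specifically includes:
    a task combination acquisition module, configured to obtain the task combination most suitable for preemption determined for each node;
    a comparison and screening module, configured to determine a second judgment threshold from the most suitable task combination of each node, perform pairwise comparison of the most suitable task combinations of the nodes in the cluster, and select the final task combination most suitable for preemption in the cluster;
    a task combination feedback module, configured to feed the obtained final task combination most suitable for preemption back to the scheduler;
    the comparison and screening module specifically includes:
    a second arithmetic mean calculation module, configured to obtain the priority data of all tasks in the most suitable task combination of each node and calculate the arithmetic mean of the priorities of all preemptible combinations;
    a second judgment threshold determination module, configured to determine a second judgment threshold from the calculated arithmetic means, the second judgment threshold being used to judge the final task combination most suitable for preemption in the cluster;
    a second difference operation module, configured to subtract, in the cluster, the arithmetic means of two most suitable task combinations to obtain the distance between the priorities of the two most suitable task combinations;
    a second judgment threshold comparison module, configured to compare the calculated distance between the priorities of the two most suitable task combinations with the second judgment threshold, and judge whether the distance is greater than or equal to the second judgment threshold;
    a third selection module, configured to select the most suitable task combination with the larger arithmetic mean as the final task combination most suitable for preemption when the distance between the priorities is greater than or equal to the second judgment threshold;
    a fourth selection module, configured to select the most suitable task combination with fewer tasks as the final task combination most suitable for preemption when the distance between the priorities is less than the second judgment threshold;
    wherein the task combination remaining after the most suitable task combinations of all preselected nodes have been compared is the final task combination most suitable for preemption.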
PCT/CN2021/096403 2020-08-07 2021-05-27 Server-cluster-based method and system for selecting and preempting scheduling nodes WO2022028059A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010788342.XA CN112015549B (zh) 2020-08-07 2020-08-07 Server-cluster-based method and system for selecting and preempting scheduling nodes
CN202010788342.X 2020-08-07

Publications (1)

Publication Number Publication Date
WO2022028059A1 true WO2022028059A1 (zh) 2022-02-10

Family

ID=73500125

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096403 WO2022028059A1 (zh) 2020-08-07 2021-05-27 Server-cluster-based method and system for selecting and preempting scheduling nodes

Country Status (2)

Country Link
CN (1) CN112015549B (zh)
WO (1) WO2022028059A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666334A (zh) * 2022-04-28 2022-06-24 深圳嘉业产业发展有限公司 Node management method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
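CN112015549B (zh) * 2020-08-07 2023-01-06 苏州浪潮智能科技有限公司 Server-cluster-based method and system for selecting and preempting scheduling nodes
CN114466022B (zh) * 2021-12-31 2023-07-21 苏州浪潮智能科技有限公司 Method, device and medium for obtaining server seed nodes in a cluster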

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070044102A1 (en) * 2005-08-22 2007-02-22 Andrea Casotto A Method and System for Performing Fair-Share Preemption
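CN105847891A (zh) * 2016-03-31 2016-08-10 乐视控股(北京)有限公司 Resource preemption method and device
CN108769254A (zh) * 2018-06-25 2018-11-06 星环信息科技(上海)有限公司 Resource sharing method, system and device based on preemptive scheduling
CN109213594A (zh) * 2017-07-06 2019-01-15 阿里巴巴集团控股有限公司 Resource preemption method, apparatus, device and computer storage medium
CN110134521A (zh) * 2019-05-28 2019-08-16 北京达佳互联信息技术有限公司 Resource allocation method and apparatus, resource manager and storage medium
CN112015549A (zh) * 2020-08-07 2020-12-01 苏州浪潮智能科技有限公司 Server-cluster-based method and system for selecting and preempting scheduling nodes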

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104320854B (zh) * 2014-10-21 2018-03-06 中国联合网络通信集团有限公司 Resource scheduling method and device
CN110597621A (zh) * 2019-08-09 2019-12-20 苏宁金融科技(南京)有限公司 Cluster resource scheduling method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666334A (zh) * 2022-04-28 2022-06-24 深圳嘉业产业发展有限公司 Node management method and system
CN114666334B (zh) * 2022-04-28 2024-01-26 深圳嘉业共创供应链管理有限公司 Node management method and system

Also Published As

Publication number Publication date
CN112015549B (zh) 2023-01-06
CN112015549A (zh) 2020-12-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21852504

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21852504

Country of ref document: EP

Kind code of ref document: A1