CN117369941A - Pod scheduling method and system - Google Patents

Pod scheduling method and system

Info

Publication number
CN117369941A
CN117369941A (Application CN202311181421.4A)
Authority
CN
China
Prior art keywords
node
scheduling
pod
index
utilization rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311181421.4A
Other languages
Chinese (zh)
Inventor
郎高一
刘垚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd filed Critical New H3C Technologies Co Ltd
Priority to CN202311181421.4A priority Critical patent/CN117369941A/en
Publication of CN117369941A publication Critical patent/CN117369941A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a Pod scheduling method and system for overcoming the technical defects of low scheduling efficiency and insufficient index dimensions in existing Pod scheduling schemes. The invention extends the K8S scheduler by registering a scheduling plug-in; the scheduling plug-in calculates a scheduling score for each node based on the collected index data and index weights of the node, and the scheduler performs Pod scheduling according to the scheduling scores calculated by the scheduling plug-in, so that Pods are scheduled to nodes with lower actual resource utilization. The technical solution avoids the use of an externally attached load balancer, supports multi-dimensional index data, and can improve the efficiency and balance of Pod scheduling.

Description

Pod scheduling method and system
Technical Field
The invention relates to the technical field of cloud computing, in particular to a Pod scheduling method and system.
Background
Kubernetes (K8S for short) provides a basic platform for container microservices. It removes the burden of orchestrating physical/virtual computing, networking and storage infrastructure and enables application operators and developers to focus fully on container-centric self-service operations, so the use of Kubernetes in various fields is becoming more and more common. However, in practical applications it is found that the resource utilization of the nodes in a Kubernetes cluster is often unbalanced: service Pods on heavily loaded nodes run the risk of unstable operation, while resources on lightly loaded nodes are largely wasted. Therefore, how to schedule service Pods onto the nodes in a balanced manner, so that the resources of every node in the cluster are used efficiently, is becoming increasingly important.
Starting from Kubernetes version 1.16, the K8S scheduling framework (Scheduling Framework) provides an extension mechanism: users can develop custom logic for each stage (extension point) of the framework as needed to implement a customized scheduling plug-in.
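For illustration, the following is a minimal sketch (in Go) of a custom Score plug-in written against the upstream scheduling framework package; the plug-in name RealLoadScore, the package name and the placeholder score are assumptions, and the factory signature expected by the framework varies slightly between Kubernetes releases.

```go
// Package realload sketches a custom Score plug-in; names are illustrative.
package realload

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Name is the hypothetical plug-in name referenced from the scheduler profile.
const Name = "RealLoadScore"

// RealLoadScore scores nodes; the described scheme would use monitored index data.
type RealLoadScore struct {
	handle framework.Handle
}

var _ framework.ScorePlugin = &RealLoadScore{}

// New is the plug-in factory registered with the scheduler.
// NOTE: newer Kubernetes releases add a leading context.Context parameter to
// the expected factory signature; adjust to the release being built against.
func New(_ runtime.Object, h framework.Handle) (framework.Plugin, error) {
	return &RealLoadScore{handle: h}, nil
}

func (p *RealLoadScore) Name() string { return Name }

// Score is invoked once per candidate node during the scoring extension point.
func (p *RealLoadScore) Score(ctx context.Context, _ *framework.CycleState,
	_ *v1.Pod, nodeName string) (int64, *framework.Status) {
	// In the described scheme this would return 100*(1 - N_i), where N_i is the
	// node's comprehensive index utilization computed from monitoring data.
	return framework.MaxNodeScore, framework.NewStatus(framework.Success)
}

// ScoreExtensions is nil because scores are already produced in [0, 100].
func (p *RealLoadScore) ScoreExtensions() framework.ScoreExtensions { return nil }
```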
The K8S community also provides some common scheduling plug-ins, such as Trimaran, a K8S scheduling plug-in based on real load, which consists of a metrics provider (Metrics Provider), a load watcher (load-watcher), a database, and scheduling plug-ins. Trimaran aggregates the TargetLoadPacking scheduling plug-in and the LoadVariationRiskBalancing scheduling plug-in.
TargetLoadPacking: it scores nodes according to their actual resource utilization; the scoring algorithm is essentially the knapsack problem in mathematics and uses a best-fit approximation algorithm (only CPU resources are supported).
LoadVariationRiskBalancing: it orders the nodes according to the mean and standard deviation of node resource utilization (only CPU and memory resources are supported).
The load watcher (load-watcher) of the Trimaran scheduling plug-in obtains resource usage indexes such as CPU and memory in the cluster through the metrics provider (Metrics Provider).
The Trimaran scheduling plugin has certain limitations:
1. It may conflict with the K8S default scoring plug-ins NodeResourcesLeastAllocated and NodeResourcesBalancedAllocation, so the default scoring plug-ins need to be turned off when it is used.
2. The index dimensions used to calculate the node score do not consider the utilization of important resources such as memory, network and disk.
3. A cluster ideal (target) value needs to be preset, but actual production environments have obvious service peaks and valleys, so a fixed ideal value cannot be configured appropriately.
Disclosure of Invention
In view of the above, the present invention provides a Pod scheduling method and system for overcoming the technical defects of low scheduling efficiency and insufficient index dimensions in existing Pod scheduling schemes.
Based on an aspect of the embodiment of the present invention, the present invention provides a Pod scheduling system, which includes:
the monitoring service is used for monitoring each Node and container in the cluster and providing a query interface of index data of each monitored Node and container;
the node data collector is used for collecting index data of each node in the cluster and converting the collected index data into a time sequence data format supported by the monitoring service;
the container data collector is used for collecting index data of each container in the cluster and converting the collected index data into a time sequence data format supported by the monitoring service;
and the scheduling plug-in is used for scheduling and scoring each node according to the index data of each node and container provided by the monitoring service, wherein the scheduling score is used by the Kubernetes scheduler to perform balanced scheduling of Pods in the cluster, and the scheduling score represents the comprehensive score of unused resources on the node.
Further, the scheduling plug-in includes:
the first data acquisition sub-module is used for acquiring index data of each node from the monitoring service; the index data include the CPU utilization rate, memory utilization rate, disk I/O utilization rate and workload rate of each node in the cluster, and are provided as averages over the collection period;
the score calculation sub-module is used for calculating the scheduling score of each node according to the acquired index data and index weights; the scheduling score of a node is the difference between 1 and the comprehensive index utilization rate of the node, and the comprehensive index utilization rate of a node is the sum of the products of each index data average of the node and the percentage of that index's weight in the total index weight.
Further, the scheduling plug-in further includes:
and the score adjustment sub-module is used for acquiring the newly increased Pod number of each node in a preset period from the monitoring service, and reducing the scheduling score of the node when the newly increased Pod number of the node is larger than a preset adjustment threshold.
Further, the monitoring service acquires the number of the Pod newly added by each node in a preset period by monitoring the number of Pod binding events generated on the node.
Further, the system further comprises:
and the balanced scheduling controller is used for acquiring index data of each node from the monitoring service, monitoring the comprehensive index utilization rate of each node in the cluster in real time, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the nodes with the difference value exceeding a preset eviction threshold.
Further, the balanced scheduling controller includes:
a second data acquisition sub-module for acquiring index data from the monitoring service;
and the balanced scheduling sub-module is used for calculating the comprehensive index utilization rate of each node in the cluster according to the acquired index data, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the node with the difference value exceeding a preset eviction threshold.
Further, before performing Pod eviction, the balanced scheduling sub-module screens out the Pods in the node that meet the screening conditions, where the screening conditions include:
a Pod bound to a node is not evicted;
a daemon-type Pod is not evicted;
a system-level Pod is not evicted;
a Pod whose manifest configuration file marks an eviction level exempting it from priority eviction is not evicted.
Based on another aspect of the embodiment of the present invention, the present invention further provides a Pod scheduling method, where the method includes:
collecting index data of each node and container in the cluster; the index data include the CPU utilization rate, memory utilization rate, disk I/O utilization rate and workload rate of each node in the cluster, and are provided as averages over the collection period;
scheduling and scoring the nodes according to the collected index data of each node and container, wherein the scheduling score is used by the Kubernetes scheduler to perform balanced scheduling of Pods in the cluster, and the scheduling score represents the comprehensive score of unused resources on a node;
the scheduling score of a node is the difference value between 1 and the comprehensive index utilization rate of the node, and the comprehensive index utilization rate of the node is the sum of products of various index data average values of the node and the percentage of the index weight of the index to the total index weight.
Further, the method further comprises: and acquiring the newly increased Pod number of each node in a preset period from the monitoring service, and reducing the scheduling score of the node when the newly increased Pod number of the node is larger than a preset adjustment threshold.
Further, the method further comprises: and acquiring index data of each node from the monitoring service, monitoring the comprehensive index utilization rate of each node in the cluster in real time, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the nodes with the difference value exceeding a preset eviction threshold.
The invention also provides electronic equipment, which comprises a processor, a communication interface, a storage medium and a communication bus, wherein the processor, the communication interface and the storage medium are communicated with each other through the communication bus;
a storage medium storing a computer program;
and the processor is used for executing the computer program stored on the storage medium to implement the Pod scheduling method.
The invention extends the K8S scheduler by registering a scheduling plug-in; the scheduling plug-in calculates a scheduling score for each node based on the collected index data and index weights of the node, and the scheduler performs Pod scheduling according to the scheduling scores calculated by the scheduling plug-in, so that Pods are scheduled to nodes with lower actual resource utilization. The technical solution avoids the use of an externally attached load balancer, supports multi-dimensional index data, and can improve the efficiency and balance of Pod scheduling.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a Pod scheduling system in Kubernetes according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device for implementing the Pod scheduling method according to an embodiment of the present invention.
Detailed Description
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the embodiments of the invention. As used in the embodiments of the invention, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present invention to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one item of information, entity or step from another similar one, not to describe a particular sequence or order. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present invention. Furthermore, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining". The term "and/or" in the present invention merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, A and B together, or B alone, where A and B may be singular or plural. Also, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more. "At least one of" or similar expressions mean any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
Kubernetes is a container cluster management system that implements functions such as automatic deployment, automatic scaling and maintenance of container clusters. A Node is a worker node in Kubernetes; a Node may be a virtual machine (VM) or a physical machine. Each Node has the services necessary to run Pods and is managed by the Master components. A Pod is the smallest resource management component that can be created and managed in Kubernetes and the smallest resource object for running containerized applications; one Pod represents one process running in the cluster, and one Pod can run one or more containers. Most other components in Kubernetes support and extend Pod functionality around the Pod.
The Kubernetes native scheduler (kube-scheduler) is responsible for scheduling containerized applications onto appropriate Nodes in the Kubernetes cluster. The kube-scheduler selects an optimal node according to the scheduling policy configured by the user, combined with node resource usage and the requirements of the applications. The scheduler is one of the core components of Kubernetes; it makes scheduling decisions in real time by monitoring Pod and node state information on the API server.
By analyzing the K8S resource scheduling principle and open-source scheduling plug-ins, the inventor provides a Pod scheduling scheme for balancing the resource utilization of K8S nodes, so as to overcome technical defects of existing Pod scheduling schemes such as conflicts with the K8S default scoring plug-ins and insufficient index dimensions. The core idea of the invention is as follows: a scheduling plug-in is registered as an extension of the K8S scheduler; the scheduling plug-in calculates a scheduling score for each node based on the acquired index data and index weights of the node; the scheduler performs Pod scheduling according to the scheduling scores calculated by the scheduling plug-in, so that Pods are scheduled to nodes with lower actual resource utilization. The technical solution improves the scheduler's Pod scheduling method by using the extension capability of the K8S scheduler and does not use an externally attached load balancer, thereby avoiding the repeated Pod restarts and increased system complexity that such a load balancer would cause. In the technical solution provided by the invention, the scheduling plug-in supports multi-dimensional index data, a scoring model can be established based on node indexes and Pod-related indexes, and the scheduler performs Pod scheduling according to the scheduling scores, rich in index dimensions, calculated by the newly added scheduling plug-in, so that the resource utilization of each node after Pod scheduling is more reasonable and balanced.
Based on the basic idea of the invention, it is to be explained that the steps shown in the flowcharts of the figures can be performed in a computer system such as a set of computer executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in an order different from that here.
Fig. 1 is a schematic structural diagram of a Pod scheduling system in Kubernetes according to an embodiment of the present invention. The Kubernetes scheduling system mainly includes the following components:
and the monitoring service is used for monitoring each Node and container in the K8S cluster and providing a query interface of index data of each monitored Node and container. The monitoring service may use a multidimensional time series data model to store index data for monitored nodes and containers, providing flexible data querying and aggregation operations. Prometaus is an open source monitoring service, which is an example of the monitoring service in the present invention. The monitoring service (promethaus) obtains index data of the nodes and containers from a Node data collector (Node exporter) and a container data collector (cadivisor), respectively. The monitoring service can filter and screen the collected index data, perform exception handling or remedy on the bad index data, enable the index data to be persistent locally, realize high availability HA among clusters, and enable quick recovery when part of node data are lost. The monitoring service may provide a configuration interface for historical index data periods, and the monitoring service may cache collected index data according to configured historical index data periods (e.g., 15 minutes, 1 hour, 1 day, 1 week, 1 month, etc.).
The node data collector is used for collecting index data of each node in the cluster and converting the collected index data into a time-series data format supported by the monitoring service. For example, Node Exporter may serve as the node data collector to provide node-related index data to the monitoring service Prometheus. Node Exporter is the node data collector for Prometheus and is responsible for collecting node index data and converting it into the time-series data format supported by Prometheus.
And the container data collector is used for collecting index data of each container in the cluster and converting the collected index data into a time-series data format supported by the monitoring service. The container index data includes container resource usage, performance statistics and the like. For example, cAdvisor, one of the open-source components of Kubernetes, is a daemon for monitoring and collecting container resource usage and performance statistics in the container runtime environment (e.g., Docker). cAdvisor may be used as the container data collector to provide container-related index data to the monitoring service Prometheus. cAdvisor can automatically discover new containers created at run time and collect container-related index data such as CPU, memory, network, disk I/O and system load. It can also provide indexes such as CPU and memory usage since container start, per-second CPU/memory utilization, and traffic information such as network transmit/receive (Tx/Rx). Using cAdvisor provides Kubernetes with real-time monitoring of container resource usage and the key indicators needed for functions such as automated management and allocation of container resources.
And the scheduling plug-in (Scheduler plugins) is used for scheduling and scoring each node according to the index data of each node and container provided by the monitoring service; the scheduling score is used by the K8S scheduler to perform balanced scheduling of Pods in the cluster. The scheduling plug-in is a newly added plug-in: it is registered and loaded into the K8S scheduler (kube-scheduler) using the extension mechanism of the K8S scheduling framework (scheduler framework), and the scheduler achieves balanced scheduling of Pods in the K8S cluster according to the scheduling score of each node calculated by the scheduling plug-in.
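A registration sketch is shown below: it builds a scheduler binary that knows the custom plug-in by name, after which the plug-in is enabled and weighted through a KubeSchedulerConfiguration profile passed with --config. The import path example.com/realload and the identifiers realload.Name / realload.New refer to the hypothetical plug-in sketched earlier, and the factory signature expected by app.WithPlugin depends on the Kubernetes release.

```go
package main

import (
	"os"

	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	// Hypothetical module path of the plug-in sketched above.
	"example.com/realload"
)

func main() {
	// Build a kube-scheduler command that can load the custom Score plug-in.
	// The plug-in is then enabled via the scheduler profile (plugins.score.enabled).
	cmd := app.NewSchedulerCommand(
		app.WithPlugin(realload.Name, realload.New),
	)
	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```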
The scheduling plug-in mainly comprises two sub-modules:
and the first data acquisition sub-module is used for acquiring index data of each node from a monitoring service (Prometaus). The module may obtain, from the monitoring service, index data for each node via a timed task for use in the calculation of a next node scheduling score, where the index data may include CPU usage, memory usage, disk I/O usage, workload rate, etc. for each node in the cluster. The index data may be provided in the form of an average of the acquisition period, which may be user configurable, for example, to 15 minutes, 1 hour, 1 day, etc.
And the score calculation sub-module is used for calculating the scheduling score of a node according to the acquired index data and index weights. The module provides the calculated scheduling scores to the K8S scheduler (kube-scheduler), and the K8S scheduler performs balanced Pod scheduling based on the scheduling score of each node, scheduling Pods to nodes with lower actual resource utilization.
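As a sketch of how the first data acquisition sub-module's timed task might pull a collection-period mean from Prometheus using its Go client, the query below derives per-node CPU utilization from node_exporter's node_cpu_seconds_total metric; the Prometheus address, the metric choice and the function name are illustrative assumptions.

```go
package collector

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
	"github.com/prometheus/common/model"
)

// queryNodeCPUMean fetches, for every node, the mean CPU utilization over the
// configured collection period (e.g. "15m", "1h") using node_exporter metrics.
func queryNodeCPUMean(ctx context.Context, period string) (map[string]float64, error) {
	client, err := api.NewClient(api.Config{Address: "http://prometheus:9090"})
	if err != nil {
		return nil, err
	}
	// Average non-idle CPU fraction per instance over the collection period.
	q := fmt.Sprintf(
		`1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[%s]))`, period)
	result, _, err := promv1.NewAPI(client).Query(ctx, q, time.Now())
	if err != nil {
		return nil, err
	}
	means := map[string]float64{}
	if vec, ok := result.(model.Vector); ok {
		for _, s := range vec {
			means[string(s.Metric["instance"])] = float64(s.Value)
		}
	}
	return means, nil
}
```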
Taking the i-th node in the cluster, node_i, as an example, the following describes how the scheduling plug-in calculates the node's scheduling score:
s11, acquiring a node by a first data acquisition submodule i Index data means reflecting the usage of the used resources over a specified acquisition period (e.g., 15 minutes, 1 hour, 1 day, etc.). The average value of various index data of the node comprises, but is not limited to, the average value node of CPU utilization rate i Cpu (statistical average value of node CPU utilization rate in preset acquisition period), and memory utilization rate average value node i Mem (statistical average value of node memory utilization rate in preset acquisition period), disk I/O utilization rate average value node i DiskIo (statistical average value of disk I/O utilization rate of node in preset acquisition period), and workload rate average node i load (statistical average of operating system load rate of node in preset acquisition period).
S12, the comprehensive index utilization N_i of node_i is calculated according to the acquired index data means and the preset index weights; the comprehensive index utilization of the i-th node is the sum of the products of each index data mean of the node and the percentage of that index's weight in the total index weight:
N_i = (node_i_cpu × cpuWeight + node_i_mem × memWeight + node_i_diskIo × diskIoWeight + node_i_load × loadWeight) / (cpuWeight + memWeight + diskIoWeight + loadWeight)   (Formula 1)
The comprehensive index utilization rate of a node characterizes the comprehensive utilization rate of various resources on the node, and can be regarded as the comprehensive score of the used resources on the node.
S13, the comprehensive index residual rate is calculated from the comprehensive index utilization and used as the scheduling score of node_i; that is, the scheduling score of a node is the difference between 1 and the node's comprehensive index utilization.
The scheduling score of a node characterizes the composite score of unused resources on that node; the higher the scheduling score, the lower the composite load of the node and the higher the priority of scheduling Pods to it.
score_i = 100 × (1 − N_i)   (Formula 2)
score_i is the scheduling score of the i-th node; cpuWeight, memWeight, diskIoWeight and loadWeight in Formula 1 are the index weights of CPU, memory, disk I/O and workload respectively, and the index weights are configurable.
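A compact sketch of Formula 1 and Formula 2 in Go is given below; the struct and field names are illustrative.

```go
package scoring

// IndexMeans holds the collection-period averages for one node (values in [0,1]).
type IndexMeans struct {
	CPU, Mem, DiskIO, Load float64
}

// Weights holds the configurable index weights used in Formula 1.
type Weights struct {
	CPU, Mem, DiskIO, Load float64
}

// CompositeUsage implements Formula 1: the weighted sum of the index means,
// each weight taken as a fraction of the total weight.
func CompositeUsage(m IndexMeans, w Weights) float64 {
	total := w.CPU + w.Mem + w.DiskIO + w.Load
	return (m.CPU*w.CPU + m.Mem*w.Mem + m.DiskIO*w.DiskIO + m.Load*w.Load) / total
}

// SchedulingScore implements Formula 2: score_i = 100 * (1 - N_i).
func SchedulingScore(m IndexMeans, w Weights) int64 {
	return int64(100 * (1 - CompositeUsage(m, w)))
}
```

For example, with equal weights and index means of 0.4, 0.6, 0.2 and 0.3, the comprehensive index utilization N_i is 0.375 and score_i = 100 × (1 − 0.375) = 62.5, truncated to 62 in the sketch.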
In an embodiment of the present invention, it is considered that a large number of Pods may be scheduled in batches within a short time in actual production, for example when a node host restart or failure causes batch Pod migration; the indexes obtained from Prometheus are then inaccurate. For this case, the number of Pods being scheduled needs to be counted, and certain safeguard measures, such as buffering or circuit-breaking protection, need to be provided for the nodes in the cluster that receive the Pod scheduling.
In order to implement the above node safeguard measure, in an embodiment of the present invention, the scheduling plug-in further includes:
a score adjustment sub-module, used for acquiring from the monitoring service the number of Pods newly added to node_i within a preset period, and reducing the scheduling score of node_i when the number of newly added Pods on the node is larger than a preset adjustment threshold. The main purpose of the score adjustment sub-module is to apply special treatment (hot-spot scoring for short) to the scheduling score of a hot-spot node so as to lower it, thereby avoiding the local overheating and load imbalance caused by a large number of Pods being scheduled to the same node in a short time.
The scheduling score of an overheated node may be reduced by subtracting a preset adjustment value from the calculated scheduling score. For example, in an embodiment of the present invention, the container data collector cAdvisor may collect data related to Pod-node bindings; when a Pod is bound to a certain node, the node is considered to have a newly added Pod. The monitoring service Prometheus acquires the time-series data of the node's newly added Pods from cAdvisor in pull mode and persists the related data. The score adjustment sub-module obtains from Prometheus the number of Pods newly added to node_i within a preset period (for example, the past N minutes, which is configurable); when this number exceeds a preset adjustment threshold M (also configurable), a preset adjustment value (for example, 10%) is subtracted from the scheduling score of node_i calculated by the score calculation sub-module, so as to avoid the failure of scheduling balance caused by scheduling a large number of Pods to node_i in a short time.
From its creation to being bound to a Node, a Pod goes through two main phases: the scheduling phase and the binding phase. The node scheduling-scoring process described above is completed in the scheduling phase. After the scheduling phase, the Pod is bound to the node, and at this moment K8S generates a binding event for the Pod. By monitoring Pod binding events, the number of binding events newly generated on a node within, for example, 1 minute can be counted; if it exceeds the preset adjustment threshold, the preset adjustment value is subtracted from the node's current scheduling score, thereby influencing the node's scheduling priority and avoiding a large number of Pods being scheduled in batches in a short time.
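The following sketch illustrates the counting-and-penalty idea. How binding events reach the counter (an informer on Pod events, or a query to the monitoring service) is left open, and the 10% penalty mirrors the example adjustment value above; all names are illustrative.

```go
package scoring

import (
	"sync"
	"time"
)

// bindingCounter tracks Pod binding timestamps per node so that the number of
// Pods newly bound within the adjustment window (e.g. the last minute) can be
// counted. Feeding in binding events is assumed to happen elsewhere.
type bindingCounter struct {
	mu     sync.Mutex
	window time.Duration
	bound  map[string][]time.Time // node name -> binding timestamps
}

func newBindingCounter(window time.Duration) *bindingCounter {
	return &bindingCounter{window: window, bound: map[string][]time.Time{}}
}

// Record notes one Pod newly bound to the given node.
func (c *bindingCounter) Record(node string, at time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bound[node] = append(c.bound[node], at)
}

// RecentCount returns how many Pods were bound to the node inside the window,
// dropping timestamps that have fallen out of the window.
func (c *bindingCounter) RecentCount(node string, now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	kept := c.bound[node][:0]
	for _, ts := range c.bound[node] {
		if now.Sub(ts) <= c.window {
			kept = append(kept, ts)
		}
	}
	c.bound[node] = kept
	return len(kept)
}

// AdjustScore applies the hot-spot penalty: if the node gained more new Pods
// than the threshold M, a preset adjustment (here 10% of the score) is taken off.
func AdjustScore(score int64, newPods, threshold int) int64 {
	if newPods > threshold {
		score -= score / 10 // preset adjustment value, e.g. 10%
	}
	if score < 0 {
		score = 0
	}
	return score
}
```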
When a service container is deployed and installed, it is scheduled according to the node's actual resource utilization and the hot-spot score, which avoids as far as possible the imbalance caused by over-sized Pod resource requests (Pod request). When Pods in the cluster run for a long time, their actual resource usage may rise, so that the resource usage of individual nodes becomes unbalanced.
In an embodiment of the present invention, in order to balance the actual resource utilization of the nodes in the K8S cluster, a balanced scheduling controller (Balance Scheduler Controller) is further provided. The balanced scheduling controller monitors the comprehensive index utilization of each node in real time; when it observes that the comprehensive index utilization of one or some nodes is high, or that a node has failed, it can automatically intervene and evict Pods on those nodes, that is, migrate the Pods on such a node to other nodes.
The balance scheduling controller is connected with the monitoring service, acquires index data of each node from the monitoring service, monitors the comprehensive index utilization rate of each node in the cluster in real time, and performs Pod balance scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster.
The balanced scheduling controller may further include two sub-modules, where the two sub-modules cooperate to perform balanced scheduling of Pod in the cluster, and the two sub-modules are:
and a second data acquisition sub-module for acquiring index data from a monitoring service (Prometaus). The module may obtain the node index data from the monitoring service via a timed task. The index data may be provided by means of a collection period mean.
And the balanced scheduling sub-module is used for calculating the comprehensive index utilization of each node in the cluster and the cluster average according to the acquired index data, performing balanced Pod scheduling according to the difference between a node's comprehensive index utilization and the cluster's average comprehensive index utilization, and performing Pod eviction on nodes whose difference exceeds a preset eviction threshold.
The method for carrying out Pod balanced scheduling by the balanced scheduling controller according to the acquired index data comprises the following steps:
s21, a second data acquisition submodule acquires index data of each node in the cluster, and comprehensive index utilization rate N of each node is calculated i
Referring to step S11, the second data acquisition sub-module acquires the index data means of node_i, which reflect the usage of resources within a given collection period. The index data means include, but are not limited to, the CPU utilization mean node_i_cpu, the memory utilization mean node_i_mem, the disk I/O utilization mean node_i_diskIo, and the workload mean node_i_load. The collection period in this step is configurable and may be the same as or different from the collection period in step S11.
After the index data of each node in the cluster has been acquired, the comprehensive index utilization N_i of each node is calculated using the same formula (Formula 1) as in step S12.
S22, the average comprehensive index utilization A of all nodes in the cluster is calculated:
A = (N_1 + N_2 + … + N_n) / n
where n is the number of nodes in the cluster; the cluster average comprehensive index utilization A is the arithmetic mean of the comprehensive index utilizations of all nodes in the cluster.
S23, the comprehensive index utilization N_i of each node is compared with the cluster average comprehensive index utilization A; the difference is D_i = N_i − A, and Pod eviction is performed on nodes whose difference exceeds a preset eviction threshold.
The preset eviction threshold is configurable. For example, if the preset eviction threshold is configured as 0.3, then when the comprehensive index utilization N_i of the i-th node in the cluster is 92% and the cluster average comprehensive index utilization A is 60%, the difference D_i = 0.32 is greater than 0.3, so Pod eviction is performed on the i-th node; the eviction policy may use the K8S native eviction policy.
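Steps S21–S23 reduce to a small calculation, sketched below; the function and variable names are illustrative, and the actual eviction of Pods on the returned nodes would use the K8S eviction mechanism.

```go
package balancer

// overloadedNodes sketches steps S21-S23: given each node's comprehensive index
// utilization N_i, it computes the cluster mean A and returns the nodes whose
// difference D_i = N_i - A exceeds the eviction threshold.
func overloadedNodes(usage map[string]float64, evictThreshold float64) []string {
	if len(usage) == 0 {
		return nil
	}
	var sum float64
	for _, n := range usage {
		sum += n
	}
	avg := sum / float64(len(usage)) // cluster average comprehensive utilization A

	var out []string
	for node, n := range usage {
		if n-avg > evictThreshold { // D_i = N_i - A
			out = append(out, node)
		}
	}
	return out
}
```

With the example above, a node with N_i = 0.92 against a cluster mean of 0.60 yields D_i = 0.32 > 0.3 and is returned.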
In an embodiment of the present invention, before Pod eviction is performed on a node, the Pods in the node are further screened; Pods meeting the following screening conditions are not evicted or are deferred from eviction:
screening condition 1. Pod with node binding does not evict.
Some Pod may not bind to nodes, e.g., UI class Pod may not bind to nodes in general, and may be re-pulled by k8s and scheduled to other nodes after an exception. Some Pod will typically bind to a node, such as a database Pod, where the node to which the Pod binds has specific data stored, and therefore cannot migrate other nodes.
Screening condition 2. Daemon type Pod does not evict.
For example, a Pod of the daemon set type is used to run a copy of the daemon as a background process in each Kubernetes node. The Daemonset type Pod requires each node to run one, so such Pod is not evicted.
Screening condition 3. System level Pod does not evict.
For example, critical Pod is a system level Pod in Kubernetes to ensure high availability and stability for critical applications. When the nodes in the cluster fail or other problems cause Pod to operate normally, kubernetes can preferentially ensure the operation of the CriticalPod.
Screening condition 4. A Pod whose manifest configuration file marks an eviction level exempting it from priority eviction is not evicted.
A manifest configuration file, i.e. a YAML file, is a configuration file for the important managed resources in Kubernetes; it defines a manifest of resources such as container workloads, services and configurations running in the cluster. YAML files can be submitted to a Kubernetes cluster through the kubectl command-line tool, which manages operations such as running, deploying, scaling and rolling back container workloads according to the instructions of the manifest file.
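A sketch of the screening conditions as a filter over Pod objects is given below. How a "node-bound" Pod is detected (here via a node selector) and the do-not-evict annotation are assumptions, since the text does not fix these details.

```go
package balancer

import (
	v1 "k8s.io/api/core/v1"
)

// doNotEvictAnnotation is a hypothetical manifest annotation used here to
// represent screening condition 4 (an eviction level marked in the manifest).
const doNotEvictAnnotation = "scheduling.example.com/do-not-evict"

// evictable applies the screening conditions: Pods bound to a node, daemon-type
// Pods, system-level Pods, and Pods marked as protected in their manifest are
// skipped before eviction.
func evictable(pod *v1.Pod) bool {
	// Condition 1: Pods explicitly tied to a node (e.g. via a node selector,
	// typical for stateful/database Pods) are not evicted.
	if len(pod.Spec.NodeSelector) > 0 {
		return false
	}
	// Condition 2: DaemonSet Pods must run on every node, so they stay.
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			return false
		}
	}
	// Condition 3: system-level (critical) Pods are not evicted.
	if pod.Spec.PriorityClassName == "system-cluster-critical" ||
		pod.Spec.PriorityClassName == "system-node-critical" {
		return false
	}
	// Condition 4: manifests that mark a protective eviction level are skipped.
	if pod.Annotations[doNotEvictAnnotation] == "true" {
		return false
	}
	return true
}
```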
The Pod scheduling scheme provided by the present invention is an improvement on the K8S native Pod scheduling method. It does not use a load-balancing feedback mechanism outside the cluster to perform Pod scheduling, so the scheduling calculation efficiency is higher; no additional server is required, Pod replicas do not need to be scaled in or out, and no additional load-balancing service needs to be deployed.
In the Pod scheduling scheme provided by the present invention, the scheduling score is calculated from both historical and current values of multi-dimensional index data, and the K8S scheduler performs Pod scheduling based on this scheduling score. This largely avoids the inaccurate scheduling caused by relying on user-configured Pod requests or by calculating node scores from instantaneous index data; at the same time, by predicting hot-spot nodes and limiting their load in time, it avoids the scheduling lag that would otherwise arise from waiting for index data to be collected and recalculated.
In the Pod scheduling scheme provided by the present invention, by monitoring the deviation between a node's comprehensive utilization and that of the cluster, Pods on nodes with large deviations are automatically evicted, and the Pods are screened with the screening conditions before intervention, so that key service Pods are rescheduled as little as possible and service stability is maintained.
Fig. 2 is a schematic structural diagram of an electronic device for implementing the Pod scheduling method provided by the present invention in an embodiment of the present invention, where the device 200 includes: a processor 210 such as a Central Processing Unit (CPU), a communication bus 220, a communication interface 240, and a memory 230. Wherein the processor 210 and the memory 230 may communicate with each other via a communication bus 220. The memory 230 has stored therein a computer program which, when executed by the processor 210, performs the functions of one or more steps of the Pod scheduling method provided by the present invention.
Memory refers to a device that stores computer programs and/or data on some storage medium, and may be volatile memory (VM) or non-volatile memory (NVM). Volatile memory is internal memory that exchanges data directly with the processor; it can be read and written at any time, is fast, and serves as temporary storage for the operating system and other running programs. It may be synchronous dynamic random access memory (SDRAM), dynamic random access memory (DRAM), or the like. Non-volatile memory uses a persistent storage medium; it has large capacity and can store data permanently, and may be a storage-class memory (SCM), a solid-state disk (SSD), NAND flash, a magnetic disk, or the like. SCM is a general term for new storage media positioned between memory and flash; it is a composite storage technology that combines persistence with memory-like characteristics, with access speed slower than DRAM but faster than an SSD.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It should be appreciated that embodiments of the invention may be implemented or realized by computer hardware, a combination of hardware and software, or by computer instructions stored in non-transitory (or referred to as non-persistent) memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose. Furthermore, the operations of the processes described in the present invention may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, collectively executing on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, network or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, which when read by a computer, is operable to configure and operate the computer to perform the processes described herein. Further, the machine readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media includes instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. The invention also includes the computer itself when programmed according to the methods and techniques of the present invention.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A Pod scheduling system, the system comprising:
the monitoring service is used for monitoring each Node and container in the cluster and providing a query interface of index data of each monitored Node and container;
the node data collector is used for collecting index data of each node in the cluster and converting the collected index data into a time sequence data format supported by the monitoring service;
the container data collector is used for collecting index data of each container in the cluster and converting the collected index data into a time sequence data format supported by the monitoring service;
and the scheduling plug-in is used for scheduling and scoring each node according to the index data of each node and container provided by the monitoring service, wherein the scheduling score is used by the Kubernetes scheduler to perform balanced scheduling of Pods in the cluster, and the scheduling score represents the comprehensive score of unused resources on the node.
2. The system of claim 1, wherein the scheduling plugin comprises:
the first data acquisition sub-module is used for acquiring index data of each node from the monitoring service; the index data include the CPU utilization rate, memory utilization rate, disk I/O utilization rate and workload rate of each node in the cluster, and are provided as averages over the collection period;
the score calculation sub-module is used for calculating the scheduling score of each node according to the acquired index data and index weights; the scheduling score of a node is the difference between 1 and the comprehensive index utilization rate of the node, and the comprehensive index utilization rate of a node is the sum of the products of each index data average of the node and the percentage of that index's weight in the total index weight.
3. The system of claim 2, wherein the scheduling plug-in further comprises:
and the score adjustment sub-module is used for acquiring the newly increased Pod number of each node in a preset period from the monitoring service, and reducing the scheduling score of the node when the newly increased Pod number of the node is larger than a preset adjustment threshold.
4. The system of claim 3, wherein the system further comprises a controller configured to control the controller,
the monitoring service acquires the number of Pod newly added by each node in a preset period by monitoring the number of Pod binding events generated on the node.
5. The system of claim 2, wherein the system further comprises:
and the balanced scheduling controller is used for acquiring index data of each node from the monitoring service, monitoring the comprehensive index utilization rate of each node in the cluster in real time, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the nodes with the difference value exceeding a preset eviction threshold.
6. The system of claim 5, wherein the balanced scheduling controller comprises:
a second data acquisition sub-module for acquiring index data from the monitoring service;
and the balanced scheduling sub-module is used for calculating the comprehensive index utilization rate of each node in the cluster according to the acquired index data, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the node with the difference value exceeding a preset eviction threshold.
7. The system of claim 6, wherein the system further comprises a controller configured to control the controller,
before performing Pod eviction, the balanced scheduling sub-module filters pods meeting screening conditions in the nodes, wherein the screening conditions comprise:
pod with node binding is not evicted;
the daemon-type Pod is not evicted;
the system-level Pod is not evicted;
a Pod whose manifest configuration file marks an eviction level exempting it from priority eviction is not evicted.
8. A Pod scheduling method, the method comprising:
collecting index data of each node and container in the cluster; the index data include the CPU utilization rate, memory utilization rate, disk I/O utilization rate and workload rate of each node in the cluster, and are provided as averages over the collection period;
scheduling and scoring the nodes according to the collected index data of each node and container, wherein the scheduling score is used by the Kubernetes scheduler to perform balanced scheduling of Pods in the cluster, and the scheduling score represents the comprehensive score of unused resources on a node;
the scheduling score of a node is the difference value between 1 and the comprehensive index utilization rate of the node, and the comprehensive index utilization rate of the node is the sum of products of various index data average values of the node and the percentage of the index weight of the index to the total index weight.
9. The method of claim 8, wherein the method further comprises:
and acquiring the newly increased Pod number of each node in a preset period from the monitoring service, and reducing the scheduling score of the node when the newly increased Pod number of the node is larger than a preset adjustment threshold.
10. The method of claim 8, wherein the method further comprises:
and acquiring index data of each node from the monitoring service, monitoring the comprehensive index utilization rate of each node in the cluster in real time, carrying out Pod balanced scheduling according to the difference value between the comprehensive index utilization rate of the node and the average comprehensive index utilization rate of the cluster, and carrying out Pod eviction on the nodes with the difference value exceeding a preset eviction threshold.
11. An electronic device is characterized by comprising a processor, a communication interface, a storage medium and a communication bus, wherein the processor, the communication interface and the storage medium are communicated with each other through the communication bus;
a storage medium storing a computer program;
a processor for implementing the method of any of claims 8-10 when executing a computer program stored on a storage medium.
12. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 8 to 10.
CN202311181421.4A 2023-09-13 2023-09-13 Pod scheduling method and system Pending CN117369941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311181421.4A CN117369941A (en) 2023-09-13 2023-09-13 Pod scheduling method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311181421.4A CN117369941A (en) 2023-09-13 2023-09-13 Pod scheduling method and system

Publications (1)

Publication Number Publication Date
CN117369941A true CN117369941A (en) 2024-01-09

Family

ID=89397295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311181421.4A Pending CN117369941A (en) 2023-09-13 2023-09-13 Pod scheduling method and system

Country Status (1)

Country Link
CN (1) CN117369941A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117729204A (en) * 2024-02-06 2024-03-19 山东大学 K8S container scheduling method and system based on monitoring perception


Similar Documents

Publication Publication Date Title
US20200287961A1 (en) Balancing resources in distributed computing environments
US9552231B2 (en) Client classification-based dynamic allocation of computing infrastructure resources
US10331469B2 (en) Systems and methods of host-aware resource management involving cluster-based resource pools
Jalaparti et al. Network-aware scheduling for data-parallel jobs: Plan when you can
US10789102B2 (en) Resource provisioning in computing systems
AU2011320763B2 (en) System and method of active risk management to reduce job de-scheduling probability in computer clusters
KR101815148B1 (en) Techniques to allocate configurable computing resources
US9319281B2 (en) Resource management method, resource management device, and program product
CN106452818B (en) Resource scheduling method and system
US11188561B2 (en) Prioritizing microservices on a container platform for a restore operation
CN113010260A (en) Elastic expansion method and system for container quantity
US9870269B1 (en) Job allocation in a clustered environment
WO2017186123A1 (en) System and method for distributed resource management
JP2010244181A (en) Virtual machine management system, and virtual machine arrangement setting method and program
EP3011474A1 (en) Monitoring a computing network
CN117369941A (en) Pod scheduling method and system
US20210406053A1 (en) Rightsizing virtual machine deployments in a cloud computing environment
CN116149846A (en) Application performance optimization method and device, electronic equipment and storage medium
JP7368143B2 (en) Service deployment control system, service deployment control method, and storage medium
CN116467082A (en) Big data-based resource allocation method and system
CN107203256B (en) Energy-saving distribution method and device under network function virtualization scene
CN112000460A (en) Service capacity expansion method based on improved Bayesian algorithm and related equipment
CN111352726A (en) Streaming data processing method and device based on containerized micro-service
JP2012141671A (en) Method for migrating virtual computer, virtual computer system and control server
CN113127289B (en) Resource management method, computer equipment and storage medium based on YARN cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination