CN112052068A - Method and device for CPU core binding on a Kubernetes container platform


Info

Publication number
CN112052068A
Authority
CN
China
Prior art keywords
cpu
binding
container
list
cpu core
Prior art date
Legal status
Pending
Application number
CN202010825344.1A
Other languages
Chinese (zh)
Inventor
陈林祥
蒋玉玲
田松
彭天彬
Current Assignee
Fiberhome Telecommunication Technologies Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd
Priority to CN202010825344.1A
Publication of CN112052068A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention discloses a method and a device for CPU core binding on a Kubernetes container platform. The method comprises the following steps: based on the DevicePlugin mechanism of the Kubernetes container platform, an available CPU list is reported to the Kubelet; CPU core binding information is specified when a container Pod is created, and the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the CPU core binding information and records the allocation information; and the allocated CPU cores are written into the cpuset of the Linux system according to the allocation record, binding the container Pod to the allocated cores and thereby realizing CPU affinity binding. The scheme makes full use of the DevicePlugin mechanism and publishes the available CPUs to the Kubelet as schedulable device resources, so that specific CPU cores can be designated for binding and CPU-sensitive applications can run stably.

Description

Method and device for CPU core binding on a Kubernetes container platform
Technical Field
The invention belongs to the technical field of container platform software products, and particularly relates to a method and device for CPU core binding on a Kubernetes container platform.
Background
Kubernetes is an open-source container orchestration engine from Google that supports automated deployment, large-scale scaling, and containerized application management. In recent years, Kubernetes-based container orchestration systems have become the de facto standard, extending from the traditional IT industry into the CT industry, and large enterprises are actively embracing container technology. Container technology reduces IT operation and maintenance costs and shortens the service delivery cycle, and PaaS platforms integrating agile development, application management, elastic scaling, resource monitoring, microservice governance and other functions are gradually becoming the main platforms for moving services to the cloud.
With the development of 5G, both core-network telecommunication cloud platforms and edge computing cloud platforms are gradually evolving from NFV (Network Functions Virtualization) cloud platforms to a state where container platforms and virtual machine cloud platforms coexist. More and more telecommunication applications adopt containers, service meshes and other representative cloud-native technologies to construct telecommunication network services. Compared with IT applications, telecommunication CT network elements include more CPU-sensitive applications with higher requirements on performance and reliability, and often need a network element to be bound to specific CPU cores so that it runs on exclusive CPU resources with better performance. CPU-sensitive applications come in several types, for example those sensitive to CPU cache misses, or those sensitive to the latency introduced by memory accesses across NUMA (Non-Uniform Memory Access) nodes.
CPU core binding establishes a binding relationship between a process and CPU cores, allowing the process to run only on designated cores. Although binding cores for container applications can reduce overall cluster CPU utilization to some extent, it guarantees service performance and stability. With CPU affinity binding, a specific task can be pinned to a certain CPU core, reducing the frequent migration of threads between cores; such inter-core thread switching easily causes heavy performance loss from cache misses and cache write-backs. The isolcpus kernel parameter (CPU isolation, which lets a Linux process or thread monopolize one or more CPUs) can further exclude certain CPU cores from Linux system scheduling so that a thread monopolizes a core, avoiding multi-task switching overhead on the same core while ensuring more cache hits. On a container cloud platform, when several CPU-intensive Pods run on one node, workloads may be migrated across different CPU cores; CPU-sensitive applications perceive the loss of CPU cache affinity and the scheduling delay, which may cause jitter in online services. Here a Pod is the basic unit in which Kubernetes creates or deploys containers; one Pod encapsulates one or more application containers.
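The kernel-level mechanism this relies on can be illustrated with the cgroup v1 cpuset controller. The following Go sketch pins a process to specific cores; it is a minimal sketch, assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset, and the group name "demo" is purely illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pinToCores creates a cgroup-v1 cpuset group and moves a PID into it,
// restricting the process to the given CPU cores and NUMA memory nodes.
func pinToCores(pid int, cpus, mems string) error {
	dir := "/sys/fs/cgroup/cpuset/demo" // illustrative group name
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	// A cpuset group must have both cpuset.cpus and cpuset.mems set
	// before tasks can be attached to it.
	if err := os.WriteFile(filepath.Join(dir, "cpuset.cpus"), []byte(cpus), 0o644); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "cpuset.mems"), []byte(mems), 0o644); err != nil {
		return err
	}
	// Attaching the PID confines the process to the listed cores.
	return os.WriteFile(filepath.Join(dir, "tasks"), []byte(fmt.Sprint(pid)), 0o644)
}

func main() {
	// Bind the current process to cores 10-11 on NUMA node 0.
	if err := pinToCores(os.Getpid(), "10-11", "0"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```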
Kubernetes provides a default CPU-binding implementation with limited capability: its Kubelet component uses CFS (the Completely Fair Scheduler of Linux) quotas to enforce the CPU constraints of a Pod. On top of this, the Kubelet provides an optional CPU Manager policy to determine allocation preferences on a node. The CPU Manager is a module in the Kubelet that aims to improve the performance of CPU-sensitive tasks by binding CPUs to certain containers. Currently the CPU Manager supports two policies. One is none, which provides no affinity beyond the default behavior of the operating system scheduler, i.e., no core binding is performed. The other is static, which allows statically assigned CPU affinity for Pods with certain resource characteristics on a node. To enable core binding, the policy must be set to static; the Kubelet then allocates a bound cgroup cpuset before the container starts, giving the Pod CPU affinity and exclusivity on the node, with the exclusivity enforced by the cgroup cpuset controller. Here cgroup (Control Groups) is the Linux kernel mechanism for limiting, controlling and accounting the resources of process groups, and cpuset is the cgroup controller used to restrict the CPUs a process may use. When a Guaranteed Pod is scheduled to the node, if its container meets the static allocation requirements, the corresponding CPUs are removed from the shared pool and placed into the container's cpuset; Guaranteed denotes the Guaranteed QoS class, one of the QoS classes Kubernetes uses to implement effective resource scheduling. Because the CPUs used by these containers are confined by the scheduling domain itself, there is no need to use CFS quotas for CPU binding. This static allocation is a fixed policy of the Kubernetes system; CPUs are allocated from the pool of available CPUs outside the reserved CPUs of the Kubelet configuration according to the following rules (a simplified sketch follows the list):
1) If the number of logical CPUs requested by the container is not less than the number of logical CPU cores in a single CPU socket, whole sockets' worth of logical CPU cores are preferentially allocated to the container.
2) If the remaining number of logical CPU cores requested by the container is not less than the number of logical CPU cores provided by a single physical CPU, the container is preferentially allocated all the logical CPU cores of an entire physical CPU.
3) In other cases, the logical CPU cores requested by the container are selected in numerical order from the logical CPU core list, preferring cores on the same CPU socket first and on the same physical CPU second.
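As an illustration of this preference order, the following Go sketch implements a simplified version of the rules above. It is a minimal sketch, not the Kubelet's actual CPU Manager code: it omits the whole-physical-core (hyperthread sibling) grouping of rule 2, and the free-pool representation is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// pickCores sketches the static-policy preference order: whole sockets
// first, then remaining logical CPUs in numeric order.
func pickCores(freeBySocket map[int][]int, want int) []int {
	sockets := make([]int, 0, len(freeBySocket))
	for s := range freeBySocket {
		sockets = append(sockets, s)
	}
	sort.Ints(sockets)

	var picked []int
	// Rule 1: hand out entire sockets while the request still covers one.
	for _, s := range sockets {
		if cpus := freeBySocket[s]; len(cpus) > 0 && want >= len(cpus) {
			picked = append(picked, cpus...)
			want -= len(cpus)
			freeBySocket[s] = nil
		}
	}
	// Rules 2/3 (simplified): fill the remainder in numeric order, which
	// keeps the allocation packed on as few sockets as possible.
	for _, s := range sockets {
		for _, c := range freeBySocket[s] {
			if want == 0 {
				return picked
			}
			picked = append(picked, c)
			want--
		}
	}
	return picked
}

func main() {
	free := map[int][]int{0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
	fmt.Println(pickCores(free, 5)) // [0 1 2 3 4]: all of socket 0, then core 4
}
```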
Therefore, the biggest problem with the current fixed allocation strategy of the CPU Manager policy is that specific CPU cores cannot be designated for binding; only static allocation, in which the background automatically selects suitable CPUs according to a fixed policy, is supported. The current CPU core binding also has the following technical problems: it cannot bind cores on a designated NUMA node, supporting only background random allocation on server nodes; it cannot support NUMA topology awareness and NUMA-affine allocation for PCI devices such as 10-Gigabit optical NICs; it cannot honor the isolcpus kernel parameter of the Linux system, and the default scheduling mode ignores any existing CPU isolation configuration, so other processes on the Linux system may still run on the affinity-bound CPU cores; and when a container Pod needs to switch to different CPU cores, dynamic allocation is not supported, i.e., the cpuset parameters cannot be changed directly while the container is running to modify the binding.
Disclosure of Invention
Aiming at the above defects and improvement needs of the prior art, the invention provides a method and device for CPU core binding on a Kubernetes container platform, which manage CPU resources as device resources so that a container can be bound to designated CPUs, thereby solving the technical problem that the traditional fixed allocation strategy can only bind cores after random scheduling and cannot designate the CPU cores to bind.
To achieve the above object, according to one aspect of the present invention, there is provided a Kubernetes container platform CPU core binding method, comprising:
reporting an available CPU list to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform; the available CPU list comprises the available CPU cores of each server node and the corresponding CPU topology information;
specifying CPU core binding information when a container Pod is created on the Kubernetes container platform, so that the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the CPU core binding information, and records the allocation information;
and writing the allocated CPU cores into the cpuset of the Linux system according to the allocation record of the Kubelet, binding the container Pod to the allocated CPU cores, thereby realizing CPU affinity binding.
Preferably, reporting the available CPU list to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform specifically comprises:
acquiring the available CPU cores and corresponding CPU topology information of each server node, and generating the available CPU list based on the acquired information; the CPU topology information comprises CPU information, NUMA topology information, and the NUMA node to which each PCI device belongs;
the DevicePlugin plug-in registering with the Kubelet when started, acquiring the available CPU list after registration, and reporting the acquired available CPU list to the Kubelet.
Preferably, specifying the CPU core binding information when creating the container Pod on the Kubernetes container platform, so that the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the CPU core binding information and records the allocation information, specifically comprises:
when a container Pod is created on the Kubernetes container platform, specifying CPU core binding information according to service requirements, and submitting a corresponding CPU core binding request based on the CPU core binding information;
verifying whether the CPU core binding request is legal, and if so, generating a CPU core binding list according to the specified CPU core binding information;
the Kubelet calling the DevicePlugin plug-in according to the CPU core binding list to select CPU cores meeting the conditions from the available CPU list, allocating them to the container Pod, and recording the allocation information.
Preferably, when the service application corresponding to a container Pod is started, if the service application needs CPU core binding for the container Pod, a specific request identifier is set and attached to the corresponding CPU core binding request after the container Pod is created;
verifying whether the CPU core binding request is legal then specifically comprises: verifying whether the CPU core binding request carries the specific request identifier; if yes, the CPU core binding request is legal; if not, the CPU core binding request is illegal.
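A minimal sketch of this legality check, assuming the identifier is carried as a Pod annotation; the patent names neither the identifier nor its transport, so the annotation key used here is hypothetical.

```go
// isLegalBindRequest accepts a core-binding request only if the Pod
// carries the specific request identifier set at application startup.
// The annotation key "cpu.k8s.io/bind-request" is hypothetical.
func isLegalBindRequest(annotations map[string]string) bool {
	v, ok := annotations["cpu.k8s.io/bind-request"]
	return ok && v == "true"
}
```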
Preferably, the Kubelet selecting CPU cores meeting the conditions from the available CPU list according to the CPU core binding list and allocating them to the container Pod specifically comprises:
a scheduler on the Kubernetes container platform scheduling the container Pod to a server node meeting the conditions according to the CPU core binding list and the available CPU list;
and the scheduler calling the DevicePlugin plug-in to select CPU cores meeting the conditions on that server node and allocating them to the container Pod according to the CPU core binding list and the available CPU list.
Preferably, when creating a container Pod on the Kubernetes container platform, the CPU core binding information is specified in one of the following ways:
specifying the specific numbers of the CPU cores to be bound by the container Pod; or,
specifying the number of CPU cores to be bound by the container Pod; or,
specifying the NUMA node on which the CPU cores to be bound by the container Pod are located; or,
specifying that the container Pod be bound on a NUMA node aligned with a PCI device.
Preferably, when the specific numbers of the CPU cores to be bound are specified, the Kubelet selects the correspondingly numbered CPU cores from the available CPU list and allocates them to the container Pod;
when the number of CPU cores to be bound is specified, the Kubelet selects the corresponding number of CPU cores from the available CPU list and allocates them to the container Pod;
when the NUMA node on which the CPU cores to be bound are located is specified, the Kubelet selects CPU cores on the corresponding NUMA node from the available CPU list and allocates them to the container Pod;
when the container Pod is to be bound on a NUMA node aligned with a PCI device, the Kubelet finds the NUMA node aligned with the PCI device to be used by the container Pod, and selects CPU cores on that NUMA node from the available CPU list to allocate to the container Pod.
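The four forms and their allocation rules can be summarized in a Go sketch. All names and the pool representation are illustrative; verifying that explicitly numbered cores are actually free on some node is left to the scheduler, as the claims above describe.

```go
package main

import "fmt"

// BindSpec sketches the four ways of specifying binding information.
type BindSpec struct {
	CoreIDs  []int  // way one: explicit logical CPU numbers
	Count    int    // way two: how many cores to bind
	NUMANode int    // way three: pin to this NUMA node (-1 = unspecified)
	PCIAddr  string // way four: align with this PCI device's NUMA node ("" = none)
}

// choose dispatches on whichever form a request used. freeByNode maps
// NUMA node -> free logical CPUs; pciNode maps PCI address -> NUMA node.
func choose(spec BindSpec, freeByNode map[int][]int, pciNode map[string]int) ([]int, error) {
	takeFrom := func(node, n int) ([]int, error) {
		if len(freeByNode[node]) < n {
			return nil, fmt.Errorf("NUMA node %d has fewer than %d free cores", node, n)
		}
		picked := freeByNode[node][:n]
		freeByNode[node] = freeByNode[node][n:]
		return picked, nil
	}
	switch {
	case len(spec.CoreIDs) > 0:
		// Way one: finding a node where exactly these cores are free
		// is the scheduler's job, so the IDs pass through unchanged.
		return spec.CoreIDs, nil
	case spec.PCIAddr != "": // way four
		return takeFrom(pciNode[spec.PCIAddr], spec.Count)
	case spec.NUMANode >= 0: // way three
		return takeFrom(spec.NUMANode, spec.Count)
	default:
		// Way two: any single NUMA node with enough free cores.
		for node := range freeByNode {
			if cores, err := takeFrom(node, spec.Count); err == nil {
				return cores, nil
			}
		}
		return nil, fmt.Errorf("no NUMA node has %d free cores", spec.Count)
	}
}

func main() {
	free := map[int][]int{0: {10, 11}, 1: {20, 21, 22}}
	cores, _ := choose(BindSpec{Count: 2, NUMANode: -1}, free, nil)
	fmt.Println(cores) // two cores from a single NUMA node
}
```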
Preferably, after the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the CPU core binding information and records the allocation information, the method further comprises:
removing the allocated CPU cores from the available CPU list according to the allocation record of the Kubelet, thereby updating the available CPU list, so that when the next container Pod is created, CPU allocation is performed according to the updated available CPU list.
Preferably, before reporting the available CPU list to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform, the method further comprises: when the Linux system kernel is started, dividing the CPU cores in the cluster into an exclusive type and a non-exclusive type; after a container Pod is bound to any exclusive-type CPU core, the scheduling of the Linux system prohibits other processes from running on that CPU core, so that the processes of the container Pod run exclusively on it.
According to another aspect of the present invention, there is provided a Kubernetes container platform CPU core binding apparatus for performing the Kubernetes container platform CPU core binding method according to the first aspect, comprising a CPU sensing component, a CPU core binding DevicePlugin component, a CPU core binding setting component, and a CPU core binding request verification component;
the CPU sensing component is used for acquiring the available CPU cores and corresponding CPU topology information of each server node, and generating the available CPU list based on the acquired information;
the CPU core binding DevicePlugin component is used for acquiring the available CPU list from the CPU sensing component and reporting it to the Kubelet based on the DevicePlugin mechanism, so that the Kubelet allocates corresponding CPU cores to a container Pod according to the CPU core binding information specified when the container Pod is created and the reported available CPU list, and records the allocation information;
the CPU core binding setting component is used for writing the allocated CPU cores into the cpuset of the Linux system according to the allocation record of the Kubelet, so that the container Pod is bound to the allocated CPU cores, realizing CPU affinity binding;
the CPU core binding request verification component is used for verifying, after receiving the CPU core binding request submitted based on the CPU core binding information, whether the CPU core binding request is legal; if so, a CPU core binding list is generated by the Kubernetes container platform according to the CPU core binding information.
Generally, compared with the prior art, the technical scheme of the invention has the following beneficial effects: in this CPU core binding scheme for the Kubernetes container platform, the DevicePlugin mechanism of Kubernetes is fully utilized, and the available CPUs on each server node are published to the Kubelet as schedulable device resources, realizing user-defined scheduling of CPUs; after a container Pod is created and the CPUs to bind are specified, the Kubelet can allocate corresponding CPUs to the container Pod from the available device resource pool according to the specified binding requirements, realizing affinity binding between the container Pod and the designated CPUs. With this scheme, specific CPU cores can be designated for binding, and CPU-sensitive applications run stably.
Drawings
FIG. 1 is a flowchart of a method for CPU core binding on a Kubernetes container platform according to an embodiment of the present invention;
FIG. 2 shows a CPU topology under a NUMA architecture according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for allocating CPU cores according to an embodiment of the present invention;
FIG. 4 is a block diagram of a device for CPU core binding on a Kubernetes container platform according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the interaction between the CPU core binding DevicePlugin component and the Kubelet according to an embodiment of the present invention;
FIG. 6 is a flowchart of Kubernetes container platform CPU core binding based on the core binding components according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
In order to solve the technical problem that the traditional fixed allocation strategy can only bind cores after random scheduling and cannot designate CPU cores for binding, an embodiment of the present invention provides a method for CPU core binding on a Kubernetes container platform, as shown in FIG. 1, which mainly comprises the following steps:
Step 10, reporting an available CPU list to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform; the available CPU list comprises the available CPU cores of each server node and the corresponding CPU topology information.
This step mainly uses the DevicePlugin mechanism (i.e., the device plug-in mechanism) to publish the available CPU resources in the Kubernetes cluster to the Kubelet as schedulable device resources, thereby realizing user-defined scheduling of CPUs. It is implemented in the following two steps:
First, the available CPU cores and corresponding CPU topology information of each server node are acquired, and an available CPU list is generated based on the acquired information; the CPU topology information comprises CPU information, NUMA topology information, and the NUMA node to which each PCI device belongs. Automatic discovery of the available CPU topology resources, NUMA topology and PCI topology in the cluster is realized through a CPU topology resource auto-discovery mechanism, and the resulting available CPU list can be temporarily stored in the ETCD database; ETCD is the highly available distributed key-value store of Kubernetes.
The CPU topology of each server node is shown in FIG. 2: in a topology under a NUMA server architecture, a server motherboard generally has several CPU sockets corresponding to several physical CPUs, i.e., CPU0-3 in FIG. 2; each CPU socket corresponds to one logical NUMA node, denoted NUMA0-NUMA3; and each physical CPU in turn carries multiple logical CPU cores, the Cores in FIG. 2. It should be noted that the CPU core binding of the present invention refers to binding to logical CPU cores; "CPU core" herein is short for logical CPU core, not a physical CPU. Continuing with FIG. 2, under the NUMA architecture, PCI devices are directly attached to a physical CPU or NUMA node through an I/O controller, and memory is directly attached to a physical CPU through its memory controller. A physical CPU has a short response time when accessing the physical memory addresses it manages itself, whereas accessing memory managed by another physical CPU goes through the QuickPath Interconnect (QPI), increasing the response time. As can be seen from FIG. 2, the CPU topology information includes not only the available CPU information but also the NUMA topology and the NUMA node to which each PCI device belongs.
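On Linux, this topology information can be read from sysfs. The following Go sketch performs a minimal discovery pass; the sysfs paths are standard, and error handling is deliberately minimal.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// discover prints each NUMA node's CPU list and each PCI device's
// NUMA node, the same topology information the available CPU list needs.
func discover() {
	nodes, _ := filepath.Glob("/sys/devices/system/node/node*/cpulist")
	for _, f := range nodes {
		cpus, err := os.ReadFile(f)
		if err != nil {
			continue
		}
		fmt.Printf("%s: cpus %s\n", filepath.Base(filepath.Dir(f)), strings.TrimSpace(string(cpus)))
	}
	devs, _ := filepath.Glob("/sys/bus/pci/devices/*/numa_node")
	for _, f := range devs {
		node, err := os.ReadFile(f)
		if err != nil {
			continue
		}
		// numa_node is -1 when the platform does not report affinity.
		fmt.Printf("PCI %s: numa_node %s\n", filepath.Base(filepath.Dir(f)), strings.TrimSpace(string(node)))
	}
}

func main() { discover() }
```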
Second, the DevicePlugin plug-in registers itself with the Kubelet when started, acquires the available CPU list after registration, and reports it to the Kubelet. Specifically, after starting, the DevicePlugin plug-in registers itself with the Kubelet through kubelet.sock via gRPC, so that the registered plug-in can be called later when the Kubelet creates a container Pod; gRPC is a remote procedure call protocol, the Kubelet is the agent of the Kubernetes cluster running on each server node, and kubelet.sock is the Unix domain socket file for communicating with the Kubelet process. The plug-in then obtains the available CPU list from the ETCD database and reports it to the Kubelet, forming a schedulable CPU device pool in the Kubelet. That is, two registrations and reports to the Kubelet are required: the DevicePlugin plug-in itself, and the available CPU list.
This mainly uses the List-Watch and Allocate functions of the DevicePlugin plug-in: changes of the CPU device information in the available CPU list are monitored through the List-Watch interface, yielding the updated available CPU list, and the designated allocation of CPU cores is realized through the Allocate interface.
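A sketch of the List-Watch side follows, assuming the v1beta1 device plugin API and exposing each available logical CPU as one device. The type, device ID scheme ("cpu-10"), and update channel are illustrative.

```go
package cpuplugin

import (
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// cpuPlugin holds the node's free cores; updated is signaled whenever
// the available CPU list changes (names are illustrative).
type cpuPlugin struct {
	freeCores []int
	updated   chan struct{}
}

// ListAndWatch exposes each available logical CPU as one schedulable
// device and re-sends the device list whenever it changes.
func (p *cpuPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	for {
		devs := make([]*pluginapi.Device, 0, len(p.freeCores))
		for _, core := range p.freeCores {
			devs = append(devs, &pluginapi.Device{
				ID:     fmt.Sprintf("cpu-%d", core),
				Health: pluginapi.Healthy,
			})
		}
		if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
			return err
		}
		<-p.updated // block until the available CPU list changes
	}
}
```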
Step 20, specifying CPU core binding information when a container Pod is created on the Kubernetes container platform, so that the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the CPU core binding information, and records the allocation information.
This step mainly calls the DevicePlugin plug-in through the Kubelet and realizes designated allocation of CPUs according to the specified binding requirements. The CPU cores on each server node are numbered sequentially from 0; a specific CPU core can be identified by its number, and the numbers can be checked from the command line on a Linux system. When a container Pod is created on the Kubernetes container platform, CPU core binding information is specified according to service requirements in one of the following four ways:
the first method is as follows: specifying the specific number of CPU cores to which the container Pod needs to be bound. If the service has specific requirements and the container Pod needs to be bound on a certain specified CPU core or certain CPU cores, the specific number of the CPU core needing to be bound can be manually filled or selected from a drop-down list of the available CPU list. During allocation, the Kubelet can select the CPU core with the corresponding number from the available CPU list to allocate to the container Pod, specifically, a background scheduler of the Kubernetes container platform automatically schedules the container Pod to the CPU core with the corresponding number on a certain server node to run.
For example, assuming CPU core number 10 is designated for binding, the background scheduler will determine which server node has CPU core number 10 available for use. If the CPU core No. 10 on the first server node is not bound, the dispatcher can directly dispatch the container Pod to the CPU core No. 10 of the node for running; if the CPU core No. 10 on the first server node is already bound, the second server node can be continuously searched for the corresponding CPU core No. 10 to see whether the CPU core is bound or not. And repeating the steps until a certain server node is found, the corresponding CPU core No. 10 is not bound, and the appointed distribution is completed.
The second method comprises the following steps: specifying the number of CPU cores that the container Pod needs to bind to. Some services may only have requirements on the number of CPUs needing to be bound, but have no requirements on specific binding with which CPU core, and the number of the CPU cores needing to be bound can be directly filled in; the number of the bound cores may be one or more, and is mainly related to the traffic load, and the larger the traffic load is, the larger the number of the CPUs needing the bound cores is. During allocation, the Kubelet selects a corresponding number of CPU cores from the available CPU list to allocate to the container Pod, specifically, a background scheduler of the Kubernetes container platform automatically schedules the corresponding number of CPU cores from a certain server node to the container Pod, and the scheduled CPU cores need to be located on the same NUMA node.
For example, assuming that the number of CPUs that require a core is specified to be 2, the daemon will determine which server node has two CPU cores available on the same NUMA node. If CPU core number 10 and 11 on the very first server node have not been bound, and the two CPU cores happen to be on the same NUMA node, the scheduler can directly schedule the container Pod to run on both CPU cores. If the first server node does not have enough CPU cores available, the second server node can continue to search; and repeating the steps until a certain server node is found, wherein two available CPU cores can be located on the same NUMA node, and the appointed distribution is completed.
Way three: specifying the NUMA node on which the cores to be bound are located, i.e., designated-NUMA operation. In certain scenarios, such as running multi-instance high-performance MySQL database services, the container Pods must be bound to different NUMA nodes to balance the NUMA allocation and improve application performance. In this case, a NUMA node can be designated when creating the container Pod, filled in manually or selected from an optional list, for example choosing to run on NUMA0 or NUMA1. During allocation, the Kubelet selects cores on the corresponding NUMA node from the available CPU list and allocates them to the container Pod; specifically, the background scheduler of the Kubernetes container platform automatically allocates corresponding cores from the NUMA topology to the container Pod.
For example, assume the specified number of cores is 2 and they must run on node NUMA1. The scheduler determines from the available CPU list which two available cores on NUMA1 can be used. If cores 10 and 11 on NUMA1 are unbound, the scheduler can directly schedule the container Pod onto them. If NUMA1 has no two available cores, the scheduling fails.
Way four: specifying that the container Pod be bound on a NUMA node aligned with a PCI device, i.e., designated PCI NUMA affinity operation. As can be seen from FIG. 2, besides the physical CPUs, the system has many other PCI devices, such as optical NICs and network cards. Suppose a first optical NIC is attached to node NUMA0 and a second optical NIC to node NUMA1. If the process in the container Pod uses an optical port on the first NIC, for example when running a high-performance network forwarding program such as DPDK, the container process will perform better when it runs on node NUMA0 together with the optical port it uses; here the optical NIC is an optical network card, and the optical port is a fiber port on that card. Thus, during allocation the Kubelet finds the NUMA node aligned with the PCI device to be used by the container Pod, and selects cores on that NUMA node from the available CPU list to allocate to the container Pod.
Step 30, writing the allocated CPU cores into the cpuset of the Linux system according to the allocation record of the Kubelet, and binding the container Pod to the allocated cores, thereby realizing CPU affinity binding.
This step realizes the binding between the container Pod and the CPU cores through the Linux cpuset, completing the CPU binding setup of the container Pod so that its processes run on the bound cores. Furthermore, if the bound cores need to be replaced while the Pod is running, the binding is modified by changing the cpuset parameters, so dynamic updates of the bound cores are supported. The process is as follows: the CPU core binding information is re-specified according to the replacement requirement, so that the Kubelet re-allocates qualifying CPU cores to the container Pod according to the re-specified binding information and the available CPU list; the re-allocated cores are then written into the Linux cpuset, replacing the originally allocated cores, and the container Pod is re-bound to the newly allocated cores. This is equivalent to resetting the CPU affinity of the container Pod by modifying the cpuset parameters.
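A minimal Go sketch of this dynamic rebinding follows. Rewriting cpuset.cpus causes the kernel to migrate the container's tasks onto the new cores; the cgroup path layout is container-runtime-dependent, so the path used in main is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rebind rewrites a running container's cpuset.cpus with the newly
// allocated cores, replacing the original binding.
func rebind(containerCgroup, newCores string) error {
	path := filepath.Join("/sys/fs/cgroup/cpuset", containerCgroup, "cpuset.cpus")
	return os.WriteFile(path, []byte(newCores), 0o644)
}

func main() {
	// Move the container from its old cores to cores 12-13.
	if err := rebind("kubepods/pod1234/abcd", "12-13"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```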
Further, when information is collected in step 10, for any server node, not all CPU cores on the node are generally usable for container binding, because the Linux system itself occupies some cores, for example for the system network card and platform components. Therefore, after the available CPU cores of each server node are acquired, the cores needed by the system itself must be excluded, and the remaining cores are used to build the initial available CPU list.
In addition, after the initial available CPU cores and corresponding topology information of each server node have been collected and the available CPU list generated, hardware changes may still occur in the cluster, causing the CPU topology and the available cores on a server node to change. In view of this, the CPU sensing component can periodically acquire the available cores and topology information of each server node and periodically update the available CPU list. Thus, even if the CPU topology changes with the hardware, the available CPU list is updated in time, and subsequent container Pods are allocated and bound according to the updated list.
Further, after step 20 is executed, that is, after the Kubelet allocates corresponding CPU cores to the container Pod according to the reported available CPU list and the binding information and records the allocation, the method further comprises: removing the allocated cores from the available CPU list according to the allocation record of the Kubelet, completing the update of the available CPU list, so that when the next container Pod is created, CPU allocation is performed according to the updated list. After the update, all remaining cores in the list are unbound by other processes, so subsequent container Pod creation can directly select among the remaining unbound cores on each server node, improving scheduling efficiency.
In summary, the Kubernetes container platform CPU core binding method provided by the embodiments of the present invention mainly has the following advantages:
The CPU resources of each server node are published to the Kubelet as schedulable device resources using the DevicePlugin mechanism of Kubernetes, so that after a container Pod is created and its bound CPUs are designated, the Kubelet can allocate corresponding cores to the container Pod from the available device resource pool according to the designated binding requirements, realizing affinity binding between the container Pod and the designated cores and supporting stable operation of CPU-sensitive applications. The method supports designating CPU cores for binding, designating NUMA nodes for core allocation, and NUMA-affine allocation for PCI devices when a container is created on the Kubernetes container platform, and can solve the problems of designated-CPU-core binding, designated-NUMA operation, and designated PCI NUMA affinity operation of containers in NFV and similar scenarios for CPU-sensitive applications.
In addition, the binding between the container Pod and CPU cores is realized through the Linux cpuset, dynamic allocation is supported, the bound cores can be updated while the container runs, and the CPU affinity of the container Pod can be reset by modifying the cpuset parameters.
Embodiment 2
Based on the Kubernetes container platform CPU core binding method provided in Embodiment 1, this embodiment of the invention further details step 20. As shown in FIG. 3, step 20, in which the Kubelet selects qualifying CPU cores from the available CPU list according to the CPU core binding list and allocates them to the container Pod, specifically comprises the following steps:
step 201, when creating a container Pod on a Kubernetes container platform, designating CPU core binding information according to service requirements, and submitting a corresponding CPU core binding request based on the CPU core binding information.
As can be seen from embodiment 1, the CPU bind information can be specified here in four ways: in the first mode, the specific number of the CPU core to be bound by the container Pod is specified; specifying the number of CPU cores to be bound by the Pod; specifying NUMA nodes where CPU cores needing to be bound by the container Pod are located; the fourth way, the container Pod is appointed to be bound on the NUMA node aligned with the PCI device; for specific specifying modes and principles, reference may be made to embodiment 1, which is not described herein in detail. After the CPU core binding information is specified, the Kubelet may submit a corresponding CPU core binding request to the CPU core binding request validation component based on the CPU core binding information.
Step 202, verifying whether the CPU binding request is legal, and if so, generating a CPU binding list according to the appointed CPU binding information.
Although the Kubernetes container platform supports CPU core binding, not all the containers Pod corresponding to the service application have the need to perform CPU core binding operation. Some services have lower application level, do not need to be bound by a CPU, and do not have the requirement; some business applications have a high level, and therefore, CPU binding is required. Whether the service application needs to be bound by the CPU is already set when the service application is started, and the method specifically comprises the following steps: when a service application corresponding to a container Pod is started, if the service application needs to perform CPU binding on the container Pod, a specific request identifier is set, and the identifier exists all the time when the container Pod is created and is attached to a corresponding CPU binding request after the container Pod is created.
Therefore, the verifying whether the CPU core binding request is legal specifically includes: and after receiving the CPU binding request, the CPU binding request verifying component verifies whether the CPU binding request carries a specific request identifier. If yes, the CPU binding request is proved to be legal, namely the corresponding service application really needs to carry out CPU binding, and a CPU binding list is generated according to the appointed CPU binding information subsequently; if not, the CPU core binding request is proved to be illegal, namely the corresponding service application does not need to carry out CPU core binding, and subsequent CPU distribution and binding operation is not needed to be carried out. By verifying the specific request identifier, the container Pod creating operation or the container Pod changing operation of the non-binding request can be distinguished, and unnecessary waste of CPU resources is avoided.
When the specific number of the CPU core to which the container Pod needs to be bound is specified in step 201, the specific number of the CPU core is recorded in the CPU binding list; when the number of the CPU cores to be bound by the container Pod is specified in step 201, the number of the CPU cores is recorded in the CPU binding list; when the NUMA node where the CPU core to which the container Pod needs to be bound is located is specified in step 201, the specific number of the NUMA node is recorded in the CPU binding list; when it is specified in step 201 that the container Pod is bound on a NUMA node aligned with a PCI device, a specific name or number of the PCI device is recorded in the CPU binding list.
Step 203, the Kubelet calls the DevicePlugin plug-in to select qualifying CPU cores from the available CPU list according to the CPU core binding list, allocates them to the container Pod, and records the allocation information. The allocation proceeds in the following two steps:
First, the scheduler on the Kubernetes container platform schedules the container Pod to a qualifying server node according to the CPU core binding list and the available CPU list. Here, qualifying means satisfying the information recorded in the binding list; for example, when the specified number of cores is 4, a qualifying server node must have at least 4 available cores. Of course, several server nodes may qualify, in which case the filters in the scheduler can further select the optimal node satisfying each filter's rules, for example the node with the lowest CPU load and lowest disk pressure, achieving optimal node scheduling; the container Pod is finally scheduled to that node.
Second, the scheduler calls the DevicePlugin plug-in to select qualifying cores on that server node and allocate them to the container Pod according to the CPU core binding list and the available CPU list. The scheduler realizes the designated allocation by calling the Allocate method of the DevicePlugin plug-in. Again, qualifying means satisfying the information recorded in the binding list; for example, when the specified number of cores is 4, the container Pod is assigned 4 of the CPUs of the previously selected server node.
After the Kubelet completes the designated allocation by calling the DevicePlugin plug-in, the allocated cores are recorded and then removed from the available CPU list, i.e., from the CPU device pool in the Kubelet, indicating that they are bound; when the next container Pod is created, the remaining unbound cores in the pool can be selected for binding.
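A sketch of this pool bookkeeping, reusing the illustrative cpuPlugin type from the earlier List-Watch sketch:

```go
// markAllocated removes allocated cores from the free pool so the next
// Pod creation only sees unbound cores, then nudges the ListAndWatch
// loop to re-report the shrunken device list.
func (p *cpuPlugin) markAllocated(cores []int) {
	used := make(map[int]bool, len(cores))
	for _, c := range cores {
		used[c] = true
	}
	remaining := p.freeCores[:0]
	for _, c := range p.freeCores {
		if !used[c] {
			remaining = append(remaining, c)
		}
	}
	p.freeCores = remaining
	select {
	case p.updated <- struct{}{}: // wake ListAndWatch
	default: // an update is already pending
	}
}
```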
Embodiment 3
Based on the Kubernetes container platform CPU core binding methods provided in Embodiments 1 and 2, the core binding scheme of the present invention can also use the CPU cores configured via Linux isolcpus on a server node, thereby realizing strongly isolated CPU core binding.
If a container Pod needs to run on an exclusive CPU, then when the Linux kernel boots, before the Kubernetes container platform runs, the CPU cores in the cluster are divided in advance into exclusive and non-exclusive types via kernel parameters. When the container Pod invokes the Linux cgroup cpuset mechanism to bind to an exclusive-type CPU core, the Linux scheduler prohibits other processes from running on that CPU, so that the processes of the container Pod run on it exclusively, ensuring that container Pods belonging to different CPU pools always remain physically isolated from one another.
Combining Embodiments 1 and 2, if some container Pods are considered to need exclusive CPU operation, the whole CPU core binding process is roughly as follows:
in step 10, the kubernets container platform based devicepugin mechanism reports the list of available CPUs to Kubelet.
As in embodiment 1, first, an available CPU core and corresponding CPU topology information of each server node are acquired, and an available CPU list is generated based on the acquired information; then the DevicePlugin plug-in registers the plug-in to the Kubelet, and simultaneously reports the acquired available CPU list to the Kubelet; the specific implementation process can refer to embodiment 1, which is not described herein. It should be noted that, since the Linux system kernel has been divided into two classes, exclusive and non-exclusive, in advance when starting up, the available CPU cores in the available CPU list actually include two classes, namely, exclusive CPU and non-exclusive CPU. The exclusive CPU and/or the non-exclusive CPU can be set with corresponding marks for representing the types of the CPU cores so as to distinguish the CPU cores during subsequent CPU core allocation.
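On Linux, the exclusive class can be derived from the isolcpus configuration: /sys/devices/system/cpu/isolated mirrors the isolcpus= kernel parameter (for example, booting with isolcpus=10-15). A minimal Go sketch of this discovery step:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// exclusiveCores reads the kernel's isolated-CPU list, which the
// discovery step can use to mark cores as the exclusive type.
// It returns a cpulist string such as "10-15", or an empty string
// when no cores are isolated.
func exclusiveCores() (string, error) {
	b, err := os.ReadFile("/sys/devices/system/cpu/isolated")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	isolated, err := exclusiveCores()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("exclusive (isolcpus) cores: %q\n", isolated)
}
```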
In step 20, CPU core binding information is specified when a container Pod is created on the Kubernetes container platform, so that the Kubelet allocates corresponding cores to the container Pod according to the reported available CPU list and the binding information, and records the allocation.
When the binding information is specified here, besides the four ways enumerated in Embodiment 1, it is further necessary to specify whether exclusive-type cores must be selected. If the process in the container Pod has higher performance requirements and must run on an exclusive CPU, the container Pod is designated to be bound on exclusive cores; if the process does not need exclusive CPU operation, the container Pod need not be bound on exclusive cores, i.e., no particular requirement is made.
During allocation, the Kubelet first judges from the binding information whether the container Pod needs exclusive CPU operation. If so, it finds the exclusive-type cores of each server node in the available CPU list, and then selects qualifying cores among them to allocate to the container Pod according to the methods of Embodiments 1 and 2; if not, it directly selects qualifying cores according to those methods, without regard to core type.
In step 30, the allocated cores are written into the Linux cpuset according to the allocation record of the Kubelet, binding the container Pod to the allocated cores and realizing CPU affinity binding.
If the previously allocated core is of the exclusive type, then after the binding completes in step 30, the Linux scheduler prohibits other processes from running on that core, and the container Pod's processes monopolize the CPU. If the previously allocated core is of the non-exclusive type, other processes may still be scheduled onto it after binding, for example processes outside the Kubernetes container platform or processes of other container Pods on the Linux system; such scheduling is not prevented.
For an exclusive-type core, before the processes of a container Pod are bound to it, no process can be scheduled to run on it; once the processes of some container Pod are bound to the core, only those processes can be scheduled onto it, and no other process can. In this way, complete physical isolation of the container Pod, i.e., strong isolation, is achieved through the Linux cpuset.
Of course, strongly isolated CPU binding is the strictest form of binding: once bound, no other process can ever use the CPU, which lowers CPU utilization. Therefore, if a container Pod has no real need for exclusive CPU operation, it is preferably bound on non-exclusive cores, and exclusive cores are reserved for Pods that truly must run exclusively. This improves CPU resource utilization while guaranteeing Pod performance.
Embodiment 4
Based on the Kubernetes container platform CPU core binding methods provided in Embodiments 1 to 3, an embodiment of the present invention further provides a Kubernetes container platform CPU core binding apparatus, which can be used to implement the methods of Embodiments 1 to 3.
As shown in FIG. 4, the apparatus provided by the embodiment of the present invention mainly comprises a CPU sensing component, a CPU core binding DevicePlugin component, a CPU core binding request verification component, and a CPU core binding setting component. The CPU sensing component, the CPU core binding DevicePlugin component, and the CPU core binding setting component are deployed per server node, while only one CPU core binding request verification component needs to be deployed, shared by all server nodes. The functions of the components are described below:
The CPU sensing component periodically collects the available CPU cores and corresponding topology information of each server node and generates the available CPU list from the acquired information; for example, the information can be obtained from /proc/cpuinfo on a Linux system (a proc file exposing the system's software and hardware information). Specifically, a DaemonSet Pod of the CPU sensing component runs on each server node; by monitoring system metrics it can periodically refresh the CPU topology, the corresponding NUMA topology, and the PCI topology, and update the available CPU list, which can be stored in the ETCD database; DaemonSet is the resource type that runs a container Pod on every server node of a Kubernetes cluster. For the more specific implementation, refer to the related description in Embodiment 1, which is not repeated here.
The CPU core binding DevicePlugin component is used for acquiring the available CPU list from the CPU perception component, reporting the available CPU list to a Kubelet based on a DevicePlugin mechanism, so that the Kubelet allocates a corresponding CPU core to a container Pod according to the specified CPU core binding information when the container Pod is created and the reported available CPU list, and records allocation information. As can be seen from embodiment 1, the CPU core devicepugin component itself is also reported to Kubelet, so as to be called by Kubelet when CPU core allocation is performed subsequently. The CPU core binding DevicePlugin component can manage the CPU resource self-defined by the server node as a device through a DevicePlugin mechanism so as to realize schedulable.
The CPU binding setting component is used for writing the distributed CPU core into a CPU set of the Linux system according to the distribution record of the Kubelet, so that the container Pod is bound to the distributed CPU core, and the affinity binding of the CPU is realized. Wherein the component constantly monitors the container PodAPI of the kubernets container platform and triggers when a container Pod is created or the container Pod status is changed (e.g., restarted, etc.). When the container Pod is allocated to the exclusive CPU, the CPU binding setting component calls a cgroup CPU set mechanism of Linux, so that the container pods belonging to different CPUs can be ensured to be physically isolated from each other all the time, and the complete physical isolation of the container pods is realized; therefore, the CPU bind setting component may also be considered as an executor of CPU bind isolation at this time.
The CPU core binding request verification component mainly verifies user requests based on the webhook mechanism of the Kubernetes container platform; here, webhook denotes the HTTP callback mechanism used by Kubernetes to receive and process admission requests. Specifically, after a corresponding CPU core binding request is submitted to the component based on the CPU core binding information, the component verifies whether the request is legal; if so, the Kubernetes container platform generates a CPU core binding list according to the CPU core binding information. Verification distinguishes container Pod creation or change operations that carry no core binding request: allocation and binding of CPU cores are executed only after the user request passes verification, and are skipped when verification fails, which further ensures effective utilization of CPU resources. A minimal admission-webhook sketch follows.
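A minimal sketch of such a validating webhook, under stated assumptions: the identifier prefix cpu.k8s.io/ follows the device identifier example given later in the text, the AdmissionReview structs below mirror only the fields this check needs, and the certificate paths are placeholders:

```go
// Hedged sketch of the CPU core binding request verification component: a
// validating admission webhook that checks whether a Pod requesting core
// binding carries a well-formed specific request identifier.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strings"
)

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID    string          `json:"uid"`
	Object json.RawMessage `json:"object"`
}

type admissionResponse struct {
	UID     string `json:"uid"`
	Allowed bool   `json:"allowed"`
}

type podMeta struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

func validate(w http.ResponseWriter, r *http.Request) {
	var review admissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}
	var p podMeta
	_ = json.Unmarshal(review.Request.Object, &p)

	// A Pod without any core binding annotation is an ordinary (non-binding)
	// request and passes through untouched; a Pod that requests binding is
	// legal only if the identifier carries a non-empty value, in which case
	// the platform would go on to generate the CPU core binding list.
	allowed := true
	for k, v := range p.Metadata.Annotations {
		if strings.HasPrefix(k, "cpu.k8s.io/") && v == "" {
			allowed = false
			break
		}
	}
	review.Response = &admissionResponse{UID: review.Request.UID, Allowed: allowed}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", validate)
	// Admission webhooks must be served over TLS; cert paths are placeholders.
	log.Fatal(http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil))
}
```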
Further, the CPU core binding DevicePlugin component is implemented based on the DevicePlugin and DeviceManager mechanisms native to the Kubernetes container platform, and mainly uses the following two functions of the DevicePlugin: List-Watch and Allocate.
For List-Watch, the Kubelet calls this API to acquire the latest available CPU list from the CPU awareness component, realizing discovery and status updating of the available CPUs. That is, List-Watch mainly implements the intermediate communication between the CPU core binding DevicePlugin component and the CPU awareness component.
For Allocate, when the Kubelet creates a container that uses a CPU device, it calls this API to execute the corresponding operations on the CPU device, and the component informs the Kubelet of the CPU, volume, environment variable and other configurations required for initializing the container; here, a volume is a data directory accessible to the containers in a Pod. A sketch of these two functions against the public device plugin API is given below.
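The sketch below is written against the public Kubernetes device plugin API (package k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1, where the gRPC method corresponding to the List-Watch of this text is named ListAndWatch). The hard-coded CPU list and the PINNED_CPUS environment variable are assumptions of this sketch, and the remaining methods of the DevicePluginServer interface are omitted for brevity:

```go
// Hedged sketch of the CPU core binding DevicePlugin component's two main
// gRPC functions against the public device plugin API.
package main

import (
	"context"
	"strings"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

type cpuPlugin struct {
	availableCPUs []string // fed by the CPU awareness component, e.g. watched from etcd
}

// ListAndWatch (the "List-Watch" of the text) streams the available CPU list
// to the Kubelet so that every core appears as a healthy, schedulable device.
func (p *cpuPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	devs := make([]*pluginapi.Device, 0, len(p.availableCPUs))
	for _, id := range p.availableCPUs {
		devs = append(devs, &pluginapi.Device{ID: id, Health: pluginapi.Healthy})
	}
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
		return err
	}
	select {} // a real component would re-send whenever the available CPU list changes
}

// Allocate is called by the Kubelet when a container using CPU devices is
// created; the response carries the configuration needed to initialize the
// container, here just the granted core IDs in an environment variable.
func (p *cpuPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{"PINNED_CPUS": strings.Join(cr.DevicesIDs, ",")},
		})
	}
	return resp, nil
}

func main() {
	// Serving this plugin over gRPC on its own Unix socket is omitted here;
	// see the registration sketch after the interaction steps.
	_ = &cpuPlugin{availableCPUs: []string{"0", "1", "2", "3"}}
}
```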
The interaction process between the CPU core binding DevicePlugin component and the Kubelet may refer to Fig. 5 and proceeds as follows (a registration sketch is given after step 4):
1) When started, the CPU core binding DevicePlugin component registers itself with the Kubelet through kubelet.sock in gRPC mode, so that the registered component can be called later when the Kubelet creates a container Pod; upon registration, the component reports the CPU resources in the available CPU list to the Kubelet. In addition, the component provides the Kubelet with its listening Unix socket and its device identifier specification; the device identifier specification serves as an extended resource and records the specified CPU core binding information, such as cpu.k8s.io/pin-app1: "1", which indicates that 1 bound CPU is requested.
2) The Kubelet exposes the CPU devices in the available CPU list in the Kubernetes Node status, sends the device identifier description to the API server as required for the extended resource, and the API server identifies the character string of the device identifier description. By identifying this character string, the Kubelet can determine how to perform subsequent allocation: if the CPU core binding request in the Pod spec is verified to be legal, the Kubelet generates a CPU core binding list based on the device identifier description; here, a Pod spec is the Kubernetes object specification describing the expected state of a Kubernetes resource object. Meanwhile, through the provided listening Unix socket, the Kubelet can monitor the CPU awareness component via the List-Watch interface of the CPU core binding DevicePlugin component and obtain the latest available CPU list in time.
3) When a container Pod is created, the Kubernetes Scheduler schedules the container Pod onto a server node that meets the conditions according to the CPU core binding request in the Pod spec.
4) After the Scheduler determines the server node where the container Pod is located, the Kubelet of that server node calls the Allocate method of the CPU core binding DevicePlugin component, and the container Pod is assigned CPUs that meet the scheduling conditions, completing the designated allocation. The CPU binding itself is subsequently completed by the CPU core binding setting component.
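A hedged sketch of the registration in step 1), written against the public device plugin registration API (k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1); the endpoint name cpu-binding.sock and the resource name cpu.k8s.io/pin-app1 are the illustrative values used in the text, not fixed by the API:

```go
// Hedged sketch of registering the CPU core binding DevicePlugin component
// with the Kubelet over kubelet.sock via gRPC.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

func register() error {
	// pluginapi.KubeletSocket is the Kubelet's registration socket under
	// /var/lib/kubelet/device-plugins/.
	conn, err := grpc.Dial("unix://"+pluginapi.KubeletSocket,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	defer conn.Close()

	client := pluginapi.NewRegistrationClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	_, err = client.Register(ctx, &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     "cpu-binding.sock",    // the plugin's own listening Unix socket (illustrative)
		ResourceName: "cpu.k8s.io/pin-app1", // device identifier specification as an extended resource
	})
	return err
}

func main() {
	if err := register(); err != nil {
		log.Fatal(err)
	}
}
```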
Through the apparatus provided by the embodiment of the present invention, the DevicePlugin mechanism of the Kubernetes container platform is fully utilized so that the CPU resources of each server node can be published to the Kubelet as schedulable device resources. After a container Pod specifies a bound CPU core, the Kubelet can allocate a corresponding CPU core to the container Pod from the available device resource pool according to the specified binding requirement, realizing affinity binding between the container Pod and the designated CPU core and satisfying the stable operation of CPU-sensitive applications. From the user's side, a request might look like the Pod sketch below.
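For illustration only, a user-side core binding request built with the Kubernetes Go API types (k8s.io/api/core/v1) could look as follows; the extended resource name repeats the hypothetical cpu.k8s.io/pin-app1 identifier from the text, and the Pod name and image are placeholders:

```go
// Hypothetical sketch of a Pod that requests one bound CPU core through an
// extended device resource.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "app1"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app1",
				Image: "app1:latest",
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// Requests one bound CPU core via the extended resource;
						// the Kubelet satisfies it from the reported device pool.
						corev1.ResourceName("cpu.k8s.io/pin-app1"): resource.MustParse("1"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.Containers[0].Resources.Limits)
}
```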
Example 5
Based on the Kubernetes container platform CPU core binding apparatus provided in Embodiment 4, an embodiment of the present invention further provides a specific implementation of the Kubernetes container platform CPU core binding method. As shown in Fig. 6, the method mainly comprises the following steps:
Step 401: start a DaemonSet Pod of the corresponding CPU core binding components on each server node. Each server node needs to start its own CPU awareness component, CPU core binding DevicePlugin component and CPU core binding setting component, while the CPU core binding request verification component only needs one configured and scheduled Pod to run.
Step 402: the CPU awareness component periodically collects the available CPU cores of each server node and the corresponding CPU topology information, and generates an available CPU list based on the collected information. The CPU topology information includes available CPU information, NUMA topology information, and the NUMA node to which each PCI device belongs.
Step 403: the CPU core binding DevicePlugin component registers with the Kubelet according to the generated available CPU list. Registration covers two parts: first, the CPU core binding DevicePlugin component itself, for subsequent calls by the Kubelet; second, the available CPU list, which is stored in the Kubelet's CPU device pool so that CPUs can later be selected directly from that pool for allocation.
Step 404: when creating a container Pod on the Kubernetes container platform, the user specifies the CPU core binding information according to service requirements. Specifically, this can be done in the four ways described in Embodiment 1 and Embodiment 2, which are not repeated here.
Step 405: the container Pod generates a corresponding CPU core binding request based on the CPU core binding information and submits it to the CPU core binding request verification component.
Step 406: the CPU core binding request verification component verifies whether the received CPU core binding request is legal; if so, the Kubernetes container platform generates a CPU core binding list according to the CPU core binding request. Specifically, legality is judged by whether the CPU core binding request carries the specific request identifier.
Step 407: the Kubelet calls the Allocate method of the CPU core binding DevicePlugin component according to the reported available CPU list and the CPU core binding list, allocates corresponding CPU devices to the container Pod, and records the allocation information.
Step 408: the CPU core binding setting component writes the allocated CPUs into a cpuset of the Linux system according to the allocation record stored by the Kubelet, realizing affinity binding between the CPUs and the container Pod. If a selected CPU is an exclusive CPU configured via the Linux kernel's isolcpus parameter, the Linux scheduler prohibits other processes from running on that CPU core, and the container Pod process runs on it with strong isolation and exclusivity. A sketch of detecting such exclusive cores follows.
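A hedged sketch of how step 408 could distinguish exclusive cores: on Linux the kernel exports the isolcpus set at /sys/devices/system/cpu/isolated in cpulist form (e.g. "2-3,6"); the parser below expands that form, with illustrative function names:

```go
// Hedged sketch: classify cores configured via the kernel's isolcpus
// parameter as the exclusive type of the method; all others are shared.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel cpulist string such as "2-3,6" into the
// set {2, 3, 6}.
func parseCPUList(s string) (map[int]bool, error) {
	set := map[int]bool{}
	s = strings.TrimSpace(s)
	if s == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		if lo, hi, ok := strings.Cut(part, "-"); ok {
			a, err1 := strconv.Atoi(lo)
			b, err2 := strconv.Atoi(hi)
			if err1 != nil || err2 != nil {
				return nil, fmt.Errorf("bad range %q", part)
			}
			for i := a; i <= b; i++ {
				set[i] = true
			}
		} else {
			n, err := strconv.Atoi(part)
			if err != nil {
				return nil, fmt.Errorf("bad cpu %q", part)
			}
			set[n] = true
		}
	}
	return set, nil
}

func main() {
	raw, err := os.ReadFile("/sys/devices/system/cpu/isolated")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	isolated, err := parseCPUList(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Cores in this set are treated as exclusive: the Linux scheduler will
	// not place other processes on them.
	fmt.Println("exclusive cores:", isolated)
}
```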
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A Kubernetes container platform CPU core binding method is characterized by comprising the following steps:
reporting an available CPU list to a Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform, the available CPU list comprising the available CPU cores of each server node and corresponding CPU topology information;
when a container Pod is created on the Kubernetes container platform, specifying CPU core binding information, so that the Kubelet allocates a corresponding CPU core to the container Pod according to the reported available CPU list and the CPU core binding information, and records the allocation information;
and writing the allocated CPU core into a cpuset of the Linux system according to the allocation record of the Kubelet, and binding the container Pod to the allocated CPU core, thereby realizing CPU affinity binding.
2. The Kubernetes container platform CPU core binding method according to claim 1, wherein reporting the available CPU list to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform specifically comprises:
acquiring the available CPU cores and corresponding CPU topology information of each server node, and generating the available CPU list based on the acquired information; the CPU topology information comprises CPU information, NUMA topology information and the NUMA node to which each PCI device belongs;
the DevicePlugin plug-in registers with the Kubelet when started, acquires the available CPU list after registration, and reports the acquired available CPU list to the Kubelet.
3. The Kubernetes container platform CPU core binding method according to claim 1, wherein specifying the CPU core binding information when creating a container Pod on the Kubernetes container platform, so that the Kubelet allocates a corresponding CPU core to the container Pod according to the reported available CPU list and the CPU core binding information and records the allocation information, specifically comprises:
when a container Pod is created on the Kubernetes container platform, specifying the CPU core binding information according to service requirements, and submitting a corresponding CPU core binding request based on the CPU core binding information;
verifying whether the CPU core binding request is legal, and if so, generating a CPU core binding list according to the specified CPU core binding information;
and the Kubelet, according to the CPU core binding list, calls the DevicePlugin plug-in to select an eligible CPU core from the available CPU list, allocates it to the container Pod, and records the allocation information.
4. The Kubernetes container platform CPU core binding method according to claim 3, wherein, when the service application corresponding to a container Pod is started, if the service application needs CPU core binding for the container Pod, a specific request identifier is set and attached to the corresponding CPU core binding request after the container Pod is created;
verifying whether the CPU core binding request is legal is specifically: verifying whether the CPU core binding request carries the specific request identifier; if yes, the CPU core binding request is legal; if not, the CPU core binding request is illegal.
5. The Kubernetes container platform CPU core binding method according to claim 4, wherein the Kubelet selecting an eligible CPU core from the available CPU list to allocate to the container Pod according to the CPU core binding list is specifically:
a scheduler on the Kubernetes container platform schedules the container Pod to a server node meeting the conditions according to the CPU core binding list and the available CPU list;
and the scheduler calls the DevicePlugin to select an eligible CPU core on that server node and allocate it to the container Pod according to the CPU core binding list and the available CPU list.
6. The Kubernetes container platform CPU core binding method according to claim 1, wherein, when creating a container Pod on the Kubernetes container platform, the CPU core binding information is specified in one of the following ways:
specifying the specific numbers (IDs) of the CPU cores that the container Pod needs to bind; or,
specifying the quantity of CPU cores that the container Pod needs to bind; or,
specifying the NUMA node on which the CPU cores that the container Pod needs to bind are located; or,
specifying that the container Pod be bound on a NUMA node aligned with a PCI device.
7. The Kubernetes container platform CPU core binding method according to claim 6, wherein, when the specific numbers of the CPU cores that the container Pod needs to bind are specified, the Kubelet selects the CPU cores with the corresponding numbers from the available CPU list and allocates them to the container Pod;
when the quantity of CPU cores that the container Pod needs to bind is specified, the Kubelet selects the corresponding quantity of CPU cores from the available CPU list and allocates them to the container Pod;
when the NUMA node on which the CPU cores that the container Pod needs to bind are located is specified, the Kubelet selects CPU cores on the corresponding NUMA node from the available CPU list and allocates them to the container Pod;
when the container Pod is specified to be bound on a NUMA node aligned with a PCI device, the Kubelet finds the NUMA node aligned with the PCI device to be used by the container Pod, and selects CPU cores on that NUMA node from the available CPU list to allocate to the container Pod.
8. The Kubernetes container platform CPU core binding method according to claim 1, wherein, after the Kubelet allocates a corresponding CPU core to the container Pod according to the reported available CPU list and the CPU core binding information and records the allocation information, the method further comprises:
removing the allocated CPU core from the available CPU list according to the allocation record of the Kubelet and updating the available CPU list, so that when the next container Pod is created, CPU allocation is performed according to the updated available CPU list.
9. The Kubernetes container platform CPU core binding method according to claim 1, wherein, before the available CPU list is reported to the Kubelet based on the DevicePlugin mechanism of the Kubernetes container platform, the method further comprises: dividing the CPU cores in the cluster into an exclusive type and a non-exclusive type when the Linux system kernel is started; after a container Pod is bound to any exclusive CPU core, the scheduler of the Linux system prohibits other processes from running on that CPU core, so that the processes of the container Pod run exclusively on it.
10. A Kubernetes container platform CPU core binding apparatus for performing the Kubernetes container platform CPU core binding method according to any one of claims 1-9, characterized by comprising a CPU awareness component, a CPU core binding DevicePlugin component, a CPU core binding setting component and a CPU core binding request verification component;
the CPU awareness component is used for acquiring the available CPU cores of each server node and the corresponding CPU topology information, and generating an available CPU list based on the acquired information;
the CPU core binding DevicePlugin component is used for acquiring the available CPU list from the CPU awareness component and reporting it to the Kubelet based on the DevicePlugin mechanism, so that when a container Pod is created the Kubelet allocates a corresponding CPU core to the container Pod according to the specified CPU core binding information and the reported available CPU list, and records the allocation information;
the CPU core binding setting component is used for writing the allocated CPU core into a cpuset of the Linux system according to the allocation record of the Kubelet, so that the container Pod is bound to the allocated CPU core, thereby realizing CPU affinity binding;
and the CPU core binding request verification component is used for verifying, after receiving a CPU core binding request submitted based on the CPU core binding information, whether the CPU core binding request is legal; if so, the Kubernetes container platform generates a CPU core binding list according to the CPU core binding information.
CN202010825344.1A, filed 2020-08-17: Method and device for binding CPU (central processing unit) of Kubernetes container platform. Legal status: pending.

Priority application: CN202010825344.1A (priority and filing date 2020-08-17).
Publication: CN112052068A, published 2020-12-08.
Family ID: 73599177.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-12-08)