WO2023045467A1 - Container cpu resource scheduling and isolation method and apparatus, and storage medium and electronic device - Google Patents


Info

Publication number
WO2023045467A1
Authority
WO
WIPO (PCT)
Prior art keywords
container
node
resource pool
resource
cpu
Prior art date
Application number
PCT/CN2022/102750
Other languages
French (fr)
Chinese (zh)
Inventor
郭天
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2023045467A1 publication Critical patent/WO2023045467A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • The present disclosure relates to the field of computer technology, and in particular, to a container CPU resource scheduling and isolation method and device, a storage medium, and an electronic device.
  • Kubernetes is currently the most mainstream and widely used open source container computing platform in the industry. It allows users to easily and efficiently deploy container applications on a batch of common infrastructure nodes, and provides a full life-cycle management mechanism covering application deployment, planning, updating, and maintenance to meet different practical needs.
  • Native Kubernetes treats the resources on each node as a whole. After excluding the resources reserved for the system, all remaining resources on the node are incorporated by Kubernetes into a single schedulable resource pool.
  • When the scheduler selects a node for a container pod, it evaluates against the total free capacity of this schedulable resource pool; when the pod runs on the node, it may use resources from the entire schedulable pool, so precise core binding or isolation control cannot be performed.
  • Embodiments of the present disclosure provide a container CPU resource scheduling and isolation method and device, a storage medium, and an electronic device, to at least solve the problem that the resource management and scheduling mechanism of native Kubernetes cannot meet the precise CPU resource isolation requirements of different types of pods.
  • According to one aspect of the present disclosure, a container CPU resource scheduling and isolation method is provided, including:
  • the container orchestration engine plans and creates resource pools, and has each node divide its own CPU resources among the resource pools; the container orchestration engine obtains container creation information, where the container creation information includes the label of the resource pool the container expects to enter; the container orchestration engine determines a target node according to the resource pool label and the state of the resource pool corresponding to each node; and the container orchestration engine sends a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and binds the container to the CPU cores corresponding to the resource pool.
  • According to another aspect of the present disclosure, a container CPU resource scheduling and isolation method is provided, including: the execution agent module on the current node receives a container creation instruction sent by the container orchestration engine, where the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, and the configuration information includes a resource pool label; the execution agent module, according to the container creation instruction, calls the container runtime interface (CRI) on the node where the service executes to create a container, and determines the corresponding CPU core indexes according to the resource pool label; the resource pool label and the corresponding CPU core indexes are sent to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool; and the execution agent module sends container creation result information to the scheduler of the container orchestration engine.
  • According to another aspect of the present disclosure, a container CPU resource scheduling and isolation device is provided, including: a first creation unit, configured to create resource pools and have each node divide its own CPU resources among the resource pools; an acquisition unit, configured to obtain container creation information, where the container creation information includes the label of the resource pool the container expects to enter; a determination unit, configured to determine a target node according to the resource pool label and the state of the resource pool corresponding to each node; and a sending unit, configured to send, as the container orchestration engine, a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and binds the container to the CPU cores corresponding to the resource pool.
  • According to another aspect of the present disclosure, a CPU resource scheduling and isolation device is provided, including: a receiving unit, configured to receive a container creation instruction sent by the container orchestration engine, where the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, and the configuration information includes a resource pool label; a second creation unit, configured to call, according to the container creation instruction, the container runtime interface (CRI) on the node where the service executes to create a container and bind the container to the CPU cores corresponding to the resource pool; a first sending unit, configured to send the resource pool label and the corresponding CPU core indexes to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool; and a second sending unit, configured to send task creation result information to the scheduler of the container orchestration engine.
  • According to yet another embodiment of the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
  • According to yet another embodiment of the present disclosure, an electronic device is further provided, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • FIG. 1 is a block diagram of a hardware structure of a communication device according to a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • Fig. 2 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • Fig. 3 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a node scheduling state of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a node running process of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a node CPU resource pool according to a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a container CPU resource scheduling and isolation system according to an embodiment of the present disclosure.
  • FIG. 8 is a flow chart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure.
  • FIG. 9 is a flow chart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure.
  • FIG. 10 is a flow chart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure.
  • FIG. 11 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure.
  • Fig. 12 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of node resource division according to a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 14 is a schematic diagram of node resource division according to another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • FIG. 15 is a schematic diagram of node resource division according to another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure
  • Fig. 16 is a schematic structural diagram of a container CPU resource scheduling and isolation device according to an embodiment of the present disclosure
  • Fig. 17 is a schematic structural diagram of another container CPU resource scheduling and isolation device according to an embodiment of the present disclosure.
  • FIG. 1 is a block diagram of a hardware structure of a mobile terminal according to a method for scheduling and isolating container CPU resources according to an embodiment of the present disclosure.
  • the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processors 102 may include, but are not limited to, processing devices such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 configured to store data, and the mobile terminal may further include a transmission device 106 and an input/output device 108 configured for communication.
  • FIG. 1 is only for illustration, and it does not limit the structure of the above mobile terminal.
  • the mobile terminal may also include more or fewer components than those shown in FIG. 1 , or have a different configuration from that shown in FIG. 1 .
  • the memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the container CPU resource scheduling and isolation method in the embodiments of the present disclosure; the processor 102 runs the computer program stored in the memory 104 so as to execute various functional applications and data processing, that is, to implement the above-mentioned method.
  • the memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory that is remotely located relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is configured to receive or transmit data via a network.
  • the specific example of the above network may include a wireless network provided by the communication provider of the mobile terminal.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is configured to communicate with the Internet in a wireless manner.
  • Fig. 2 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure. As shown in Fig. 2, the process includes the following steps:
  • the container orchestration engine plans and creates a resource pool, and makes each node divide its own CPU resources according to the resource pool;
  • the container orchestration engine acquires container creation information; wherein, the container creation information includes the desired resource pool label;
  • the container orchestration engine determines the target node according to the resource pool label and the state of the resource pool corresponding to each node;
  • the container orchestration engine sends a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and binds the container to the CPU core corresponding to the resource pool.
  • the container orchestration engine may include a Kubernetes platform, where a container may be the smallest business abstraction unit pod that Kubernetes can manage, and a pod may contain one or more containers.
  • the resource pool label may be the name of one of the different resource pools obtained by dividing the multiple CPU cores, and the resource demand may be the amount of CPU time to be occupied, which is not limited here.
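  • To make these two notions concrete, the following is a minimal Go sketch of how a labelled resource pool and a pod's creation information might be represented; all type, field, and label names here are illustrative assumptions, not identifiers taken from the disclosure.

```go
package main

import "fmt"

// ResourcePool describes one labelled CPU group on a node (illustrative only).
type ResourcePool struct {
	Label      string // resource pool label, e.g. "pod-group-a"
	CPUCores   []int  // CPU core indexes bound to this pool
	CapacityMs int64  // total CPU time offered by the pool per 100 ms window
	FreeMs     int64  // remaining schedulable CPU time in that window
}

// PodRequest is the container creation information the scheduler evaluates:
// the label of the pool the pod expects to enter and its CPU-time demand.
type PodRequest struct {
	PoolLabel string
	DemandMs  int64
}

func main() {
	pool := ResourcePool{Label: "pod-group-a", CPUCores: []int{3, 4}, CapacityMs: 200, FreeMs: 180}
	req := PodRequest{PoolLabel: "pod-group-a", DemandMs: 120}
	fmt.Printf("pool %s has %d ms free; the request needs %d ms\n", pool.Label, pool.FreeMs, req.DemandMs)
}
```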
  • the scheduler is used to obtain the container creation instruction from the container orchestration engine, where the creation instruction carries the label of the resource pool the container expects to enter and the resource demand; according to the current state information of the resource pool corresponding to each node, a target node matching the resource pool label and the resource demand is selected, and the container is created on the target node. Because the target node is selected according to the current state information of the resource pools of each node, the CPU resources bound to the container can be precisely controlled, evaluation and scheduling can be performed in units of resource pools, and CPU resources can be controlled and isolated more flexibly and accurately.
  • when the above container orchestration engine plans and creates resource pools and has each node divide its own CPU resources according to the resource pools, this includes:
  • after the above scheduler obtains the configuration information of all current nodes, it sends a resource pool initialization instruction to the execution agent module corresponding to each node, where the resource pool initialization instruction is used to make the execution agent module divide the CPU cores on the node into several labelled CPU groups and match those CPU groups to different resource pools.
  • the container orchestration engine determines the target node according to the resource pool label and the state of the resource pool corresponding to each node, including:
  • the container orchestration engine selects a target node that matches the label of the resource pool and the resource demand according to the state information of the resource pool corresponding to each node.
  • the container orchestration engine selects target nodes that match the above resource pool labels and resource requirements according to the status information of the resource pools corresponding to the current nodes, including:
  • each of the above-mentioned nodes includes CPU resources and memory resources, and the above-mentioned resource pools include resource amounts of CPU groups with different labels;
  • the scheduler determines, from the candidate target nodes, a target node that also satisfies the scheduling of non-CPU resources, creates the container on that target node, and binds the container to the CPU cores corresponding to the resource pool.
  • the above method further includes: the scheduler obtains, from the container orchestration engine, a request message for adding a new node, where the request message carries the access interface information of the execution agent module corresponding to the newly added node;
  • the above-mentioned scheduler sends a resource pool initialization instruction to the execution agent module of the above-mentioned newly added node, wherein the above-mentioned initialization instruction includes parameters of one or more resource pools to be created, and the above-mentioned resource pool parameters include a resource pool label and a CPU group;
  • when the scheduler receives the resource pool creation success message sent by the execution agent module of the newly added node, the scheduler stores the resource pool state information of the newly added node in the database.
  • the method for scheduling and isolating container CPU resources further includes: if no candidate target node is selected, the scheduler suspends or terminates the creation task corresponding to the creation instruction.
  • the above container CPU resource scheduling and isolation method further includes: the scheduler deducts, for the access target node, the resources consumed by the corresponding container, and updates the resource status of the resource pool corresponding to that access node.
  • Fig. 3 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure. As shown in Fig. 3 , the process includes the following steps:
  • the execution agent module on the current node receives a container creation instruction from the scheduler, wherein the container creation instruction carries configuration data to be created and configuration information of a resource pool to be bound, and the configuration information includes a resource pool label;
  • the above execution agent module calls the container runtime interface CRI on the node for service execution to create a container according to the above container creation instruction, and determines the corresponding CPU core index according to the above resource pool label;
  • the execution agent module sends container creation result information to the above-mentioned scheduler.
  • the execution agent module on the current node is used to receive the container creation instruction sent by the container orchestration engine, where the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound.
  • the configuration information includes the resource pool label; the execution agent module, according to the container creation instruction, calls the container runtime interface (CRI) on the node where the service executes to create a container and determines the corresponding CPU core indexes according to the resource pool label; the resource pool label and the corresponding CPU core indexes are sent to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool; and the execution agent module sends container creation result information to the scheduler of the container orchestration engine. Because a target node matching the resource pool label and resource demand is selected according to the current state information of the resource pool corresponding to each node, CPU resources can be precisely bound and isolated, the CPU resources bound to the container can be precisely controlled, evaluation and scheduling can be performed in units of resource pools, and the technical effect of more flexible and accurate CPU resource control and isolation can be achieved.
  • before the execution agent module on the current node receives the container creation instruction sent by the scheduler of the container orchestration engine, the method further includes:
  • after the proxy node accesses the container orchestration engine, it receives the resource pool initialization instruction sent by the scheduler of the container orchestration engine;
  • the execution agent module sends resource pool initialization result information to the above-mentioned scheduler.
  • the above dividing of the CPU resources on the agent node into several groups according to the initialization instruction includes: using the cgroup subsystem of the Linux kernel to build multiple CPU groups according to the configuration requirements of the resource pool, where each CPU group includes preset CPU cores.
  • Kubernetes is currently the most mainstream and widely used open source container computing platform in the industry. It allows users to easily and efficiently deploy container applications on a batch of common infrastructure nodes, and provides a full life-cycle management mechanism covering application deployment, planning, updating, and maintenance to meet different practical needs.
  • Pod is the smallest business abstraction unit that Kubernetes can manage.
  • a pod can contain one or more containers.
  • Users will write business orchestration blueprints according to actual needs, which will require the creation of one or more business pods.
  • the built-in native scheduler of Kubernetes will evaluate all the nodes under its jurisdiction, combine the available resources of each node, the resource requirements of the Pod, and other factors, and finally decide which node to build the Pod on.
  • the Kubernetes scheduler continuously monitors the running status of each node and all Pods to ensure that the resources of all nodes can be fully utilized, while avoiding the phenomenon of some nodes being overloaded or some Pods failing to obtain the resources they need.
  • the scheduler needs to continuously monitor the currently available resources on each node; the scheduler needs to ensure that the workload carried by all nodes is basically balanced; when creating a new Pod, the scheduler needs to combine the resource requirements of the Pod with the available resources and workload of each node to determine on which node the Pod is created.
  • CPU is generally measured in CPU time per second. For example, for an 8-core node, the total available CPU resource within 1 second is 8 (or 8000 if milliseconds are used as the unit of measurement), while memory is measured directly by size; for example, if a node has 16G of memory, the total maximum available memory is 16G.
  • when a Pod is created, it will consume a portion of CPU and memory resources (which can be declared in the blueprint), and these consumed resources will be deducted from the total resources of the node. If the currently available resources of a node can no longer meet the needs of a Pod, the Kubernetes scheduler will not let the Pod be built on this node, but will find another node with sufficient resources. If the currently available resources of all nodes do not meet the conditions, the creation of the Pod will be suspended, and the scheduler will continue to monitor the resource status of all nodes and wait until the resources of some node become sufficient before executing the creation of the Pod.
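  • The accounting described above can be illustrated with a short, purely hypothetical Go sketch of the native bookkeeping (an 8-core node exposing 8000 milli-CPU per second, pod requests deducted, and a pod left pending when nothing fits); the numbers are examples, not values from the disclosure.

```go
package main

import "fmt"

func main() {
	// An 8-core node exposes 8000 "milli-CPU" of schedulable CPU time per second.
	allocatable := int64(8000)

	// Three hypothetical pod CPU requests, in milli-CPU.
	requests := []int64{3000, 2500, 3000}

	for i, req := range requests {
		if req <= allocatable {
			allocatable -= req // deducted from the node's total resources
			fmt.Printf("pod %d scheduled, %d milli-CPU left\n", i, allocatable)
		} else {
			// No capacity left: the native scheduler keeps the pod pending
			// and retries when resources are released on some node.
			fmt.Printf("pod %d pending, only %d milli-CPU left\n", i, allocatable)
		}
	}
}
```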
  • for the multiple CPU cores on a node, Kubernetes treats them as a whole.
  • during operation the scheduler schedules them as a single CPU-time resource pool, and does not distinguish which core a given share of the resource belongs to.
  • the scheduling of the above Pod processes between different CPU cores is not implemented by Kubernetes, but by the kernel scheduler of the node operating system.
  • the task of the kernel scheduler of the operating system is to distribute all processes running on the node to all CPU cores in a balanced manner, so as to make full use of all CPU resources as much as possible.
  • the kernel scheduler is very similar to the Kubernetes scheduler, except that their scheduling object levels are different: the Kubernetes scheduler is responsible for scheduling Pods to appropriate nodes and balancing the load of all nodes as much as possible; the kernel scheduler is responsible for reasonably allocating all Pod processes to all CPU cores on a specific node, so as to make all CPU cores fully utilized at all times and maintain load balance as much as possible.
  • a node not only runs Pods; the core processes of the operating system and the Kubernetes management processes also consume certain resources. If the scheduler does not set aside a portion of resources for system processes and Kubernetes management processes when calculating node resources, the Pod processes may preempt the resources of these key processes, causing the node to work abnormally.
  • Kubernetes therefore provides system resource reservation parameters, which allow system administrators to reserve some resources for system processes and management processes. Kubernetes will exclude these resources from the visible range of the scheduler to ensure that Pods will not occupy this part of the resources.
  • the CPU resources on a node are divided into three blocks, among which the resources reserved for Pods are called the allocatable pool; this pool does not contain the resources reserved for the system or for Kubernetes' own management processes, so Pods running on this node will never occupy the reserved CPU resources.
  • Kubernetes cannot meet the above two isolation requirements.
  • Kubernetes regards the resources on each node as a whole. After excluding the resources reserved for the system, all remaining resources on the node are incorporated into a separate schedulable resource pool by Kubernetes.
  • when the scheduler selects a node for a Pod, it evaluates against the total free capacity of this schedulable resource pool; when the Pod runs on the node, it may use resources from the entire schedulable pool, and precise core binding or isolation control cannot be performed.
  • a container CPU resource scheduling and isolation method is provided.
  • the applicable environment requirements for the above method are as follows: Kubernetes is used to manage the container application environment; as shown in Figure 7, this application also provides a container CPU resource scheduling and isolation device, including: module A: interface server (API Server); module B: database; module C: container runtime; module D: enhanced scheduler; module E: execution agent module (kubelet); where:
  • Module A is responsible for providing user interaction interface and functional interface. Users can manage and configure Kubernetes clusters through the interfaces or interfaces provided by this module, and at the same time create and manage various business Pods and related objects.
  • Module B is responsible for the internal management of the Kubernetes system and the access and persistence of state data. It will store the user configuration data input from module A, and return the data set queried by the user to module A; it will also store the node status and resource status information returned from module D, and respond to the data query request sent by subsequent module D.
  • Module C runs on the node and is responsible for accepting the request from module E, creating, deleting or configuring containers and images on the node, and responding to the container status query request.
  • Module D is responsible for maintaining resource status information of all nodes, scheduling decisions of all Pods, and sending execution instructions to module E.
  • Module E runs on the node and is responsible for accepting and responding to instructions from module D, initializing resource pools on the node, interacting with module C, managing the lifecycle of containers and binding CPU cores.
  • the key function of the scheduler is to perform unified evaluation and scheduling over the multiple CPU resource pools of all nodes, selecting an appropriate node for a Pod according to its requirements; the execution agent module runs on the node and is responsible for creating resource pools and Pods according to the scheduler's instructions, and for binding the two when containers are created.
  • the method for scheduling and isolating container CPU resources includes: S802: Read the configuration and obtain the configuration information of the node resource pools. Specifically, before the environment is deployed, the user should pre-plan and configure the resource pools for nodes of specific specifications, and record the configuration information in the corresponding configuration file. When the enhanced scheduler starts, it reads the configuration file to learn the expected configuration information of the node resource pools.
  • S804: Obtain a node list and initialize node resource states. Specifically, the enhanced scheduler obtains the list of registered nodes from the built-in Kubernetes database, and records and initializes the resource pool usage state information of these nodes in the database. When the system has just started, the node list may be empty; the action of bringing a new node under management will then trigger the scheduler's node addition process to complete the initialization.
  • S806: Send a resource pool initialization instruction to the nodes. Specifically, after the scheduler has learned the information of all currently managed nodes, it sends a resource pool initialization instruction to the execution agent of each node. The execution agent divides the CPU cores on the node into several groups according to the instruction and puts them into different resource pools for subsequent use when creating Pods (an illustrative sketch of such an instruction follows).
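  • As an illustration only, the initialization instruction could carry, for each pool, a label and the CPU core indexes it should contain; the Go sketch below serializes such a hypothetical payload with the standard library. The actual wire format and field names are not specified in the disclosure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PoolSpec is an assumed shape for one entry of the resource pool
// initialization instruction sent from the enhanced scheduler to a node's
// execution agent; field names are illustrative.
type PoolSpec struct {
	Label    string `json:"label"`
	CPUCores []int  `json:"cpuCores"`
}

func main() {
	instruction := []PoolSpec{
		{Label: "pod-group-a", CPUCores: []int{3, 4}},
		{Label: "pod-group-b", CPUCores: []int{5, 6, 7}},
	}
	payload, err := json.Marshal(instruction)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload)) // body the execution agent would receive and act on
}
```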
  • the method for scheduling and isolating container CPU resources includes:
  • S902: Receive a newly added node. Specifically, the enhanced scheduler obtains, from other Kubernetes management services, a message that a new node has been brought under management, and obtains the access interface information of the execution agent program of the new node.
  • S904: Send a resource pool initialization instruction to the node. Specifically, the enhanced scheduler sends a resource pool initialization instruction to the execution agent of the new node, which contains the details of one or more resource pools that need to be created (resource pool labels, included CPU core indexes, and the like).
  • S906: Determine whether the resource pool initialization is successful. Specifically, if the execution agent replies that the resource pools have been successfully created, continue to step S908; if it fails, enter step S910, resend the creation instruction, and continue to wait.
  • the method for scheduling and isolating container CPU resources includes:
  • the enhanced scheduler acquires CPU resource scheduling messages from other Kubernetes management services.
  • the enhanced scheduler obtains the specific parameters of CPU resource scheduling from the CPU resource scheduling message, which includes the label of the resource pool that the Pod wants to enter and the resource demand (if these parameters are not configured, default values are used).
  • the enhanced scheduler evaluates the resource status of the resource pools corresponding to all nodes, and screens candidate access nodes according to the resource pool label and resource demand the Pod wants to enter, combined with the current state information of the resource pools of each node maintained in the database; a simplified sketch of this screening and the subsequent deduction follows these steps.
  • step S1008: If a target node can be selected, go to step S1010; if no node can be selected, stop the process directly and go to step S1018, and the creation of the Pod will be suspended.
  • step S1012: If a batch of optional nodes can be selected, the enhanced scheduler executes the scheduling logic of the Kubernetes native scheduler on these nodes to filter again. This step mainly completes the scheduling screening for other non-CPU resources (such as memory, ports, and so on). If no node passes this screening, go to step S1018: the process is terminated directly, and the creation of the Pod is suspended.
  • S1014: Send a Pod creation instruction to the selected node. After the above screening, a final candidate access node is obtained.
  • the enhanced scheduler sends instructions to the execution agent of the node, which contains the detailed information of the Pod to be created, so that the execution agent can create the Pod on the node and bind the resource pool.
  • Step S1018 the process is terminated, and the creation of the Pod will be suspended.
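  • A minimal Go sketch of the pool-aware screening and deduction just described follows. It assumes per-node bookkeeping of free CPU time per labelled pool and simply prefers the most relaxed node; the names and numbers are illustrative (they mirror the node 1 versus node 2 example given later), not the disclosure's actual data structures.

```go
package main

import "fmt"

// Illustrative types; field and function names are not taken from the disclosure.
type pool struct {
	label  string
	freeMs int64
}

type node struct {
	name  string
	pools map[string]*pool
}

// pickNode mirrors the screening described above: keep nodes whose pool with
// the requested label has enough free CPU time, then prefer the most relaxed one.
func pickNode(nodes []*node, label string, demandMs int64) *node {
	var best *node
	for _, n := range nodes {
		p, ok := n.pools[label]
		if !ok || p.freeMs < demandMs {
			continue // node has no such pool, or the pool is too full
		}
		if best == nil || p.freeMs > best.pools[label].freeMs {
			best = n
		}
	}
	return best
}

func main() {
	nodes := []*node{
		{name: "node-1", pools: map[string]*pool{"pool-b": {"pool-b", 240}}},
		{name: "node-2", pools: map[string]*pool{"pool-b": {"pool-b", 100}}},
	}
	if n := pickNode(nodes, "pool-b", 60); n != nil {
		n.pools["pool-b"].freeMs -= 60 // deduct the demand from the selected node's pool
		fmt.Println("selected", n.name, "remaining", n.pools["pool-b"].freeMs)
	} else {
		fmt.Println("no node fits; pod creation suspended")
	}
}
```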
  • the method for scheduling and isolating container CPU resources includes:
  • the execution agent program reads the initial configuration and starts, and registers the node where it is located with the Kubernetes control node. This step is the same as that of the Kubernetes native execution agent program (Kubelet).
  • after the node where the execution agent is located is brought under management by Kubernetes, the enhanced scheduler sends a resource pool initialization command to it.
  • S1106 The execution agent needs to divide the CPU resources on the node into several groups according to the instruction, which belong to different resource pool objects. There are many ways to group CPUs. The most common method is to use the cgroup subsystem of the Linux kernel to create multiple cpusets according to the configuration requirements of the resource pool, and each cpuset contains specified CPU cores.
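  • The following Go sketch shows one way the cpuset grouping mentioned above could be performed, assuming a cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and root privileges on the node; the pool names and core ranges are illustrative, and a real agent would also need to handle cgroup v2 and error recovery.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createCpuset builds one cpuset per resource pool under the (assumed) cgroup v1
// mount point, pinning the pool to the given CPU core range, e.g. "3-4".
// It must run as root on the node; paths and names are illustrative.
func createCpuset(poolLabel, cpus, mems string) error {
	dir := filepath.Join("/sys/fs/cgroup/cpuset", poolLabel)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "cpuset.cpus"), []byte(cpus), 0o644); err != nil {
		return err
	}
	// cpuset.mems must also be set before tasks can be attached to the cpuset.
	return os.WriteFile(filepath.Join(dir, "cpuset.mems"), []byte(mems), 0o644)
}

func main() {
	// Example pools roughly matching the division described in the text.
	if err := createCpuset("pod-group-a", "3-4", "0"); err != nil {
		fmt.Println("create pool A:", err)
	}
	if err := createCpuset("pod-group-b", "5-7", "0"); err != nil {
		fmt.Println("create pool B:", err)
	}
}
```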
  • the method for scheduling and isolating container CPU resources includes:
  • the execution agent on a certain node receives a Pod creation instruction from the enhanced scheduler, which includes information such as container configuration data to be created, resource pools to be bound, and the like.
  • S1204 Read the Pod creation instruction, which includes information such as container configuration data that needs to be created, resource pools to be bound, and the like.
  • the execution agent invokes a container runtime interface (Container Runtime Interface, CRI for short) on the node to create a container according to the creation instruction. When created, it will find the corresponding cpuset index according to the resource pool label, and bind the container process to the corresponding cpuset.
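  • One possible way to realize this final binding step, sketched below in Go, is to attach the created container's process to the pool's cpuset via the cgroup v1 "tasks" file; the disclosure leaves the exact mechanism to the container runtime, and the path, pool label, and PID used here are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// attachToPool places an already-created container process into the cpuset that
// backs the given resource pool, so its threads only run on that pool's CPU
// cores. This is one possible binding mechanism (cgroup v1 "tasks" file).
func attachToPool(poolLabel string, pid int) error {
	tasks := filepath.Join("/sys/fs/cgroup/cpuset", poolLabel, "tasks")
	f, err := os.OpenFile(tasks, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(strconv.Itoa(pid))
	return err
}

func main() {
	// Hypothetical values: bind container process 4321 to the pool "pod-group-a".
	if err := attachToPool("pod-group-a", 4321); err != nil {
		fmt.Println("bind failed:", err)
	}
}
```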
  • the environment reserves 1.5 cores for operating system processes, 2 cores for Kubernetes' own management processes, and 4.5 cores for Pods; the native resource allocation mechanism only supports one Pod reservation group (the allocatable pool); the native resource reservation mechanism divides by available CPU time (for example, reserving 1.5 cores means that 150 ms of CPU time can be used within every 100 ms), so processes in different reservation groups may be assigned to the same CPU core for scheduling (which may lead to resource contention); and at a given moment it is uncertain on which CPU core a process of a given reservation group runs.
  • the position shown in FIG. 13 is only an example, without any limitation.
  • one core is reserved for the operating system processes, two cores are reserved for the Kubernetes self-management processes, two cores are reserved for Pod group A, and three cores are reserved for Pod group B.
  • Enhanced resource allocation mechanism supports multiple Pod reservation groups (multiple allocatable pools). The enhanced solution is to precisely bind CPU cores to reserved groups by creating multiple cpusets, and users can arbitrarily specify the mapping relationship between CPU cores and reserved groups.
  • Pod reserved group B is bound to CPUs 3, 4, and 5, and all Pod processes included in Pod reserved group B will only run on CPUs 3, 4, and 5. The same applies to the other reservation groups. In this embodiment, it is determinate at any moment on which CPU core (or within which range of cores) the processes of each reservation group run.
  • the Pod indicates that it wants to belong to reserved group A, and the minimum requirement for CPU resources is 120ms.
  • the enhanced scheduler traverses all nodes and finds that only the resource status of node 0 can meet the requirements; after the other resource checks pass, the scheduler sends the Pod creation command to the execution agent of node 0, and at the same time deducts 120 ms from node 0's Pod reserved group A resources (leaving 60 ms remaining).
  • the Pod indicates that it wants to belong to reserved group B, and the minimum requirement for CPU resources is 60ms.
  • the enhanced scheduler traverses all nodes and finds that the resources of nodes 1 and 2 can meet the requirements, but the resources of node 1 are more relaxed than those of node 2 (240>100), so the preferred result is node 1.
  • the scheduler sends a Pod creation command to the execution agent of node 1, and at the same time deducts 60 ms from node 1's Pod reserved group B resources (leaving 180 ms remaining).
  • the Pod indicates that it wants to belong to reserved group A, and the minimum requirement for CPU resources is 200ms.
  • the enhanced scheduler traverses all nodes and finds that no node can meet the requirements, so it directly ends the judgment and suspends the Pod creation process.
  • if a Pod belonging to reserved group A runs abnormally after it is successfully created on node 0, causing it to consume all the time on CPU 0 and CPU 1 (these CPU cores belong to reserved group A), the Pods belonging to reserved group B are not affected by it, the operating system processes and Kubernetes management processes are not affected by it, and the creation and scheduling of Pods belonging to reserved group B can still proceed normally.
  • the execution agent calls the container runtime interface on the node to create the container.
  • the methods according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the various embodiments of the present disclosure.
  • This embodiment also provides a container CPU resource scheduling and isolation device.
  • the device is used to implement the above-mentioned embodiments and preferred implementation modes, and what has been described will not be repeated.
  • the term "module” may be a combination of software and/or hardware that realizes a predetermined function.
  • the devices described in the following embodiments are preferably implemented in software, but implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • Fig. 16 is a structural block diagram of a container CPU resource scheduling and isolation device according to an embodiment of the present disclosure. As shown in Fig. 16 , the device includes:
  • the first creating unit 1602 is configured to create a resource pool, and make each node divide its own CPU resources according to the resource pool;
  • the obtaining unit 1604 is configured to obtain container creation information; wherein, the container creation information includes the desired resource pool label;
  • the determining unit 1606 is configured to determine the target node according to the resource pool label and the state of the resource pool corresponding to each node;
  • the sending unit 1608 is configured to send, as the container orchestration engine, a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and binds the container to the CPU cores corresponding to the resource pool.
  • the container orchestration engine may include a Kubernetes platform, where a container may be the smallest business abstraction unit pod that Kubernetes can manage, and a pod may include one or more containers.
  • the resource pool label can be the name of different resource pools obtained by dividing multiple CPUs, and the resource requirement can be the capacity of the occupied CPU and the duration of the occupied CPU, which is not limited here.
  • the scheduler is used to obtain the container creation instruction from the container orchestration engine, where the creation instruction carries the label of the resource pool the container expects to enter and the resource demand; according to the current state information of the resource pool corresponding to each node, a target node matching the resource pool label and the resource demand is selected, and the container is created on the target node. Because the target node is selected according to the current state information of the resource pools of each node, CPU resources can be precisely bound and isolated, thereby achieving precise control of the CPU resources bound to the container, allowing evaluation and scheduling in units of resource pools, and achieving the technical effect of more flexible and accurate control and isolation of CPU resources.
  • Fig. 17 is a structural block diagram of a container CPU resource scheduling and isolation device according to an embodiment of the present disclosure. As shown in Fig. 17 , the device includes:
  • the receiving unit 1702 is configured to receive the container creation instruction sent by the container orchestration engine, wherein the above-mentioned container creation instruction carries the configuration information of the container configuration data to be created and the resource pool to be bound, and the above-mentioned configuration information includes the resource pool label;
  • the second creation unit 1704 is configured to call the container runtime interface CRI on the node for service execution to create a container according to the container creation instruction and determine the corresponding CPU core index according to the resource pool label;
  • the first sending unit 1706 is configured to send the resource pool label and the corresponding CPU core indexes to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool;
  • the second sending unit 1708 is configured to send task creation result information to the scheduler of the container orchestration engine.
  • the execution agent module on the current node is used to receive the container creation instruction from the scheduler, where the creation instruction carries the configuration data of the container to be created and the information of the resource pool to be bound, and that information includes the resource pool label; the execution agent module, according to the creation instruction, calls the container runtime interface (CRI) on the node where the service executes to create a container; the CRI determines the cpuset index corresponding to the resource pool label on the node and binds the container process to the corresponding cpuset; and the execution agent module sends the task creation result information to the scheduler. Because a target node matching the resource pool label and resource demand is selected according to the current state information of the resource pool corresponding to each node, CPU resources can be precisely bound and isolated, thereby achieving precise control of the CPU resources bound to the container, allowing evaluation and scheduling in units of resource pools, and achieving more flexible and accurate control and isolation of CPU resources.
  • the above modules can be realized by software or hardware. For the latter, this can be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules, in any combination, are located in different processors.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.
  • the above computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or other media that can store a computer program.
  • Embodiments of the present disclosure also provide an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the electronic device may further include a transmission device and an input and output device, wherein the transmission device is connected to the processor, and the input and output device is connected to the processor.
  • each module or step of the present disclosure can be realized by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; they can be implemented in program code executable by a computing device, and thus can be stored in a storage device to be executed by a computing device; in some cases, the steps shown or described can be executed in an order different from that given here, or they can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.

Abstract

Provided are a container CPU resource scheduling and isolation method and apparatus, and a storage medium and an electronic device. The method comprises: a container orchestration engine planning and creating a plurality of resource pools, and causing each node to divide its own CPU resources by resource pool; the container orchestration engine acquiring a container creation request instruction, wherein the container creation request instruction carries the label of the resource pool the container expects to enter; the container orchestration engine determining a target node according to the resource pool label and the state of the resource pool corresponding to each node; and the container orchestration engine sending a container creation instruction to an execution agent module of the target node, such that the execution agent module creates a container on the target node and binds the container with the CPU cores corresponding to the resource pool. Therefore, the problem that the precise CPU resource isolation requirements of different types of pods cannot be met is solved.

Description

Container CPU resource scheduling and isolation method and device, storage medium and electronic device
Cross-Reference to Related Applications
The present disclosure is based on Chinese patent application CN202111132020.0, filed on September 26, 2021 and entitled "Container CPU resource scheduling and isolation method and device, storage medium and electronic device", and claims priority to that application, the disclosed content of which is incorporated into the present disclosure by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a container CPU resource scheduling and isolation method and device, a storage medium, and an electronic device.
Background
Kubernetes is currently the most mainstream and widely used open source container computing platform in the industry. It allows users to easily and efficiently deploy container applications on a batch of common infrastructure nodes, and provides a full life-cycle management mechanism covering application deployment, planning, updating, and maintenance to meet different practical needs.
However, native Kubernetes treats the resources on each node as a whole. After excluding the resources reserved for the system, all remaining resources on the node are incorporated by Kubernetes into a single schedulable resource pool. When the scheduler selects a node for a container pod, it evaluates against the total free capacity of this schedulable resource pool; when the pod runs on the node, it may use resources from the entire schedulable pool, so precise core binding or isolation control cannot be performed.
For the above problem that the resource management and scheduling mechanism of native Kubernetes cannot meet the precise CPU resource isolation requirements of different types of pods, no effective solution has been proposed yet.
发明内容Contents of the invention
本公开实施例提供了一种容器CPU资源调度与隔离方法和装置、存储介质及电子设备,以至少解决原生Kubernetes的资源管理与调度机制无法满足不同类型的pod对CPU资源的精确隔离需求的问题。Embodiments of the present disclosure provide a container CPU resource scheduling and isolation method and device, a storage medium, and electronic equipment to at least solve the problem that the resource management and scheduling mechanism of native Kubernetes cannot meet the precise isolation requirements of different types of pods for CPU resources. .
根据本公开的一个方面,提供了一种容器CPU资源调度与隔离方法,包括:According to one aspect of the present disclosure, a container CPU resource scheduling and isolation method is provided, including:
容器编排引擎规划并创建资源池,并令各节点将自身CPU资源按资源池进行划分;容器编排引擎获取容器创建信息;其中,所述容器创建信息包含期望进入的资源池标签;容器编排引擎根据所述资源池标签以及各节点对应的资源池的状态,确定出目标节点;容器编排引擎发送容器创建指令至所述目标节点的执行代理模块,以使所述执行代理模块创建容器并将所述容器与所述资源池对应的CPU核进行绑定。The container orchestration engine plans and creates resource pools, and makes each node divide its own CPU resources into resource pools; the container orchestration engine obtains container creation information; wherein, the container creation information includes the desired resource pool label; the container orchestration engine according to The resource pool label and the state of the resource pool corresponding to each node determine the target node; the container orchestration engine sends a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and The container is bound to the CPU core corresponding to the resource pool.
根据本公开的另一个方面,提供了一种容器CPU资源调度与隔离方法,包括:当前节点上的执行代理模块接收容器编排引擎发送的容器创建指令,其中,上述容器创建指令携带需要创建容器的配置数据以及待绑定的资源池的配置信息,上述配置信息包含资源池标签;上述执行代理模块根据上述容器创建指令,调用业务执行的节点上的容器运行时接口CRI创建容器并根据上述资源池标签确定出对应的CPU核索引;将上述资源池标签和对应的CPU核索 引发送至上述CRI,以使CRI将上述容器与上述资源池对应的CPU核进行绑定;执行代理模块发送容器创建结果信息至上述容器编排引擎的调度器。According to another aspect of the present disclosure, a container CPU resource scheduling and isolation method is provided, including: the execution agent module on the current node receives the container creation instruction sent by the container orchestration engine, wherein the above container creation instruction carries the container creation instruction Configuration data and configuration information of the resource pool to be bound, the above configuration information includes the resource pool label; the above-mentioned execution agent module calls the container runtime interface CRI on the node for business execution to create a container according to the above-mentioned container creation instruction and creates a container according to the above-mentioned resource pool The label determines the corresponding CPU core index; the above resource pool label and the corresponding CPU core index are sent to the above CRI, so that the CRI binds the above container to the CPU core corresponding to the above resource pool; the execution agent module sends the container creation result Information to the scheduler of the above container orchestration engine.
根据本公开的另一个方面,提供了一种容器CPU资源调度与隔离装置,包括:第一创建单元,设置为创建资源池,并令各节点将自身CPU资源按资源池进行划分;获取单元,设置为获取容器创建信息;其中,所述容器创建信息包含期望进入的资源池标签;确定单元,设置为根据所述资源池标签以及各节点对应的资源池的状态,确定出目标节点;发送单元,设置为容器编排引擎发送容器创建指令至所述目标节点的执行代理模块,以使所述执行代理模块创建容器并将所述容器与所述资源池对应的CPU核进行绑定。According to another aspect of the present disclosure, there is provided a container CPU resource scheduling and isolation device, including: a first creation unit configured to create a resource pool, and make each node divide its own CPU resources into resource pools; an acquisition unit, It is set to obtain container creation information; wherein, the container creation information includes the desired resource pool label; the determination unit is configured to determine the target node according to the resource pool label and the state of the resource pool corresponding to each node; the sending unit , it is set that the container orchestration engine sends a container creation instruction to the execution agent module of the target node, so that the execution agent module creates a container and binds the container to the CPU core corresponding to the resource pool.
根据本公开的另一个方面,提供了一种CPU资源调度与隔离装置,包括:接收单元,设置为接收容器编排引擎发送的容器创建指令,其中,所述容器创建指令携带需要创建容器配置数据以及待绑定的资源池的配置信息,所述配置信息包含资源池标签;第二创建单元,设置为根据所述容器创建指令,调用业务执行的节点上的容器运行时接口CRI创建容器并将所述容器与所述资源池对应的CPU核进行绑定;第一发送单元,设置为将所述资源池标签和对应的CPU核索引发送至所述CRI,以使多少CRI将所述容器与所述资源池对应的CPU核进行绑定;第二发送单元,设置为发送任务创建结果信息至所述容器编排引擎的调度器。According to another aspect of the present disclosure, there is provided a CPU resource scheduling and isolation device, including: a receiving unit configured to receive a container creation instruction sent by a container orchestration engine, wherein the container creation instruction carries configuration data to be created and The configuration information of the resource pool to be bound, the configuration information includes the resource pool label; the second creation unit is configured to call the container runtime interface CRI on the node of the service execution to create the container according to the container creation instruction and The container is bound to the CPU core corresponding to the resource pool; the first sending unit is configured to send the resource pool label and the corresponding CPU core index to the CRI, so that how many CRIs associate the container with the The CPU core corresponding to the resource pool is bound; the second sending unit is configured to send task creation result information to the scheduler of the container orchestration engine.
According to yet another embodiment of the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
According to yet another embodiment of the present disclosure, an electronic device is further provided, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Brief Description of the Drawings
FIG. 1 is a block diagram of the hardware structure of a communication device for a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the node scheduling state in a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of processes running on a node in a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the node CPU resource pools in a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the architecture of a container CPU resource scheduling and isolation system according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of yet another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 11 is a flowchart of yet another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 12 is a flowchart of yet another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of node resource division in a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram of node resource division in another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 15 is a schematic diagram of node resource division in yet another container CPU resource scheduling and isolation method according to an embodiment of the present disclosure;
FIG. 16 is a schematic structural diagram of a container CPU resource scheduling and isolation apparatus according to an embodiment of the present disclosure;
FIG. 17 is a schematic structural diagram of another container CPU resource scheduling and isolation apparatus according to an embodiment of the present disclosure.
Detailed Description of Embodiments
Embodiments of the present disclosure will be described in detail below with reference to the drawings and in combination with the embodiments.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific sequence or order.
The method embodiments provided in the embodiments of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking running on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 configured to store data. The mobile terminal may further include a transmission device 106 configured for communication and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the container CPU resource scheduling and isolation method in the embodiments of the present disclosure. The processor 102 runs the computer program stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the above method. The memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is configured to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
FIG. 2 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
S202: the container orchestration engine plans and creates resource pools, and makes each node divide its own CPU resources into the resource pools;
S204: the container orchestration engine acquires container creation information, where the container creation information includes the label of the resource pool the container expects to enter;
S206: the container orchestration engine determines a target node according to the resource pool label and the state of the resource pool on each node;
S208: the container orchestration engine sends a container creation instruction to the execution agent module of the target node, so that the execution agent module creates the container and binds it to the CPU cores corresponding to the resource pool.
In the embodiments of the present application, the container orchestration engine may include a Kubernetes platform, and the container here may be a pod, the smallest business abstraction unit Kubernetes can manage; one pod may contain one or more containers. The resource pool label may be the name of one of the resource pools obtained by dividing the CPUs, and the resource demand may be the amount of CPU time required; neither is limited here.
In the embodiments of the present disclosure, the scheduler obtains the container creation instruction from the container orchestration engine, where the instruction carries the label of the resource pool the container expects to enter and its resource demand; the scheduler selects a target node matching the resource pool label and the resource demand according to the current state information of the resource pools on each node, and the container is created on that target node. Because the target node is selected against per-pool state rather than a single node-wide pool, the CPU resources a container is bound to can be precisely controlled, scheduling can be evaluated per resource pool, and CPU resources can be managed and isolated more flexibly and accurately.
In one or more embodiments, before step S202 in which the container orchestration engine plans and creates resource pools and makes each node divide its own CPU resources into the resource pools, the method includes the following (a data-structure sketch is given after this list):
allocating the CPU resources of each node to obtain an allocation result, where the allocation result is used to create resource pools for the nodes managed by the container orchestration engine;
obtaining a node configuration file, where node resource pool configuration information is recorded in the node configuration file;
obtaining the list of registered nodes, and storing and initializing the resource pool usage state information of the registered nodes;
after the scheduler has obtained the configuration information of all current nodes, sending a resource pool initialization instruction to the execution agent module of each node, where the resource pool initialization instruction is used to make the execution agent module divide the CPU cores on the node into several labelled CPU groups and match these labelled CPU groups to different resource pools.
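The sketch below (in Go) illustrates one way the enhanced scheduler could represent the planned pool configuration and the per-node pool state it initializes from the configuration file and the registered-node list. The type names, the milli-CPU accounting, and the two example pools are assumptions made for illustration only, not part of the disclosed implementation.

package main

import "fmt"

// PoolConfig describes one planned resource pool on a node of a given
// specification (read from the node configuration file at scheduler start-up).
type PoolConfig struct {
	Label string // resource pool label, e.g. "pool-a"
	Cores []int  // CPU core indices assigned to this pool
}

// PoolState is the per-node, per-pool usage state the scheduler persists
// in the database and updates on every placement decision.
type PoolState struct {
	Label      string
	TotalMilli int64 // total CPU capacity of the pool, in milli-CPU
	FreeMilli  int64 // remaining schedulable CPU, in milli-CPU
}

// initNodePools builds the initial pool state for a newly registered node
// from the planned configuration (all capacity is free at initialization).
func initNodePools(cfg []PoolConfig) []PoolState {
	states := make([]PoolState, 0, len(cfg))
	for _, p := range cfg {
		capacity := int64(len(p.Cores)) * 1000 // one core = 1000 milli-CPU
		states = append(states, PoolState{Label: p.Label, TotalMilli: capacity, FreeMilli: capacity})
	}
	return states
}

func main() {
	cfg := []PoolConfig{
		{Label: "pool-a", Cores: []int{3, 4}},
		{Label: "pool-b", Cores: []int{5, 6, 7}},
	}
	for _, s := range initNodePools(cfg) {
		fmt.Printf("%s: %d/%d milli-CPU free\n", s.Label, s.FreeMilli, s.TotalMilli)
	}
}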
In one or more embodiments, in step S206, determining, by the container orchestration engine, the target node according to the resource pool label and the state of the resource pool on each node includes:
selecting, by the container orchestration engine, a target node matching the resource pool label and the resource demand according to the current state information of the resource pools on each node.
In one or more embodiments, selecting, by the container orchestration engine, a target node matching the resource pool label and the resource demand according to the current state information of the resource pools on each node includes the following (a filtering sketch is given after this list):
screening candidate target nodes from the current nodes according to the current state information of the resource pools on each node, where each node includes CPU resources and memory resources, and the resource pools record the resource amounts of the differently labelled CPU groups;
when candidate target nodes are screened out, determining, by the scheduler, from the candidate target nodes a target node that also satisfies the scheduling of non-CPU resources, creating the container on the target node, and binding the container to the CPU cores corresponding to the resource pool.
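A minimal sketch, assuming the same milli-CPU accounting as above, of the CPU-pool screening step: only nodes whose pool with the requested label still has enough free capacity remain candidates. All identifiers here are illustrative.

package main

import "fmt"

// nodePools is an illustrative view of one node's per-pool free capacity.
type nodePools struct {
	Node  string
	Pools map[string]int64 // pool label -> free milli-CPU
}

// filterNodes keeps only nodes whose requested pool has enough free CPU.
func filterNodes(nodes []nodePools, poolLabel string, demandMilli int64) []string {
	var candidates []string
	for _, n := range nodes {
		if free, ok := n.Pools[poolLabel]; ok && free >= demandMilli {
			candidates = append(candidates, n.Node)
		}
	}
	return candidates
}

func main() {
	nodes := []nodePools{
		{Node: "node-0", Pools: map[string]int64{"pool-a": 180, "pool-b": 60}},
		{Node: "node-1", Pools: map[string]int64{"pool-a": 100, "pool-b": 240}},
	}
	// A pod asking for 120 milli-CPU from pool-a can only go to node-0.
	fmt.Println(filterNodes(nodes, "pool-a", 120))
}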
In one or more embodiments, the method further includes: the scheduler obtains a new-node request message from the container orchestration engine, where the request message carries the access interface information of the execution agent module of the newly added node;
the scheduler sends a resource pool initialization instruction to the execution agent module of the newly added node, where the initialization instruction contains the parameters of one or more resource pools to be created, the resource pool parameters including a resource pool label and a CPU group;
when the scheduler receives the resource pool creation success message sent by the execution agent module of the newly added node, the scheduler stores the resource pool state information of the newly added node in the database.
In one or more examples, the container CPU resource scheduling and isolation method further includes: when no candidate target node is screened out, the scheduler suspends or terminates the creation task corresponding to the creation instruction.
In one or more examples, after the scheduler determines from the candidate target nodes a target node that satisfies the scheduling of non-CPU resources, the container CPU resource scheduling and isolation method further includes: the scheduler deducts the resources consumed by the container placed on the target node and updates the resource state of the corresponding resource pool on that node.
FIG. 3 is a flowchart of a container CPU resource scheduling and isolation method according to an embodiment of the present disclosure. As shown in FIG. 3, the process includes the following steps:
S302: the execution agent module on the current node receives a container creation instruction from the scheduler, where the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, the configuration information including a resource pool label;
S304: the execution agent module, according to the container creation instruction, calls the container runtime interface (CRI) on the node where the service executes to create the container, and determines the corresponding CPU core indices according to the resource pool label;
S306: the resource pool label and the corresponding CPU core indices are sent to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool;
S308: the execution agent module sends container creation result information to the scheduler.
In the embodiments of the present disclosure, the execution agent module on the current node receives the container creation instruction sent by the container orchestration engine, where the instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, including the resource pool label; the execution agent module calls the CRI on the node where the service executes to create the container and determines the corresponding CPU core indices from the resource pool label; the label and the core indices are passed to the CRI so that the container is bound to the CPU cores of that resource pool; and the execution agent module reports the creation result to the scheduler of the container orchestration engine. Because the target node is selected against the per-pool state of each node, CPU resources can be bound and isolated precisely, the CPU resources bound to a container can be controlled exactly, scheduling can be evaluated per resource pool, and CPU resources can be managed and isolated more flexibly and accurately.
In one or more embodiments, before the execution agent module on the current node receives the container creation instruction sent by the container orchestration engine, the method further includes:
after the node where the agent runs is managed by the container orchestration engine, receiving the resource pool initialization instruction sent by the scheduler of the container orchestration engine;
dividing the CPU cores on the node into several labelled CPU groups according to the initialization instruction, and matching the labelled CPU groups to different resource pools;
sending, by the execution agent module, resource pool initialization result information to the scheduler.
In one or more embodiments, dividing the CPU resources on the node into several groups according to the initialization instruction includes: using the cgroup subsystem of the Linux kernel to create multiple CPU groups according to the configuration requirements of the resource pools, where each CPU group contains the designated CPU cores.
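A minimal sketch of creating one labelled CPU group with the Linux cgroup v1 cpuset controller by writing the cpuset.cpus and cpuset.mems files; the mount point /sys/fs/cgroup/cpuset and the example core range are assumptions about the node setup, and error handling is reduced to the essentials.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createCPUSet makes a cpuset cgroup named after the resource pool label
// and restricts it to the given CPU cores and memory nodes.
func createCPUSet(label, cpus, mems string) error {
	dir := filepath.Join("/sys/fs/cgroup/cpuset", label)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	// cpuset.cpus lists the cores in this group, e.g. "3-5" or "3,4,5".
	if err := os.WriteFile(filepath.Join(dir, "cpuset.cpus"), []byte(cpus), 0o644); err != nil {
		return err
	}
	// cpuset.mems must also be populated before tasks can join the group.
	return os.WriteFile(filepath.Join(dir, "cpuset.mems"), []byte(mems), 0o644)
}

func main() {
	if err := createCPUSet("pool-b", "3-5", "0"); err != nil {
		fmt.Fprintln(os.Stderr, "cpuset init failed:", err)
	}
}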
Kubernetes is currently the most mainstream and widely used open-source container computing platform in the industry. It allows users to deploy container applications simply and efficiently on a batch of general-purpose infrastructure nodes, and provides a full life-cycle management mechanism covering application deployment, planning, updating, and maintenance to meet different practical needs.
A pod is the smallest business abstraction unit Kubernetes can manage; one pod may contain one or more containers. Users write business orchestration blueprints according to their actual needs, which require one or more business pods to be created. After a blueprint is submitted to Kubernetes, the built-in native scheduler evaluates all the nodes under its management and, combining the available resources of each node, the resource requirements of the pod, and other factors, finally decides on which node the pod will be created. The Kubernetes scheduler continuously monitors every node and the running state of all pods to ensure that node resources are fully utilized, while avoiding overloading some nodes or leaving some pods unable to obtain the resources they need.
As shown in FIG. 4, the scheduler needs to continuously monitor the currently available resources on each node; it needs to keep the workload carried by all nodes roughly balanced; and when a new pod is created, the scheduler decides on which node the pod is created, based on the pod's resource requirements together with the available resources and workload of each node.
Each node has several types of available resources, but the main objects of evaluation are generally CPU and memory:
CPU is generally measured in CPU time per second. For example, on an 8-core node, the total CPU resource available within one second is 8 (or 8000 if milliseconds are used as the unit). Memory is measured directly by size; for example, if a node has 16 GB of memory, the total maximum available memory is 16 GB.
A pod consumes a certain amount of CPU and memory resources when it is created (which can be declared in the blueprint), and these consumed resources are deducted from the node's total resources. If the currently available resources of a node can no longer satisfy a pod's requirements, the Kubernetes scheduler will not place the pod on that node but will look for another node with sufficient resources. If the available resources of all nodes fail to meet the requirements, the creation of the pod is suspended; the scheduler keeps monitoring the resource status of all nodes and creates the pod once some node has enough resources available.
Kubernetes treats each type of resource on a node as a whole. For CPU in particular, although a node may have multiple CPU cores, the scheduler schedules them as a single pool of CPU time and does not distinguish which core a given amount of resource belongs to.
For a node running multiple pods, it is uncertain which pod's process is carried on a specific CPU core at a given moment. For example, core 0 may be running a process of pod A at time 1 and a process of pod B at time 2, and the situation is similar for the other cores, as shown in FIG. 5. Scheduling pod processes across CPU cores is not done by Kubernetes but by the kernel scheduler of the node's operating system, whose task is to distribute all processes running on the node across all CPU cores in a balanced way so that CPU resources are used as fully as possible. From this point of view, the kernel scheduler is very similar to the Kubernetes scheduler; they simply schedule at different levels: the Kubernetes scheduler places pods on suitable nodes and tries to keep the load of all nodes balanced, while the kernel scheduler, on a specific node, distributes all pod processes across all CPU cores so that every core is fully utilized and the load stays balanced at all times.
A node does not only run pods; the core processes of the operating system and the Kubernetes management processes also consume resources. If the scheduler does not reserve part of the node's resources for system processes and Kubernetes management processes when computing node resources, pod processes may preempt the resources of these key processes and make the node behave abnormally. For this reason, Kubernetes provides system resource reservation parameters that allow administrators to reserve part of the resources for system and management processes; Kubernetes excludes these resources from the scheduler's view, ensuring that pods never occupy them. Taking CPU resources as an example, as shown in FIG. 6, the CPU resources on a node are divided into three parts, among which the part reserved for pods is called the allocatable pool. The allocatable pool does not contain the resources reserved for the system or for Kubernetes' own management processes, so pods running on the node will never occupy the reserved CPU resources.
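The arithmetic of this reservation split, using the 8-core example figures that appear later in the description (1.5 cores for the system, 2 cores for Kubernetes management), can be sketched as follows; the milli-CPU unit is an assumption for illustration.

package main

import "fmt"

func main() {
	const milliPerCore = 1000
	capacity := 8 * milliPerCore     // 8-core node
	systemReserved := 1500           // 1.5 cores reserved for OS processes
	kubeReserved := 2 * milliPerCore // 2 cores reserved for Kubernetes management
	// The allocatable pool is what remains for pods after both reservations.
	allocatable := capacity - systemReserved - kubeReserved
	fmt.Printf("allocatable pool: %d milli-CPU (%.1f cores)\n",
		allocatable, float64(allocatable)/milliPerCore)
}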
In practical scenarios, there may be certain key core services (such as telecommunication services or real-time video processing services) with extremely strict requirements on the stability of processing time and response latency: a single context switch caused by thread preemption has a noticeable impact on their performance. Such services therefore often need exclusive CPU cores, and may even explicitly require exclusive use of cores with specific indices, with no other process allowed to use those cores.
There is another common requirement in practice: users want a certain degree of isolation protection between different types of pods in resource usage and allocation. For example, if three type-A pods and three type-B pods run on the same node, then when one of the type-A pods misbehaves (for example, enters an infinite loop) and occupies too much CPU, only the type-A pods should be affected, and the type-B pods should not be affected at all.
Clearly, native Kubernetes cannot satisfy these two isolation requirements, because it treats the resources on each node as a whole. After excluding the resources reserved for the system, all remaining resources on a node are placed by Kubernetes into a single schedulable resource pool; when the scheduler selects a node for a pod, it evaluates the total free amount of this schedulable pool, and when the pod runs on the node it may use resources from the whole schedulable pool, so precise core binding or isolation control is impossible.
To solve the above problems, an application embodiment provides a container CPU resource scheduling and isolation method. The applicable environment is one in which Kubernetes is used to manage container applications. As shown in FIG. 7, the present application also provides a container CPU resource scheduling and isolation system, including: module A, an interface server (API Server); module B, a database; module C, a container runtime; module D, an enhanced scheduler; and module E, an execution agent module (kubelet), where:
Module A is responsible for providing the user interaction interface and functional interfaces. Through the interfaces provided by this module, users can manage and configure the Kubernetes cluster and create and manage various business pods and related objects.
Module B is responsible for the access and persistence of internal management and state data of the Kubernetes system. It stores the user configuration data input from module A and returns queried data sets to module A; it also stores the node state and resource status information reported by module D and responds to subsequent data query requests issued by module D.
Module C runs on the nodes and is responsible for accepting requests issued by module E, creating, deleting, or configuring containers and images on the node, and responding to container state queries.
Module D is responsible for maintaining the resource state information of all nodes, making scheduling decisions for all pods, and sending execution instructions to module E.
Module E runs on the nodes and is responsible for accepting and responding to the instructions issued by module D, initializing the resource pools on the node, interacting with module C, and performing container life-cycle management and CPU core binding.
In the embodiments of the present application, the key function of the scheduler is to evaluate and schedule the multiple CPU resource pools of all nodes in a unified way and to select a suitable node for a pod according to its requirements; the execution agent module runs on the nodes and is responsible for creating resource pools and pods according to the scheduler's instructions and binding the two when the container is created.
In an optional embodiment, as shown in FIG. 8, the container CPU resource scheduling and isolation method includes: S802: read the configuration and obtain the node resource pool configuration information. Specifically, before the environment is deployed, the user plans the resource pool configuration for nodes of specific specifications in advance and records the configuration information in the corresponding configuration file. When the enhanced scheduler starts, it reads the configuration file and thus knows the expected node resource pool configuration.
S804: obtain the node list and initialize node resource states. Specifically, the enhanced scheduler obtains the list of registered nodes from the built-in Kubernetes database, and records and initializes the resource pool usage state information of these nodes in the database. When the system has just started, the node list may be empty, but managing a newly added node triggers the scheduler's node-addition process, which completes the initialization.
S806: send a resource initialization instruction to the nodes. Specifically, after the scheduler has learned the information of all currently managed nodes, it sends a resource pool initialization instruction to the execution agent of each node. The execution agent divides the CPU cores on the node into several groups according to the instruction and places them into different resource pools for later use when pods are created.
In an optional embodiment, as shown in FIG. 9, the container CPU resource scheduling and isolation method includes:
S902: a newly added node is detected. Specifically, the enhanced scheduler obtains from other Kubernetes management services the message that a new node has been taken under management, and obtains the access interface information of the execution agent of the new node.
S904: send a resource pool initialization instruction to the node. Specifically, the enhanced scheduler sends a resource pool initialization instruction to the execution agent of the new node, containing the details of one or more resource pools to be created (resource pool labels, the included CPU core indices, and so on).
S906: judge whether the resource pool initialization succeeded. Specifically, if the execution agent replies that the resource pool was created successfully, continue with step S908; if it failed, go to step S910, resend the creation instruction, and continue to wait.
S908: update the node state information; the enhanced scheduler records the resource pool state information of the new node in the database.
In an optional embodiment, as shown in FIG. 10, the container CPU resource scheduling and isolation method includes:
S1002: the enhanced scheduler obtains a CPU resource scheduling message from other Kubernetes management services.
S1004: the enhanced scheduler obtains the specific CPU resource scheduling parameters from the message, including the label of the resource pool the pod wants to enter and its resource demand (default values are used if these parameters are not configured).
S1006: the enhanced scheduler evaluates the resource status of the corresponding resource pools on all nodes and screens candidate nodes according to the resource pool label the pod wants to enter and its resource demand, combined with the state information of the corresponding resource pools on each node maintained in the database.
S1008: if a target node can be selected, go to step S1010; if no node can be selected, abort the process and go to step S1018; the creation of the pod is suspended.
S1010: evaluate the other constraints;
S1012: if a set of candidate nodes can be selected, the enhanced scheduler executes the scheduling logic of the native Kubernetes scheduler on these nodes to filter them again; this step mainly completes the scheduling screening of other, non-CPU resources (such as memory and ports). If no node can be selected by this screening, go to step S1018: the process is aborted and the creation of the pod is suspended.
S1014: send a pod creation instruction to the selected node. A final access candidate node has been obtained; the enhanced scheduler sends an instruction containing the details of the pod to be created to the execution agent of that node, so that the execution agent creates the pod on the node and binds it to the resource pool.
S1016: update the node resource pool state information. The enhanced scheduler updates, in the database, the resource state of the resource pool on the selected node, deducting from it the resources consumed by the newly placed pod (a bookkeeping sketch follows this flow).
S1018: abort the process; the creation of the pod is suspended.
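A sketch of the bookkeeping in steps S1014-S1016: once a node is chosen, the pod's demand is deducted from that node's pool state so that subsequent scheduling decisions see the reduced free capacity. The in-memory map standing in for the database, and the example figures, are assumptions.

package main

import (
	"errors"
	"fmt"
)

type poolKey struct{ node, pool string }

// freeMilli stands in for the database-backed pool state the scheduler keeps.
var freeMilli = map[poolKey]int64{
	{node: "node-0", pool: "pool-a"}: 180,
	{node: "node-1", pool: "pool-b"}: 240,
}

// commitPlacement deducts the pod's CPU demand from the chosen node's pool,
// or reports that the pod must stay pending if capacity is insufficient.
func commitPlacement(node, pool string, demand int64) error {
	k := poolKey{node: node, pool: pool}
	if freeMilli[k] < demand {
		return errors.New("insufficient pool capacity; pod creation stays pending")
	}
	freeMilli[k] -= demand
	return nil
}

func main() {
	if err := commitPlacement("node-0", "pool-a", 120); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("node-0/pool-a free after deduction:", freeMilli[poolKey{"node-0", "pool-a"}])
}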
In an optional embodiment, as shown in FIG. 11, the container CPU resource scheduling and isolation method includes:
S1102: the execution agent starts by reading its initial configuration and registers its node with the Kubernetes control node; this step behaves the same as the native Kubernetes execution agent (kubelet).
S1104: after the node where the execution agent runs has been taken under management by Kubernetes, the enhanced scheduler sends a resource pool initialization instruction.
S1106: the execution agent divides the CPU resources on the node into several groups according to the instruction, each belonging to a different resource pool object. There are several ways to group CPUs; the most common is to use the cgroup subsystem of the Linux kernel to create multiple cpusets according to the configuration requirements of the resource pools, each cpuset containing the designated CPU cores.
S1108: whether the resource pool initialization succeeds or fails, the execution agent reports the result to the enhanced scheduler and ends the process.
In an optional embodiment, as shown in FIG. 12, the container CPU resource scheduling and isolation method includes:
S1202: the execution agent on a node receives a pod creation instruction from the enhanced scheduler, containing the configuration data of the containers to be created, the resource pool to be bound, and related information.
S1204: read from the pod creation instruction the container configuration data to be created, the resource pool to be bound, and related information.
S1206: the execution agent calls the container runtime interface (CRI) on the node to create the container according to the creation instruction. During creation, it looks up the cpuset index corresponding to the resource pool label and binds the container process to the corresponding cpuset (a binding sketch follows this flow).
S1208: whether the container is created successfully or not, the execution agent feeds the creation result back to the enhanced scheduler and ends the process.
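A sketch of the binding step in S1206, assuming the agent keeps the pool-label-to-core mapping it built at resource pool initialization: the label carried in the create instruction is resolved to a cpuset string that is handed to the runtime when the container is created. createContainer below is a stand-in for the real CRI call, not the actual interface.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// poolCores maps each resource pool label to the CPU core indices recorded
// when the pool was initialized on this node.
var poolCores = map[string][]int{
	"pool-a": {3, 4},
	"pool-b": {5, 6, 7},
}

// cpusetString formats a core index list as a cpuset value, e.g. "5,6,7".
func cpusetString(cores []int) string {
	parts := make([]string, len(cores))
	for i, c := range cores {
		parts[i] = strconv.Itoa(c)
	}
	return strings.Join(parts, ",")
}

// createContainer stands in for invoking the container runtime with the
// pod's container configuration plus the cpuset the container is pinned to.
func createContainer(podName, cpuset string) {
	fmt.Printf("create %s pinned to cpuset %q\n", podName, cpuset)
}

func main() {
	poolLabel := "pool-b" // carried in the create-pod instruction
	createContainer("demo-pod", cpusetString(poolCores[poolLabel]))
}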
Based on the above embodiments, in an application embodiment, assume a Kubernetes environment with three nodes, each having 8 CPU cores. Under the resource management mechanism of native Kubernetes, one possible initial resource division of any node in the system is shown in FIG. 13:
In FIG. 13, the environment reserves 1.5 cores for operating system processes, 2 cores for Kubernetes' own management processes, and 4.5 cores for pods. The native resource allocation mechanism supports only one pod reservation group (the allocatable pool). The native resource reservation mechanism divides resources by available CPU time (reserving 1.5 cores means that 150 ms of CPU time can be used within every 100 ms), so processes in different reservation groups may be scheduled onto the same CPU core (which can lead to resource contention), and at any given moment it is not determined on which CPU core the processes of a given reservation group run. The positions shown in FIG. 13 are only an example and impose no limitation.
In the execution environment of one embodiment, as shown in FIG. 14, one core is reserved for operating system processes, 2 cores for Kubernetes' own management processes, 2 cores for pod group A, and 3 cores for pod group B. The enhanced resource allocation mechanism supports multiple pod reservation groups (multiple allocatable pools). The enhanced scheme binds CPU cores to reservation groups precisely by creating multiple cpusets, and users can specify the mapping between CPU cores and reservation groups arbitrarily. As can be seen from FIG. 14, pod reservation group B is bound to CPUs 3, 4, and 5, so all pod processes placed in group B run only on CPUs 3, 4, and 5; the other reservation groups work the same way. In this embodiment, which CPU core (or which range of cores) the processes of each reservation group run on is determined at all times.
In one embodiment, assume the container CPU resource scheduling and isolation system has been running for some time and the resource status of each node is as shown in FIG. 15. When the user now creates a new pod, the enhanced scheduler reads the resource group and other resource-requirement parameters from the pod blueprint, makes its decision in combination with the current resource status of each node, and selects the most suitable node on which to create the pod.
In one embodiment, the pod specifies that it wants to belong to reservation group A and that its minimum CPU requirement is 120 ms. In this case, the enhanced scheduler traverses all nodes and finds that only node 0 can satisfy the requirement; after the other resource checks pass, the scheduler sends a pod creation command to the execution agent of node 0 and deducts 120 ms from node 0's group A reservation (leaving 60 ms).
In one embodiment, the pod specifies that it wants to belong to reservation group B and that its minimum CPU requirement is 60 ms. In this case, the enhanced scheduler traverses all nodes and finds that nodes 1 and 2 can both satisfy the requirement, but node 1 has more headroom than node 2 (240 > 100), so node 1 is preferred; after the other resource checks pass, the scheduler sends a pod creation command to the execution agent of node 1 and deducts 60 ms from node 1's group B reservation (leaving 180 ms).
In one embodiment, the pod specifies that it wants to belong to reservation group A and that its minimum CPU requirement is 200 ms. In this case, the enhanced scheduler traverses all nodes, finds that no node can satisfy the requirement, ends the decision directly, and suspends the pod creation process.
In one embodiment, a pod belonging to reservation group A behaves abnormally after being successfully created on node 0 and exhausts all the time on CPU 0 and CPU 1 (cores belonging to reservation group A). The pods belonging to reservation group B are not affected, nor are the operating system processes or the Kubernetes management processes, and the creation and scheduling of pods belonging to reservation group B can still proceed normally.
In the pod creation phase, the execution agent calls the container runtime interface on the node to create the container. The mainstream container runtimes in the industry all support specifying a cpuset when a container is created, so the execution agent only needs to pass the CPU parameters provided by the enhanced scheduler to the container runtime interface to bind the container to the specified CPU cores.
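As one concrete illustration (not the CRI path described above), a Docker-compatible runtime accepts the cpuset at creation time via its --cpuset-cpus option; in the sketch below the image name, the core range, and shelling out to the CLI are placeholders for illustration only.

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Start a throwaway container already pinned to cores 3-5 of one pool.
	cmd := exec.Command("docker", "run", "-d", "--cpuset-cpus", "3-5", "busybox", "sleep", "3600")
	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err, string(out))
		return
	}
	fmt.Printf("container %s is bound to cores 3-5\n", string(out))
}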
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored on a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the various embodiments of the present disclosure.
This embodiment also provides a container CPU resource scheduling and isolation apparatus. The apparatus is used to implement the above embodiments and preferred implementations, and what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the apparatuses described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
FIG. 16 is a structural block diagram of a container CPU resource scheduling and isolation apparatus according to an embodiment of the present disclosure. As shown in FIG. 16, the apparatus includes:
a first creation unit 1602, configured to create resource pools and make each node divide its own CPU resources into the resource pools;
an acquisition unit 1604, configured to acquire container creation information, where the container creation information includes the label of the resource pool the container expects to enter;
a determination unit 1606, configured to determine a target node according to the resource pool label and the state of the resource pool on each node;
a sending unit 1608, configured to make the container orchestration engine send a container creation instruction to the execution agent module of the target node, so that the execution agent module creates the container and binds it to the CPU cores corresponding to the resource pool.
In the embodiments of the present disclosure, the container orchestration engine may include a Kubernetes platform, and the container here may be a pod, the smallest business abstraction unit Kubernetes can manage; one pod may contain one or more containers. The resource pool label may be the name of one of the resource pools obtained by dividing the CPUs, and the resource demand may be the amount of CPU capacity occupied and the duration of CPU use; neither is limited here.
In the embodiments of the present disclosure, the scheduler obtains the container creation instruction from the container orchestration engine, where the instruction carries the label of the resource pool the container expects to enter and its resource demand; a target node matching the resource pool label and the resource demand is selected according to the current state information of the resource pools on each node, and the container is created on that target node. Because the selection is made against per-pool state, CPU resources can be bound and isolated precisely, the CPU resources bound to a container can be controlled exactly, scheduling can be evaluated per resource pool, and CPU resources can be managed and isolated more flexibly and accurately.
图17是根据本公开实施例的容器CPU资源调度与隔离装置的结构框图,如图17所示,该装置包括:Fig. 17 is a structural block diagram of a container CPU resource scheduling and isolation device according to an embodiment of the present disclosure. As shown in Fig. 17 , the device includes:
接收单元1702,设置为接收容器编排引擎发送的容器创建指令,其中,上述容器创建指 令携带需要创建容器配置数据以及待绑定的资源池的配置信息,上述配置信息包含资源池标签;The receiving unit 1702 is configured to receive the container creation instruction sent by the container orchestration engine, wherein the above-mentioned container creation instruction carries the configuration information of the container configuration data to be created and the resource pool to be bound, and the above-mentioned configuration information includes the resource pool label;
第二创建单元1704,设置为根据所述容器创建指令,调用业务执行的节点上的容器运行时接口CRI创建容器并根据所述资源池标签确定出对应的CPU核索引;The second creation unit 1704 is configured to call the container runtime interface CRI on the node for service execution to create a container according to the container creation instruction and determine the corresponding CPU core index according to the resource pool label;
第一发送单元1706,设置为将所述资源池标签和对应的CPU核索引发送至所述CRI,以使多少CRI将所述容器与所述资源池对应的CPU核进行绑定;The first sending unit 1706 is configured to send the resource pool label and the corresponding CPU core index to the CRI, so that how many CRIs bind the container to the CPU core corresponding to the resource pool;
第二发送单元1708,设置为发送任务创建结果信息至所述容器编排引擎的调度器。The second sending unit 1708 is configured to send task creation result information to the scheduler of the container orchestration engine.
通过本公开实施例,采用了当前节点上的执行代理模块从调度器接收容器的创建指令,其中,上述创建指令携带需要创建容器配置数据以及待绑定的资源池信息,上述资源此信息包含资源池标签;上述执行代理模块根据上述创建指令,调用业务执行的节点上的容器运行时接口CRI创建容器;其中,上述CRI根据资源池标签确定出上述节点对应的CPU设置索引,并将上述容器进程绑定到对应的CPU设置索引;执行代理模块发送任务创建结果信息至上述调度器;由于根据当前各节点对应的资源池的状态信息,选择与上述资源池标签和资源需求量匹配的目标节点,可以精准的对CPU资源绑定与隔离,进而达到了提高CPU资源利用率的效果。Through the embodiments of the present disclosure, the execution agent module on the current node is used to receive the container creation instruction from the scheduler, wherein the above-mentioned creation instruction carries the configuration data of the container to be created and the resource pool information to be bound, and the information of the above-mentioned resource includes resource Pool label; the above-mentioned execution agent module calls the container runtime interface CRI on the node for business execution according to the above-mentioned creation instruction to create a container; wherein, the above-mentioned CRI determines the CPU setting index corresponding to the above-mentioned node according to the resource pool label, and the above-mentioned container process Bind to the corresponding CPU to set the index; the execution agent module sends the task creation result information to the above-mentioned scheduler; according to the state information of the resource pool corresponding to each node at present, select the target node that matches the above-mentioned resource pool label and resource demand, It can accurately bind and isolate CPU resources, thereby achieving the effect of improving CPU resource utilization.
It should be noted that the above modules may be implemented by software or by hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program, where the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing a computer program.
Embodiments of the present disclosure further provide an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
In an exemplary embodiment, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations, and details are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present disclosure described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices; they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from that described here; or they may be made into individual integrated circuit modules, or multiple modules or steps thereof may be made into a single integrated circuit module. Thus, the present disclosure is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (13)

  1. A container CPU resource scheduling and isolation method, comprising:
    planning and creating, by a container orchestration engine, resource pools, and causing each node to divide its own CPU resources according to the resource pools;
    obtaining, by the container orchestration engine, container creation information, wherein the container creation information comprises a label of a resource pool the container is expected to enter;
    determining, by the container orchestration engine, a target node according to the resource pool label and the state of the resource pool corresponding to each node; and
    sending, by the container orchestration engine, a container creation instruction to an execution agent module of the target node, so that the execution agent module creates a container and binds the container to CPU cores corresponding to the resource pool.
  2. The method according to claim 1, wherein before the container orchestration engine plans and creates the resource pools and causes each node to divide its own CPU resources according to the resource pools, the method comprises:
    allocating the CPU resources corresponding to each node to obtain an allocation result, wherein the allocation result is used to create a resource pool for the node to which the container orchestration engine belongs;
    obtaining a node configuration file, wherein node resource pool configuration information is recorded in the node configuration file;
    obtaining a list of registered nodes, and storing and initializing resource pool usage state information of the registered nodes; and
    after the scheduler obtains the configuration information of all current nodes, sending a resource pool initialization instruction to the execution agent module corresponding to each node, wherein the resource pool initialization instruction is used to cause the execution agent module to divide the CPU cores on the node into several labeled CPU groups and to match the labeled CPU groups to different resource pools.
  3. The method according to claim 1, wherein the determining, by the container orchestration engine, the target node according to the resource pool label and the state of the resource pool corresponding to each node comprises:
    selecting, by the container orchestration engine, a target node matching the resource pool label and the resource demand according to the current state information of the resource pool corresponding to each node.
  4. The method according to claim 3, wherein the selecting, by the container orchestration engine, the target node matching the resource pool label and the resource demand according to the current state information of the resource pool corresponding to each node comprises:
    screening candidate target nodes from the current nodes according to the current state information of the resource pool corresponding to each node, wherein each node comprises CPU resources and memory resources, and the resource pool comprises the resource amounts of CPU groups with different labels; and
    in a case where candidate target nodes are screened out, determining, by the scheduler, a target node satisfying the scheduling of non-CPU resources from the candidate target nodes, creating a container on the target node, and binding the container to the CPU cores corresponding to the resource pool.
  5. The method according to claim 4, wherein the method further comprises:
    in a case where no candidate target node is screened out, suspending or terminating, by the scheduler, the creation task corresponding to the creation instruction.
  6. The method according to claim 4, wherein after the scheduler determines the target node satisfying the scheduling of non-CPU resources from the candidate target nodes, the method further comprises:
    deducting, by the scheduler, the resources consumed by the container corresponding to the accessed target node, and updating the resource state of the resource pool corresponding to the accessed node.
  7. A container CPU resource scheduling and isolation method, comprising:
    receiving, by an execution agent module on a current node, a container creation instruction sent by a container orchestration engine, wherein the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, the configuration information comprising a resource pool label;
    calling, by the execution agent module according to the container creation instruction, the container runtime interface (CRI) on the node where the service is executed to create a container, and determining the corresponding CPU core index according to the resource pool label;
    sending the resource pool label and the corresponding CPU core index to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool; and
    sending, by the execution agent module, container creation result information to a scheduler of the container orchestration engine.
  8. The method according to claim 7, wherein before the execution agent module on the current node receives the container creation instruction sent by the container orchestration engine, the method further comprises:
    after the current node accesses the container orchestration engine, receiving a resource pool initialization instruction sent by the scheduler of the container orchestration engine;
    dividing, by the execution agent module according to the initialization instruction, the CPU cores on the current node into several labeled CPU groups, and matching the labeled CPU groups to different resource pools; and
    sending, by the execution agent module, resource pool initialization result information to the scheduler.
  9. The method according to claim 8, wherein the dividing, by the execution agent module according to the initialization instruction, the CPU cores on the current node into several labeled CPU groups and matching the labeled CPU groups to different resource pools comprises:
    creating, by using the cgroup subsystem of the Linux kernel, multiple CPU groups according to the configuration requirements of the resource pools, wherein each CPU group comprises preset CPU cores.
  10. A container CPU resource scheduling and isolation apparatus, comprising:
    a first creation unit, configured to create resource pools and cause each node to divide its own CPU resources according to the resource pools;
    an obtaining unit, configured to obtain container creation information, wherein the container creation information comprises a label of a resource pool the container is expected to enter;
    a determining unit, configured to determine a target node according to the resource pool label and the state of the resource pool corresponding to each node; and
    a sending unit, configured to enable the container orchestration engine to send a container creation instruction to an execution agent module of the target node, so that the execution agent module creates a container and binds the container to CPU cores corresponding to the resource pool.
  11. A container CPU resource scheduling and isolation apparatus, comprising:
    a receiving unit, configured to receive a container creation instruction sent by a container orchestration engine, wherein the container creation instruction carries the configuration data of the container to be created and the configuration information of the resource pool to be bound, the configuration information comprising a resource pool label;
    a second creation unit, configured to call, according to the container creation instruction, the container runtime interface (CRI) on the node where the service is executed to create a container, and to determine the corresponding CPU core index according to the resource pool label;
    a first sending unit, configured to send the resource pool label and the corresponding CPU core index to the CRI, so that the CRI binds the container to the CPU cores corresponding to the resource pool; and
    a second sending unit, configured to send task creation result information to a scheduler of the container orchestration engine.
  12. A computer-readable storage medium storing a computer program, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 6 or 7 to 9.
  13. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 6 or 7 to 9.
PCT/CN2022/102750 2021-09-26 2022-06-30 Container cpu resource scheduling and isolation method and apparatus, and storage medium and electronic device WO2023045467A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111132020.0 2021-09-26
CN202111132020.0A CN115858083A (en) 2021-09-26 2021-09-26 Container CPU resource scheduling and isolating method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023045467A1 true WO2023045467A1 (en) 2023-03-30

Family

ID=85652208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102750 WO2023045467A1 (en) 2021-09-26 2022-06-30 Container cpu resource scheduling and isolation method and apparatus, and storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN115858083A (en)
WO (1) WO2023045467A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116339927B (en) * 2023-05-29 2023-08-15 苏州浪潮智能科技有限公司 Equipment determining method, device, storage medium and electronic device
CN116954822A (en) * 2023-07-26 2023-10-27 中科驭数(北京)科技有限公司 Container arranging system and use method thereof
CN117009060B (en) * 2023-09-27 2024-01-12 腾讯科技(深圳)有限公司 Resource scheduling method, device, equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365076A1 (en) * 2016-02-29 2018-12-20 Huawei Technologies Co., Ltd. Service Container Creation Method and Apparatus
CN106569895A (en) * 2016-10-24 2017-04-19 华南理工大学 Construction method of multi-tenant big data platform based on container
CN109213573A (en) * 2018-09-14 2019-01-15 珠海国芯云科技有限公司 The equipment blocking method and device of virtual desktop based on container
CN111897651A (en) * 2020-07-28 2020-11-06 华中科技大学 Memory system resource management method based on tags
CN112052068A (en) * 2020-08-17 2020-12-08 烽火通信科技股份有限公司 Method and device for binding CPU (central processing unit) of Kubernetes container platform

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112231A (en) * 2023-09-22 2023-11-24 中国人民解放军91977部队 Multi-model collaborative processing method and device
CN117112231B (en) * 2023-09-22 2024-04-16 中国人民解放军91977部队 Multi-model collaborative processing method and device
CN117311990A (en) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 Resource adjustment method and device, electronic equipment, storage medium and training platform
CN117311990B (en) * 2023-11-28 2024-02-23 苏州元脑智能科技有限公司 Resource adjustment method and device, electronic equipment, storage medium and training platform

Also Published As

Publication number Publication date
CN115858083A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
WO2023045467A1 (en) Container cpu resource scheduling and isolation method and apparatus, and storage medium and electronic device
US11416307B2 (en) System and method for processing task resources
US10003500B2 (en) Systems and methods for resource sharing between two resource allocation systems
WO2017170470A1 (en) Network function virtualization management orchestration device, method and program
US20210004258A1 (en) Method and Apparatus for Creating Virtual Machine
CN108370341B (en) Resource allocation method, virtual network function manager and network element management system
US11231955B1 (en) Dynamically reallocating memory in an on-demand code execution system
CN114930295A (en) Serverless call allocation with reserved capacity without throttling scaling
US11403149B2 (en) Management of a virtual network function
WO2020147573A1 (en) Method and device for instantiating virtualized network function
EP4177751A1 (en) Resource scheduling method, resource scheduling system, and device
CN111831232A (en) Data storage method and device, storage medium and electronic device
CN113382077B (en) Micro-service scheduling method, micro-service scheduling device, computer equipment and storage medium
CN112631780A (en) Resource scheduling method and device, storage medium and electronic equipment
CN114327881A (en) Task scheduling method and device
CN113886069A (en) Resource allocation method and device, electronic equipment and storage medium
CN114253459A (en) Method and device for creating persistent data volume and server
CN109905258B (en) PaaS management method, device and storage medium
CN110673787A (en) Volume configuration method and device
CN110399200A (en) A kind of cloud platform resource regulating method and device
CN114816272B (en) Magnetic disk management system under Kubernetes environment
CN114860203A (en) Project creation method, project creation device, server and storage medium
CN114490083A (en) CPU resource binding method and device, storage medium and electronic device
CN112612600A (en) Resource scheduling method and device based on DCU and computer equipment
CN112015515A (en) Virtual network function instantiation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22871528

Country of ref document: EP

Kind code of ref document: A1