WO2024082584A1 - Resource allocation method, container management component, and resource allocation system - Google Patents

Resource allocation method, container management component, and resource allocation system

Info

Publication number
WO2024082584A1
Authority
WO
WIPO (PCT)
Prior art keywords
container group
information
node
container
resource
Prior art date
Application number
PCT/CN2023/089017
Other languages
English (en)
French (fr)
Inventor
杨庆东
杨业飞
涂会
李希伟
周光
周海锐
Original Assignee
京东科技信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技信息技术有限公司
Publication of WO2024082584A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a resource allocation method, a container management component, and a resource allocation system.
  • Container cloud is an important type of current cloud computing platform. When customers build container clouds, they often deploy hundreds of physical servers to carry the services on the container cloud.
  • container cloud cluster management is mainly carried out through container orchestration tools. For example, containers are scheduled to working nodes in the cluster through Kubernetes, resources on working nodes are allocated to containers, and so on.
  • a resource allocation method comprising: a container management component obtains information of a container group scheduled to a working node, wherein the information of the container group includes CPU resource application information and storage resource application information; the container management component determines a NUMA node matching the container group from at least one non-uniform memory architecture NUMA node according to the CPU resource application information and storage resource application information of the container group and available resource information of the working node under at least one NUMA node, wherein the NUMA node matching the container group is a NUMA node that can simultaneously meet the CPU resource and storage resource application requirements of the container group; the container management component allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group.
  • the container management component determines a NUMA node matching the container group from the at least one NUMA node according to the CPU resource application information and the storage resource application information of the container group and the available resource information of the working node under at least one non-uniform memory architecture NUMA node, including: the container management component selects a NUMA node matching the container group from the at least one NUMA node according to the CPU resource application information of the container group and the available CPU resource information of the working node under at least one NUMA node.
  • a first NUMA node wherein the first NUMA node is a NUMA node that can meet the CPU resource application requirements of the container group; the container management component selects a second NUMA node from the first NUMA node according to the storage resource application information of the container group and the available storage resource information of the working node under the first NUMA node, and uses the second NUMA node as the NUMA node matching the container group.
  • the method further includes: after allocating the CPU resources and storage resources under the NUMA node matching the container group to the container group, the container management component updates the information of the container group according to the information of the NUMA node matching the container group.
  • the method further includes: after detecting that the information of the container group includes information of a NUMA node matching the container group, updating the available resource information of the working node under at least one NUMA node through a topology resource management component.
  • the method further includes: after detecting that the information of the container group includes information of a NUMA node matching the container group, setting a resource allocation completion flag in the information of the container group by the topology resource management component.
  • the method further includes: before the container management component obtains information about the container group allocated to the working node, scheduling the container group to be scheduled to the working node through the scheduling management component.
  • scheduling the container group to be scheduled to the working node through the scheduling management component includes: obtaining resource application information of the container group to be scheduled; determining a working node allocated to the container group to be scheduled from the multiple working nodes according to the resource application information of the container group to be scheduled and available resource information of multiple working nodes in the cluster under the NUMA dimension; and scheduling the container group to be scheduled to the working node allocated to the container group to be scheduled.
  • scheduling the container group to be scheduled to the working node allocated to the container group to be scheduled includes: updating information of the container group to be scheduled in a database according to information of the working node allocated to the container group to be scheduled.
  • scheduling the container group to be scheduled to the working node through the scheduling management component further includes: before determining the working node allocated to the container group to be scheduled from the multiple working nodes, filtering out the working nodes that have not completed resource allocation for the allocated container group from the multiple working nodes.
  • a container management component comprising: an acquisition module, configured to acquire information of a container group scheduled to a working node, wherein the information of the container group includes CPU resource application information and storage resource application information; a determination module, configured to determine, according to the CPU resource application information and storage resource application information of the container group and the available resource information of the working node under at least one non-uniform memory architecture (NUMA) node, a NUMA node matching the container group from the at least one NUMA node, wherein the NUMA node matching the container group is a NUMA node that can simultaneously meet the CPU resource and storage resource application requirements of the container group; and an allocation module, configured to allocate the CPU resources and storage resources under the NUMA node matching the container group to the container group.
  • the determination module is configured to: select a first NUMA node from the at least one NUMA node based on the CPU resource application information of the container group and the available CPU resource information of the working node under at least one NUMA node, wherein the first NUMA node is a NUMA node that can meet the CPU resource application requirements of the container group; select a second NUMA node from the first NUMA node based on the storage resource application information of the container group and the available storage resource information of the working node under the first NUMA node, and use the second NUMA node as the NUMA node that matches the container group.
  • the container management component also includes: an update module, which is configured to update the information of the container group according to the information of the NUMA node matching the container group after the allocation module allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group.
  • a resource allocation system comprising: a container management component as described above; and a topology resource management component, configured to update the available resource information of the working node under at least one NUMA node after detecting that the information of the container group contains information of a NUMA node matching the container group.
  • the topology resource management component is further configured to: after detecting that the information of the container group includes information of a NUMA node matching the container group, set a resource allocation completion flag in the information of the container group.
  • the resource allocation system further includes: a scheduling management component configured to schedule the container group to be scheduled to the working node before the container management component obtains information about the container group allocated to the working node.
  • the scheduling management component schedules the container group to be scheduled to the working node, including: obtaining resource application information of the container group to be scheduled; determining a working node allocated to the container group to be scheduled from the multiple working nodes according to the resource application information of the container group to be scheduled and available resource information of multiple working nodes in the cluster under the NUMA dimension; and scheduling the container group to be scheduled to the working node allocated to the container group to be scheduled.
  • the scheduling management component scheduling the container group to be scheduled to the working node further includes: before determining the working node allocated to the container group to be scheduled from the multiple working nodes, filtering out the working nodes that have not completed resource allocation for the allocated container group from the multiple working nodes.
  • an electronic device comprising: a memory; and a processor coupled to the memory, wherein the processor is configured to execute the resource allocation method as described above based on instructions stored in the memory.
  • a computer-readable storage medium is further proposed, on which computer program instructions are stored, and when the instructions are executed by a processor, the above-mentioned resource allocation method is implemented.
  • FIG1 is a schematic diagram of a flow chart of a resource allocation method according to some embodiments of the present disclosure
  • FIG2 is a schematic diagram of a flow chart of a resource allocation method according to other embodiments of the present disclosure.
  • FIG3 is a schematic diagram of a process of scheduling a container group to a working node according to some embodiments of the present disclosure
  • FIG4 is a schematic diagram of a process of allocating resources on a working node to a container group according to some embodiments of the present disclosure
  • FIG5 is a schematic diagram of a process of updating available resources and container group information of a working node according to some embodiments of the present disclosure
  • FIG6 is a schematic diagram of the structure of a container management component according to some embodiments of the present disclosure.
  • FIG7 is a schematic diagram of the structure of a resource allocation system according to other embodiments of the present disclosure.
  • FIG8 is a schematic diagram of the structure of a resource allocation system according to some further embodiments of the present disclosure.
  • FIG9 is a schematic structural diagram of an electronic device according to some other embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of the structure of a computer system according to some embodiments of the present disclosure.
  • in the related art, in a cluster environment built on the container orchestration platform Kubernetes and Docker, application containers can be pinned to CPU cores through the CPU Manager in the Kubelet component, and peripheral devices such as GPU resources can also be added through device plug-ins (Device Plugins).
  • peripheral devices are managed by the Device Manager, and the Topology Manager mechanism is then enabled to aggregate the resource allocation results of the CPU Manager and the Device Manager and produce an optimal topology result, so that the CPU, GPU, and other resources allocated to a container can, as far as possible, achieve affinity and exclusivity.
  • the inventors of the present disclosure have found that the related art does not provide good support for storage devices and cannot align the CPU and storage devices allocated to a container in the non-uniform memory access (NUMA) dimension.
  • a technical problem to be solved by the present disclosure is to provide a resource allocation method, a container management component and a resource allocation system to achieve alignment of CPU resources and storage resources allocated to a container group in the NUMA dimension.
  • FIG1 is a schematic diagram of a resource allocation method according to some embodiments of the present disclosure. As shown in FIG1 , the resource allocation method of the embodiment of the present disclosure includes:
  • Step S110 The container management component obtains information about the container group scheduled to the working node.
  • a container group is a collection of containers.
  • the container group is specifically a Pod.
  • Pod is the basic unit of Kubernetes scheduling.
  • the containers in the Pod share the network and file system, and can be combined to complete the service in a simple and efficient way through inter-process communication and file sharing.
  • the container management component queries the database through the interface service component to obtain information about the container group on the worker node. For example, when the worker node is worker node 1 in the cluster, the information about the container group on worker node 1 is queried from the database through the interface service component.
  • the worker node can be a physical machine or a virtual machine based on a physical machine.
  • the container management component can be built based on the Kubelet component in the container orchestration tool Kubernetes
  • the interface service component can be built based on the Kube interface service component (Kube API Server) in Kubernetes
  • the database can be built based on a distributed key-value database (such as ETCD).
  • the information of the container group on the working node includes: an identifier of the container group scheduled to the working node, CPU resource application information and storage resource application information of the container group.
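  • For illustration only, the container-group and node information described above can be modeled with a few simple data structures. The Go sketch below is an assumption for the rest of the examples in this section; the type and field names (ContainerGroupRequest, NUMAResources, NodeResources) are illustrative and do not come from the patent.

```go
package resourcealloc

// ContainerGroupRequest models the information obtained for a container group
// (Pod) scheduled to a working node: its identifier plus the CPU and storage
// resources it applies for.
type ContainerGroupRequest struct {
	PodID        string
	CPUCores     int   // number of CPU cores requested
	StorageBytes int64 // storage (e.g. AEP) requested, in bytes
}

// NUMAResources models the available resources of a working node under one
// NUMA node.
type NUMAResources struct {
	NUMAID           string // e.g. "NUMA0"
	FreeCPUCores     int
	FreeStorageBytes int64
}

// NodeResources models a working node together with its per-NUMA available
// resources.
type NodeResources struct {
	NodeID string
	NUMA   []NUMAResources
}
```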
  • Step S120 the container management component determines a NUMA node matching the container group from at least one NUMA node according to the CPU resource application information and storage resource application information of the container group and the available resource information of the working node under at least one NUMA node.
  • the NUMA node that matches the container group is a NUMA node that can simultaneously meet the CPU resource application requirements and storage resource application requirements of the container group.
  • determining a NUMA node matching the container group from at least one NUMA node includes: step S121 and step S122.
  • Step S121 the container management component selects a first NUMA node from at least one NUMA node according to the CPU resource application information of the container group and the available CPU resource information of the working node under at least one NUMA node.
  • the first NUMA node is a NUMA node that can meet the CPU resource application requirements of the container group.
  • for example, assuming the CPU resource application information of the container group indicates that 2 CPU cores are requested, and the available resource information of the working node where the container group is located covers two NUMA nodes, specifically 32 available CPU cores and 200G of available storage resources in NUMA0 and 32 available CPU cores and 100G of available storage resources in NUMA1, then both NUMA0 and NUMA1 can meet the CPU resource application requirements of the container group, so NUMA0 and NUMA1 are used as the first NUMA nodes.
  • Step S122 the container management component selects a second NUMA node from the first NUMA node according to the storage resource application information of the container group and the available storage resource information of the working node under the first NUMA node, and uses the second NUMA node as the NUMA node matching the container group.
  • for example, assuming the storage resource application information of the container group indicates that 200G of storage is requested, and the available storage resources under the first NUMA nodes determined in step S121 are 200G in NUMA0 and 100G in NUMA1, the NUMA node that can also meet the storage request, here NUMA0, is selected as the second NUMA node, that is, the NUMA node matching the container group.
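  • A rough illustration of the two-stage selection in steps S121 and S122 is sketched below: it first keeps the NUMA nodes whose free CPU cores cover the request (the "first NUMA nodes") and then picks one whose free storage also covers the request (the "second NUMA node"). It reuses the illustrative types defined above and is not the patent's actual implementation.

```go
package resourcealloc

import "errors"

// ErrNoMatchingNUMA is returned when no NUMA node can satisfy both the CPU
// and the storage request of the container group at the same time.
var ErrNoMatchingNUMA = errors.New("no NUMA node satisfies CPU and storage request")

// MatchNUMANode sketches the two-stage selection of steps S121/S122:
// filter by CPU first, then by storage.
func MatchNUMANode(req ContainerGroupRequest, node NodeResources) (NUMAResources, error) {
	// Step S121: NUMA nodes that can satisfy the CPU request.
	var first []NUMAResources
	for _, n := range node.NUMA {
		if n.FreeCPUCores >= req.CPUCores {
			first = append(first, n)
		}
	}
	// Step S122: among those, pick one that also satisfies the storage request.
	for _, n := range first {
		if n.FreeStorageBytes >= req.StorageBytes {
			return n, nil // the "second NUMA node", matching the container group
		}
	}
	return NUMAResources{}, ErrNoMatchingNUMA
}
```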
  • the storage resource requested by the container group is AEP.
  • AEP is a new type of non-volatile Optane memory device, also known as Apache Pass and commonly abbreviated as AEP; earlier similar devices were called non-volatile dual in-line memory modules (NVDIMM) or persistent memory (PMEM).
  • in some embodiments, the AEP device is made available through the FlexVolume or Container Storage Interface (CSI) storage plug-in extension mechanism.
  • an attribute is added to the configuration file of the container group (e.g., the yaml file of the Pod) (e.g., the attribute is named aepsize), and the attribute is used to indicate whether the container group needs to use the AEP device. For example, when the attribute value is 1, it indicates that the container group needs to use the AEP device. When the attribute value is 0, it indicates that the container group does not use the AEP device. If the container group uses the AEP device, the resource allocation process will be performed according to the process shown in Figure 1. If the container group does not use the AEP device, the resource allocation process can be performed according to the relevant technology.
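  • The paragraph above only says that an aepsize-like attribute is added to the container group's configuration; the patent does not specify the exact field. The sketch below assumes, purely for illustration, that it is carried as a Pod annotation, and builds the object with the standard Kubernetes Go types.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical Pod that requests 2 CPU cores and flags, via the assumed
	// "aepsize" annotation, that the container group needs an AEP device.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "aep-demo",
			Annotations: map[string]string{
				"aepsize": "1", // 1: the container group uses an AEP device, 0: it does not
			},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "example/app:latest",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceCPU: resource.MustParse("2"),
					},
				},
			}},
		},
	}
	fmt.Println(pod.Annotations["aepsize"])
}
```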
  • Step S130 the container management component allocates CPU resources and storage resources under the NUMA node matching the container group to the container group.
  • for example, assuming the NUMA node determined through steps S110 and S120 to match the container group is NUMA1, the CPU resources and storage resources under NUMA1 are allocated to that container group.
  • the above steps align the CPU resources and storage resources allocated to the container group in the NUMA dimension, achieve better container affinity, effectively improve resource allocation for containers, and enhance the performance of containerized applications.
  • FIG2 is a schematic diagram of a process flow of a resource allocation method according to some other embodiments of the present disclosure.
  • the resource allocation method of the embodiment of the present disclosure includes:
  • Step S210 The scheduling management component schedules the container group to the working node.
  • the scheduling management component is set on a master node in a cluster.
  • the cluster includes a master node and a worker node.
  • the master node and the worker node can be physical machines or virtual machines based on physical machines.
  • the scheduling management component allocates working nodes to the container group through the process shown in FIG. 3 to schedule the container group to the working nodes in the cluster.
  • Step S220 The container management component allocates resources to the container group scheduled to the working node where it is located.
  • the container management component allocates resources to the container group scheduled to the working node where it is located through the process shown in Figure 1.
  • the container management component allocates resources to the container group scheduled to the working node where it is located through the process shown in FIG. 4 .
  • Step S230 the topology resource management component updates the available resource information and container group information under the working node.
  • the topology resource management component is disposed on a master node in the cluster.
  • the topology resource management component updates the available resource information and container group information under the working node through the process shown in FIG. 5 .
  • the above steps can be used to schedule containers to working nodes and allocate resources to the container groups scheduled to the working nodes.
  • FIG3 is a schematic diagram of a process of scheduling a container group to a working node according to some embodiments of the present disclosure. As shown in FIG3, the process of scheduling a container group to a working node includes:
  • Step S211 the scheduling management component obtains resource application information of the container group.
  • the scheduling management component queries the database through the interface service component to obtain resource application information of the container group.
  • the scheduling management component is specifically the Kube Scheduler component
  • the interface service component is specifically the Kube API Server
  • the database is specifically the distributed key-value database ETCD
  • the information of the container group obtained by the query is the Pod information.
  • the database includes information of multiple container groups, and the information of each container group includes resource application information of the container group.
  • the resource application information of the container group includes at least one of CPU resource application information and storage resource application information of the container group.
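  • For orientation only: when the components are built on Kubernetes as the text suggests, the scheduling management component would typically read container group information through the Kube API Server rather than querying ETCD directly. The client-go sketch below lists Pods not yet bound to a node; it is an illustrative pattern under that assumption, not code from the patent.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the component runs in-cluster; out-of-cluster setups would build
	// the config from a kubeconfig path instead.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Pods whose spec.nodeName is still empty have not been scheduled yet;
	// their resource requests correspond to the "resource application
	// information" used by the scheduling management component.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{FieldSelector: "spec.nodeName="})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Println(p.Namespace, p.Name, p.Spec.Containers[0].Resources.Requests)
	}
}
```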
  • Step S212 the scheduling management component determines a working node to be allocated to the container group from the multiple working nodes according to the resource application information of the container group and the available resource information of the multiple working nodes in the cluster in the NUMA dimension.
  • the available resource information of the working node in the NUMA dimension includes at least one of available CPU resource information and available storage resource information.
  • the scheduling management component selects, from the multiple working nodes, a working node that can meet the resource application requirements of the container group and allow container affinity in the NUMA dimension, and uses that working node as the working node assigned to the container group.
  • when multiple working nodes meet these conditions, the scheduling management component can combine other container scheduling strategies to select the working node assigned to the container group, or use any one of the qualifying working nodes as the working node assigned to the container group.
  • the candidate work nodes are work node 1 and work node 2
  • the available resources of work node 1 are: 32 available CPU cores and 200G of available storage resources in NUMA0, 32 available CPU cores and 100G of available storage resources in NUMA1
  • the available resource information of work node 2 is: 64 available CPU cores and 500G of available storage resources in NUMA1, and 32 available CPU cores in NUMA2.
  • the scheduling management component can combine other scheduling strategies to select the work node allocated to the container group from work nodes 1 and 2.
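  • Putting the node-level check together, the following sketch shows one way the scheduling management component could test candidates such as working nodes 1 and 2: a node qualifies only if at least one of its NUMA nodes can satisfy both the CPU and storage request. It reuses the illustrative types and MatchNUMANode helper defined earlier and deliberately ignores the additional scheduling strategies mentioned above.

```go
package resourcealloc

// SelectWorkerNode returns the first working node that has at least one NUMA
// node able to satisfy both the CPU and storage request of the container
// group. A real scheduler would combine this with further scoring strategies.
func SelectWorkerNode(req ContainerGroupRequest, nodes []NodeResources) (NodeResources, bool) {
	for _, node := range nodes {
		if _, err := MatchNUMANode(req, node); err == nil {
			return node, true
		}
	}
	return NodeResources{}, false
}
```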
  • the process of scheduling the container group to the work node further includes: before determining the work node assigned to the container group from multiple work nodes, the scheduling management component filters out the work nodes that have not completed resource allocation for the assigned container group from the multiple work nodes.
  • the scheduling management component determines whether the working node has been assigned a container group, that is, whether the working node has been bound to a container group. If the working node has been assigned a container group, it is then determined whether the working node has completed resource allocation for the assigned container group. If the working node has not completed resource allocation for the assigned container group, the working node is filtered out; if the working node has not been assigned a container group, or the working node has completed resource allocation for the assigned container group, the working node is retained.
  • the scheduling management component determines whether the working node has been assigned a container group in the following manner: determining whether there is container group information including the working node identifier; if there is container group information including the working node identifier, determining that the working node has been assigned a container group; otherwise, determining that the working node has not been assigned a container group.
  • the scheduling management component determines whether the working node has completed resource allocation for the allocated container group in the following manner: for each container group allocated to the working node, determine whether the NUMA identifier in the container group information is empty. If the NUMA identifier in the container group information is empty, it indicates that the working node has not completed resource allocation for the allocated container group; if the NUMA identifier in the allocated container group information is not empty, it indicates that the working node has completed resource allocation for the allocated container group.
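  • The filtering rule described in the two paragraphs above can be expressed as a small predicate: a working node is skipped if any container group already bound to it still has an empty NUMA identifier. The PodBinding type and its field names below are assumptions for illustration, continuing the earlier sketches.

```go
package resourcealloc

// PodBinding is a minimal view of stored container-group information: the
// working node it was bound to, the NUMA identifier filled in once resources
// have been allocated, and the completion flag set by the topology resource
// management component.
type PodBinding struct {
	PodID     string
	NodeID    string // empty if the container group has not been assigned a node
	NUMAID    string // empty until resource allocation on the node is completed
	Completed bool   // resource allocation completion flag
}

// FilterNodesWithPendingAllocation removes working nodes that still have
// container groups whose resource allocation is not finished (NUMAID empty).
func FilterNodesWithPendingAllocation(nodes []NodeResources, bindings []PodBinding) []NodeResources {
	pending := make(map[string]bool)
	for _, b := range bindings {
		if b.NodeID != "" && b.NUMAID == "" {
			pending[b.NodeID] = true
		}
	}
	var kept []NodeResources
	for _, n := range nodes {
		if !pending[n.NodeID] {
			kept = append(kept, n)
		}
	}
	return kept
}
```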
  • the scheduling management component schedules the container group to a working node by judging whether the available resources of the working node in the NUMA dimension meet the resource application requirements of the container group. On the one hand, this improves the container scheduling effect and avoids scheduling container groups to working nodes with insufficient resources, which would leave a large number of container groups stuck in the creation process for a long time, thereby improving container deployment efficiency. On the other hand, it makes it easier to subsequently allocate resources on the working node to the container group, improving the resource allocation effect.
  • in addition, because working nodes that have not completed resource allocation for previously assigned container groups are filtered out, the container group will not be scheduled to such working nodes, which avoids miscalculation by the scheduling management component caused by delays in updating the available resources of a working node and the resulting incorrect scheduling of container groups.
  • Step S213 The scheduling management component schedules the container group to the working node assigned to it.
  • the scheduling management component sets the information of the working node assigned to the container group in the information of the container group.
  • the information of the working node assigned to the container group includes the identification of the working node.
  • the scheduling management component adds the identification of the working node assigned to the container group to the container group information stored in the database through the interface service component.
  • the above steps enable the scheduling management component to perceive the available resources of the working node in the NUMA dimension and perform container scheduling accordingly, which not only improves the container scheduling effect, but also helps to improve the subsequent resource allocation effect.
  • FIG4 is a schematic diagram of a process of allocating resources on a working node to a container group according to some embodiments of the present disclosure. As shown in FIG4 , the process of allocating resources on a working node to a container group in an embodiment of the present disclosure includes:
  • Step S221 The container management component obtains information about the container group allocated to the working node where it is located.
  • a container group is a set of closely connected containers.
  • the container group is specifically a Pod.
  • Pod is the basic unit of Kubernetes scheduling.
  • the containers in the Pod share the network and file system, and can be combined to complete the service in a simple and efficient way through inter-process communication and file sharing.
  • the container management component queries the database through the interface service component to obtain information about the container group on the worker node where it is located.
  • the container management component can be built based on the Kubelet component
  • the interface service component can be built based on the Kube API Server
  • the database can be built based on the distributed key-value database ETCD.
  • the working node where the container management component is located is working node 1 in the cluster
  • the information of the container group on working node 1 is queried from the database through the interface service component.
  • the working node can be a physical machine or a virtual machine based on a physical machine.
  • the information of the container group allocated to the working node includes: the allocated working node identifier, the CPU resource application information and the storage resource application information of the container group.
  • in some embodiments, before step S221, the method further includes steps A to C.
  • Step A Perform disk planning for the working node to determine the storage resources of the working node under at least one NUMA node.
  • the storage resource of the working node is AEP.
  • an AEP-type storage device on the host machine is used in a manner equivalent to a hard disk.
  • the storage resources of the working node under at least one NUMA node are determined as follows: the storage resource under NUMA0 is 100G AEP0, and the storage resource under NUMA1 is 200G AEP1.
  • Step B Configure the container management component startup command and start it based on the startup command.
  • for example, when the container management component is built based on the kubelet component, the startup command of the kubelet component can be configured by adding the corresponding command-line parameters to it.
  • Step C After starting the container management component, initialize the available resources of the working node where the container management component is located on at least one NUMA node.
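  • A minimal sketch of the initialization in steps A and C, continuing the illustrative types from earlier: the disk plan (for example, 100G AEP0 under NUMA0 and 200G AEP1 under NUMA1) is turned into the working node's initial per-NUMA available resources. The AEPDevice type and function names are assumptions, not identifiers from the patent.

```go
package resourcealloc

// AEPDevice describes the AEP device carved out for one NUMA node during
// disk planning (step A), e.g. "AEP0" with 100G under NUMA0.
type AEPDevice struct {
	Name  string
	Bytes int64
}

// InitNodeResources builds the initial per-NUMA available-resource view of a
// working node (step C) from its CPU topology and the disk plan of step A.
func InitNodeResources(nodeID string, cpusPerNUMA map[string]int, diskPlan map[string]AEPDevice) NodeResources {
	node := NodeResources{NodeID: nodeID}
	for numaID, cores := range cpusPerNUMA {
		entry := NUMAResources{NUMAID: numaID, FreeCPUCores: cores}
		if dev, ok := diskPlan[numaID]; ok {
			entry.FreeStorageBytes = dev.Bytes // storage contributed by the AEP device
		}
		node.NUMA = append(node.NUMA, entry)
	}
	return node
}
```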
  • Step S222 The container management component determines a NUMA node matching the container group from at least one NUMA node according to the CPU resource and storage resource application information of the container group and the available resource information of the working node under at least one NUMA node.
  • the NUMA node that matches the container group is a NUMA node that can simultaneously meet the application requirements of the container group for CPU resources and storage resources.
  • step S222 includes: the container management component selects a first NUMA node from at least one NUMA node based on the CPU resource application information of the container group and the available CPU resource information of the working node under at least one NUMA node, wherein the first NUMA node is a NUMA node that can meet the CPU resource application requirements of the container group; the container management component selects a second NUMA node from the first NUMA node based on the storage resource application information of the container group and the available storage resource information of the working node under the first NUMA node, and uses the second NUMA node as the NUMA node in the working node that matches the container group.
  • the storage resource requested by the container group is AEP.
  • AEP is a new type of non-volatile Optane memory device, also known as Apache Pass and commonly abbreviated as AEP; earlier similar devices were called non-volatile dual in-line memory modules (NVDIMM) or persistent memory (PMEM).
  • the processing logic shown in step S222 can be obtained by extending the topology calculation logic of the original CPU manager (CPU Manager), thereby reducing the amount of code modification.
  • Step S223 the container management component allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group.
  • the NUMA node matching the container group in the determined working nodes is NUMA1
  • the available CPU resources and available storage resources under NUMA1 are allocated to the container group.
  • Step S224 the container management component updates the information of the container group according to the information of the NUMA node matching the container group.
  • the container management component sets the information of the NUMA node that matches the container group in the information of the container group.
  • the information of the NUMA node that matches the container group includes the identifier of the NUMA node.
  • the container management component adds the identifier of the NUMA node that matches the container group to the container group information stored in the database through the interface service component.
  • the identifier of the NUMA node matching the container group is updated into the Pod information stored in the database in a patch manner; a patch submits modifications of certain fields of an object to Kubernetes.
  • two attributes are added to the Pod information: one is the NUMA node identifier assigned to the container group, and the other is the storage resource identifier assigned to the container group.
  • the NUMA node assigned to the container group is NUMA0 under worker node 1
  • the NUMA node identifier assigned to the container group is set to dev1/NUMA0 in the Pod information
  • the storage resource assigned to the container group is AEP0
  • the storage resource identifier assigned to the container group is set to vg:AEP0 in the Pod information.
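  • The update in step S224 can be pictured as a strategic-merge patch submitted through the API server. The client-go sketch below writes the two attributes mentioned above as Pod annotations; the values dev1/NUMA0 and vg:AEP0 follow the example in the text, but the annotation keys (numa-node, aep-volume) and the use of annotations at all are assumptions for illustration.

```go
package main

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// patchPodNUMAInfo records the NUMA node and storage resource assigned to a
// container group in its Pod information, as described for step S224.
// The annotation keys are illustrative assumptions.
func patchPodNUMAInfo(client kubernetes.Interface, namespace, podName, numaID, storageID string) error {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				"numa-node":  numaID,    // e.g. "dev1/NUMA0"
				"aep-volume": storageID, // e.g. "vg:AEP0"
			},
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return err
	}
	_, err = client.CoreV1().Pods(namespace).Patch(context.TODO(), podName,
		types.StrategicMergePatchType, data, metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	if err := patchPodNUMAInfo(client, "default", "aep-demo", "dev1/NUMA0", "vg:AEP0"); err != nil {
		panic(err)
	}
}
```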
  • the above process can be used to allocate resources on the working node to the container group, so that the CPU resources and storage resources allocated to the container group are aligned in the NUMA dimension, thereby achieving better container affinity, effectively improving the effect of resource allocation for the container, and improving the performance of containerized deployed applications.
  • FIG5 is a schematic diagram of a process for updating available resources and container group information of a working node according to some embodiments of the present disclosure. As shown in FIG5, the process for updating available resources and container group information of a working node according to an embodiment of the present disclosure includes:
  • Step S231 the topology resource management component monitors the information of the container group.
  • the topology resource management component is disposed in a master node in the cluster.
  • the topology resource management component monitors the information of the container group stored in the database through the interface service component.
  • Step S232 after detecting that the information of the container group contains the information of the NUMA node matching the container group, the topology resource management component updates the available resource information of the working node under at least one NUMA node.
  • the topology resource management component updates the available resource information of the working node in the NUMA dimension.
  • the available resource information of the original working node in the NUMA dimension is: 32 available CPU cores and 200G available storage resources in NUMA0, and 32 available CPU cores and 100G available storage resources in NUMA1.
  • the container management component allocates 2 CPU cores and 200G storage resources in NUMA0 to a container group
  • the NUMA node identifier "NUMA0" is added to the information of the container group.
  • the topology resource management component detects that the container group information contains the NUMA node identifier "NUMA0"
  • the available resource information of the working node in the NUMA dimension is updated.
  • the updated available resource information of the working node in the NUMA dimension is: 30 available CPU cores and 0G available storage resources in NUMA0, and 32 available CPU cores and 100G available storage resources in NUMA1.
  • Step S233 the topology resource management component sets a resource allocation completion flag in the container group information.
  • the topology resource management component sets a resource allocation completion flag in the container group information stored in the database through the interface service component.
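  • A rough sketch of the bookkeeping in steps S232 and S233, using the illustrative types from the earlier sketches: once the watched container group information carries a NUMA identifier, the allocated CPU cores and storage are subtracted from that NUMA node's available resources and the completion flag is set. This is an assumption-level illustration, not the patent's implementation.

```go
package resourcealloc

import "fmt"

// ApplyAllocation performs the bookkeeping of steps S232/S233: after the
// topology resource management component observes that the container group's
// information carries a NUMA identifier, it subtracts the allocated CPU and
// storage from that NUMA node's available resources and sets the resource
// allocation completion flag.
func ApplyAllocation(node *NodeResources, req ContainerGroupRequest, binding *PodBinding) error {
	for i := range node.NUMA {
		if node.NUMA[i].NUMAID != binding.NUMAID {
			continue
		}
		// Example from the text: NUMA0 with 32 free cores and 200G free
		// storage, minus 2 cores and 200G allocated, leaves 30 cores and 0G.
		node.NUMA[i].FreeCPUCores -= req.CPUCores
		node.NUMA[i].FreeStorageBytes -= req.StorageBytes
		binding.Completed = true
		return nil
	}
	return fmt.Errorf("NUMA node %q not found on working node %q", binding.NUMAID, node.NodeID)
}
```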
  • the above steps can realize timely updating of container group information and available resources of working nodes, thereby helping to improve the accuracy of container scheduling and resource allocation.
  • FIG6 is a schematic diagram of the structure of a container management component according to some embodiments of the present disclosure.
  • the container management component 600 of the embodiment of the present disclosure includes: an acquisition module 610 , a determination module 620 , and an allocation module 630 .
  • the acquisition module 610 is configured to acquire information of the container group scheduled to the working node.
  • a container group is a set of closely connected containers.
  • the container group is specifically a Pod.
  • Pod is the basic unit of Kubernetes scheduling.
  • the containers in the Pod share the network and file system, and can be combined to complete the service in a simple and efficient way through inter-process communication and file sharing.
  • the acquisition module 610 queries the database through the interface service component to obtain the information of the container group on the worker node (Worker Node) where it is located. For example, if the worker node where the container management component is located is worker node 1 in the cluster, the acquisition module 610 queries the information of the container group on worker node 1 from the database through the interface service component.
  • the worker node can be a physical machine or a virtual machine based on a physical machine.
  • the information of the container group allocated to the working node includes: the allocated working node identifier, and the CPU resource and storage resource application information of the container group.
  • the determination module 620 is configured to determine a NUMA node matching the container group according to the CPU resource and storage resource application information of the container group and the available resource information of the working node under at least one NUMA node.
  • the NUMA node that matches the container group is a NUMA node that can simultaneously meet the application requirements of the container group for CPU resources and storage resources.
  • the process in which the determination module 620 determines the NUMA node matching the container group includes:
  • the determination module 620 selects a first NUMA node from at least one NUMA node according to the CPU resource application information of the container group and the available CPU resource information of the working node under at least one NUMA node, wherein the first NUMA node is a NUMA node that can meet the CPU resource application requirements of the container group.
  • the CPU resource application information of the container group includes 2 CPU cores
  • the available resource information of the working node where the container group is located includes: 32 available CPU cores and 200G of available storage resources in NUMA0, and 32 available CPU cores and 100G of available storage resources in NUMA1
  • the determination module 620 selects a second NUMA node from the first NUMA node according to the storage resource application information of the container group and the available storage resource information of the working node under the first NUMA node, and uses the second NUMA node as the NUMA node matching the container group.
  • for example, if the available storage resource information under the first NUMA nodes is 200G of available storage resources in NUMA0 and 100G of available storage resources in NUMA1, and NUMA1 can also meet the container group's storage resource request, then NUMA1 is determined as the second NUMA node, that is, the NUMA node that can simultaneously meet the container group's CPU resource and storage resource application requirements.
  • the allocation module 630 is configured to allocate the CPU resources and storage resources of the NUMA node in the working node that matches the container group to the container group.
  • the CPU resources and storage resources under NUMA1 are allocated to the container group.
  • the container management component 600 further includes an update module configured to: after the allocation module 630 allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group, update the information of the container group according to the information of the NUMA node matching the container group. For example, add the identifier of the matching NUMA node to the container group information.
  • the above device can realize the alignment of CPU resources and storage resources allocated to the container group in the NUMA dimension, achieve better container affinity, effectively improve the effect of resource allocation for containers, and enhance the performance of container-deployed applications.
  • FIG. 7 is a schematic diagram of the structure of a resource allocation system according to some other embodiments of the present disclosure.
  • the resource allocation system 700 of the disclosed embodiment includes: a scheduling management component 710 , a container management component 720 , and a topology resource management component 730 .
  • the scheduling management component 710 is configured to schedule working nodes for the container group.
  • the scheduling management component 710 is set on a master node in a cluster.
  • the cluster includes a master node and a working node.
  • the master node and the working node can be physical machines or virtual machines based on physical machines.
  • the scheduling management component 710 is configured to schedule the container group to the working node through the process shown in FIG. 3 .
  • the container management component 720 is configured to allocate resources to the container group scheduled to the working node.
  • the container management component is configured to allocate resources to the container group scheduled to the working node where it is located through the process shown in Figure 4.
  • the topology resource management component 730 is configured to update the available resource information and container group information under the working node.
  • the topology resource management component 730 is disposed on a master node in the cluster.
  • the topology resource management component 730 is configured to update the available resource information and container group information under the working node through the process shown in FIG. 5 .
  • the above resource allocation system can realize the alignment of CPU resources and storage resources allocated to container groups in the NUMA dimension, achieve better container affinity, effectively improve the effect of resource allocation for containers, and enhance the performance of containerized deployed applications.
  • FIG8 is a schematic diagram of the structure of a resource allocation system according to some other embodiments of the present disclosure.
  • a resource allocation system based on the container orchestration tool Kubernetes is used as an example for explanation.
  • the resource allocation system of the embodiment of the present disclosure includes: Kube scheduling component (Scheduler) 810, Kube API service component (Server) 820, ETCD 830, Kubelet component 840, and topology resource management component 850.
  • the Kube scheduling component 810, the Kube API service component 820, the ETCD 830, and the topology resource management component 850 are located on the master node in the cluster, and the Kubelet component 840 is located on a working node in the cluster, such as working node 1 shown in Figure 8.
  • the Kube API service component 820 is configured to store the Pod information in ETCD 830 after receiving a request to create a Pod.
  • ETCD is a key-value database with consistency and high availability.
  • the Kube scheduling component 810 is configured to obtain the resource information of multiple working nodes in the cluster in the NUMA dimension and the resource application information of the Pod to be scheduled, determine, based on this information, which working node the Pod is assigned to, and then update the Pod information.
  • the Kube scheduling component 810 is also configured to: determine whether there are other Pods on the working node whose resource allocation logic has not been processed. If so, the Pod to be scheduled will not be allocated to the working node. If not, the working node will be allocated to the Pod to be scheduled based on the resource information of the NUMA dimension under the working node and the resource information required by the Pod to be scheduled.
  • the Kubelet component 840 is configured to obtain the Pod information related to it, that is, the Pod assigned to its host machine through the Kube API service component 820; determine to which NUMA node the CPU is assigned based on the CPU resources requested by the Pod related to it and the available CPU resources of the host machine in the NUMA dimension; then, based on the storage resources requested by the Pod related to it and the available storage resources of the host machine in the NUMA dimension, select a NUMA node that can meet the Pod's storage resource usage requirements from the NUMA nodes screened according to the CPU resources, and update the NUMA information in the Pod based on the finally selected NUMA node.
  • the topology resource management component 850 is configured to update the available resource information of the working node under at least one NUMA node after detecting that the Pod information contains NUMA node information, and set a resource allocation completion flag in the Pod information.
  • the above system can achieve alignment of CPU resources and storage resources allocated to container groups in the NUMA dimension with less modification to native Kubernetes components, achieve better container affinity, effectively improve the effect of resource allocation for containers, and enhance the performance of containerized deployed applications.
  • FIG9 is a schematic diagram of the structure of an electronic device according to some other embodiments of the present disclosure.
  • the electronic device 900 includes a memory 910; and a processor 920 coupled to the memory 910.
  • the memory 910 is used to store instructions for executing the corresponding embodiments of the resource allocation method.
  • the processor 920 is configured to execute the resource allocation method in any of the embodiments of the present disclosure based on the instructions stored in the memory 910.
  • Figure 10 is a schematic diagram of the structure of a computer system according to some embodiments of the present disclosure.
  • the computer system 1000 can be expressed in the form of a general-purpose computing device.
  • the computer system 1000 includes a memory 1010, a processor 1020, and a bus 1030 connecting different system components.
  • the memory 1010 may include, for example, a system memory, a non-volatile storage medium, etc.
  • the system memory may store, for example, an operating system, an application program, a boot loader, and other programs.
  • the system memory may include a volatile storage medium, such as a random access memory (RAM) and/or a cache memory.
  • the non-volatile storage medium may store, for example, instructions for executing at least one of the corresponding embodiments of the resource allocation method.
  • Non-volatile storage media include, but are not limited to, disk storage, optical storage, flash memory, etc.
  • the processor 1020 can be implemented by a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistors, etc.
  • each module such as an acquisition module, a determination module, and an allocation module can be implemented by a central processing unit (CPU) running instructions in a memory that execute corresponding steps, or can be implemented by a dedicated circuit that executes corresponding steps.
  • the bus 1030 may use any of a variety of bus architectures, including, but not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, and a Peripheral Component Interconnect (PCI) bus.
  • the input/output interface 1040, the network interface 1050, the storage interface 1060, the memory 1010, and the processor 1020 of the computer system 1000 may be connected via the bus 1030.
  • the input/output interface 1040 may provide a connection interface for input/output devices such as a display, a mouse, and a keyboard.
  • the network interface 1050 may provide a connection interface for various networked devices.
  • the storage interface 1060 may provide a connection interface for external storage devices such as a floppy disk, a USB flash drive, and an SD card.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable device to produce a machine, so that the processor executes the instructions to produce means for implementing the functions specified in one or more blocks in the flowchart and/or block diagram.
  • These computer-readable program instructions may also be stored in a computer-readable memory, which cause the computer to work in a specific manner to produce an article of manufacture, including instructions for implementing the functions specified in one or more blocks in the flowchart and/or block diagram.
  • the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a resource allocation method, a container management component, and a resource allocation system, relating to the field of computer technology. In some embodiments, the resource allocation method includes: a container management component obtains information of a container group scheduled to a working node, wherein the information of the container group includes CPU resource application information and storage resource application information; the container management component determines, according to the CPU resource application information and storage resource application information of the container group and the available resource information of the working node under at least one NUMA node, a NUMA node matching the container group from the at least one NUMA node; and the container management component allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group. Through the above method, the CPU resources and storage resources allocated to the container group can be aligned in the NUMA dimension.

Description

Resource allocation method, container management component, and resource allocation system
Cross-Reference to Related Application
This application is based on and claims priority to CN application No. 202211278945.0, filed on October 19, 2022, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a resource allocation method, a container management component, and a resource allocation system.
Background
Container clouds are an important type of current cloud computing platform. When building a container cloud, customers often deploy hundreds of physical servers to carry the services on the container cloud.
At present, container cloud clusters are managed mainly through container orchestration tools; for example, containers are scheduled to working nodes in the cluster through Kubernetes, and resources on the working nodes are allocated to the containers.
Summary
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, illustrate embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a resource allocation method according to some embodiments of the present disclosure;
FIG. 2 is a schematic flowchart of a resource allocation method according to other embodiments of the present disclosure;
FIG. 3 is a schematic flowchart of scheduling a container group to a working node according to some embodiments of the present disclosure;
FIG. 4 is a schematic flowchart of allocating resources on a working node to a container group according to some embodiments of the present disclosure;
FIG. 5 is a schematic flowchart of updating the available resources of a working node and the container group information according to some embodiments of the present disclosure;
FIG. 6 is a schematic structural diagram of a container management component according to some embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of a resource allocation system according to other embodiments of the present disclosure;
FIG. 8 is a schematic structural diagram of a resource allocation system according to still other embodiments of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device according to other embodiments of the present disclosure;
FIG. 10 is a schematic structural diagram of a computer system according to some embodiments of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting. Therefore, other examples of the exemplary embodiments may have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
To make the objects, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
In the related art, in a cluster environment built on the container orchestration platform Kubernetes and Docker, application containers can be pinned to CPU cores through the CPU Manager in the Kubelet component, and peripheral devices such as GPU resources can also be added through Device Plugins. Peripheral devices are managed by the Device Manager, and the Topology Manager mechanism is then enabled to aggregate the resource allocation results of the CPU Manager and the Device Manager and produce an optimal topology result, so that the CPU, GPU, and other resources allocated to a container can, as far as possible, achieve affinity and exclusivity.
The inventors of the present disclosure have found that the related art does not provide good support for storage devices and cannot align the CPU and storage devices allocated to a container in the non-uniform memory access (NUMA) dimension.
A technical problem to be solved by the present disclosure is to provide a resource allocation method, a container management component, and a resource allocation system, so as to align the CPU resources and storage resources allocated to a container group in the NUMA dimension.
FIG. 1 is a schematic flowchart of a resource allocation method according to some embodiments of the present disclosure. As shown in FIG. 1, the resource allocation method of the embodiments of the present disclosure includes the following steps.
Step S110: the container management component obtains information of a container group scheduled onto a worker node.
Here, a container group is a set of containers. For example, in the container orchestration tool Kubernetes, a container group is specifically a Pod. A Pod is the basic unit of Kubernetes scheduling; the containers within a Pod share the network and file system and can be combined to deliver a service in a simple and efficient way through inter-process communication and file sharing.
In some embodiments, the container management component queries a database through an interface service component to obtain the information of the container groups on the worker node. For example, when the worker node is worker node 1 in the cluster, the information of the container groups on worker node 1 is queried from the database through the interface service component. Illustratively, a worker node may be a physical machine or a virtual machine hosted on a physical machine.
Illustratively, the container management component may be built on the Kubelet component of the container orchestration tool Kubernetes, the interface service component may be built on the Kube API Server of Kubernetes, and the database may be built on a distributed key-value database such as ETCD.
In some embodiments, the information of a container group on the worker node includes the identifier of the container group scheduled onto the worker node, and the CPU resource request information and storage resource request information of the container group.
Step S120: the container management component determines, from at least one NUMA node, a NUMA node matching the container group according to the CPU resource request information and storage resource request information of the container group and the available resource information of the worker node under the at least one NUMA node.
Here, the NUMA node matching the container group is a NUMA node that can satisfy both the CPU resource request and the storage resource request of the container group.
In some embodiments, determining the NUMA node matching the container group from the at least one NUMA node includes step S121 and step S122.
Step S121: the container management component selects a first NUMA node from the at least one NUMA node according to the CPU resource request information of the container group and the available CPU resource information of the worker node under the at least one NUMA node.
Here, the first NUMA node is a NUMA node that can satisfy the CPU resource request of the container group.
For example, suppose the CPU resource request information indicates that the container group requests 2 CPU cores, and the available resource information of the worker node where the container group resides covers two NUMA nodes, specifically 32 available CPU cores and 200 GB of available storage in NUMA0, and 32 available CPU cores and 100 GB of available storage in NUMA1. Since both NUMA0 and NUMA1 can satisfy the CPU resource request of the container group, NUMA0 and NUMA1 are taken as first NUMA nodes.
Step S122: the container management component selects a second NUMA node from the first NUMA nodes according to the storage resource request information of the container group and the available storage resource information of the worker node under the first NUMA nodes, and takes the second NUMA node as the NUMA node matching the container group.
For example, suppose the storage resource request information of the container group specifies 200 GB of storage, and the available storage resource information under the first NUMA nodes determined in step S121 is 200 GB of available storage in NUMA0 and 100 GB of available storage in NUMA1. Since NUMA0 can satisfy both the CPU resource request and the storage resource request of the container group, NUMA0 is taken as the second NUMA node, i.e., the NUMA node matching the container group.
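For illustration, the following is a minimal Go sketch of the two-step selection described in steps S121 and S122, using the numbers from the example above. The type and function names (NUMANode, matchNUMANode) are assumptions introduced for this sketch and do not correspond to any identifier in the disclosed implementation.

```go
package main

import "fmt"

// NUMANode describes the available resources of one NUMA node on a worker node.
// The field names are illustrative assumptions.
type NUMANode struct {
	ID            string
	FreeCPUCores  int
	FreeStorageGB int
}

// matchNUMANode first keeps the NUMA nodes that satisfy the CPU request
// (step S121), then picks from those a node that also satisfies the storage
// request (step S122). It returns the matched node and whether one was found.
func matchNUMANode(nodes []NUMANode, cpuReq, storageReq int) (NUMANode, bool) {
	// Step S121: keep NUMA nodes with enough free CPU cores ("first NUMA nodes").
	var firstNodes []NUMANode
	for _, n := range nodes {
		if n.FreeCPUCores >= cpuReq {
			firstNodes = append(firstNodes, n)
		}
	}
	// Step S122: among those, pick one with enough free storage ("second NUMA node").
	for _, n := range firstNodes {
		if n.FreeStorageGB >= storageReq {
			return n, true
		}
	}
	return NUMANode{}, false
}

func main() {
	// The example from the description: NUMA0 has 32 cores / 200 GB free and
	// NUMA1 has 32 cores / 100 GB free; the container group requests 2 cores
	// and 200 GB of storage.
	nodes := []NUMANode{
		{ID: "NUMA0", FreeCPUCores: 32, FreeStorageGB: 200},
		{ID: "NUMA1", FreeCPUCores: 32, FreeStorageGB: 100},
	}
	if m, ok := matchNUMANode(nodes, 2, 200); ok {
		fmt.Println("matched NUMA node:", m.ID) // NUMA0 in this example
	}
}
```

In this sketch the first NUMA node that fits both requests is returned; a real implementation could apply a further tie-breaking policy when several NUMA nodes qualify.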
In some embodiments, the storage resource requested by the container group is AEP. AEP is a new type of non-volatile (Optane memory) device, also known as Apache Pass and commonly referred to as AEP. Earlier similar devices are known as non-volatile dual in-line memory modules (NVDIMM) or persistent memory (PMEM).
In some embodiments, AEP devices are made available through the FlexVolume or Container Storage Interface (CSI) storage plugin extension mechanisms.
In some embodiments, an attribute (for example, named aepsize) is added to the configuration file of the container group (for example, the yaml file of the Pod) to indicate whether the container group needs to use an AEP device; for example, a value of 1 indicates that the container group needs an AEP device, and a value of 0 indicates that it does not. If the container group uses an AEP device, resource allocation is performed according to the flow shown in FIG. 1; if not, resource allocation may be performed according to the related art.
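As a minimal sketch only, the following Go snippet shows how a component might branch on such an attribute; the way the attribute is surfaced (here, a plain string map) and the attribute name are assumptions made for this example.

```go
package main

import "fmt"

// usesAEP reports whether a container group has requested an AEP device via the
// "aepsize"-style attribute described above ("1" means an AEP device is used).
func usesAEP(attrs map[string]string) bool {
	return attrs["aepsize"] == "1"
}

func main() {
	podAttrs := map[string]string{"aepsize": "1"}
	if usesAEP(podAttrs) {
		fmt.Println("allocate CPU and AEP storage with NUMA alignment (flow of FIG. 1)")
	} else {
		fmt.Println("fall back to the conventional allocation flow")
	}
}
```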
Step S130: the container management component allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group.
Illustratively, assuming that the NUMA node matching the container group determined through steps S110 and S120 is NUMA0, the CPU resources and storage resources under NUMA0 are allocated to the container group.
In the embodiments of the present disclosure, the above steps enable the CPU resources and storage resources allocated to a container group to be aligned in the NUMA dimension, achieve better container affinity, effectively improve the effect of resource allocation for containers, and improve the performance of containerized applications.
FIG. 2 is a schematic flowchart of a resource allocation method according to other embodiments of the present disclosure. As shown in FIG. 2, the resource allocation method of the embodiments of the present disclosure includes the following steps.
Step S210: a scheduling management component schedules a container group onto a worker node.
In some embodiments, the scheduling management component is deployed on a master node in the cluster. The cluster includes master nodes and worker nodes, each of which may be a physical machine or a virtual machine hosted on a physical machine.
In some embodiments, the scheduling management component allocates a worker node to the container group through the flow shown in FIG. 3, so as to schedule the container group onto a worker node in the cluster.
Step S220: the container management component allocates resources to the container groups scheduled onto the worker node where it resides.
In some embodiments, the container management component allocates resources to the container groups scheduled onto its worker node through the flow shown in FIG. 1.
In some embodiments, the container management component allocates resources to the container groups scheduled onto its worker node through the flow shown in FIG. 4.
Step S230: a topology resource management component updates the available resource information of the worker node and the information of the container group.
In some embodiments, the topology resource management component is deployed on a master node in the cluster.
In some embodiments, the topology resource management component updates the available resource information of the worker node and the information of the container group through the flow shown in FIG. 5.
In the embodiments of the present disclosure, the above steps enable containers to be scheduled onto worker nodes and resources to be allocated to the container groups scheduled onto the worker nodes.
FIG. 3 is a schematic flowchart of scheduling a container group onto a worker node according to some embodiments of the present disclosure. As shown in FIG. 3, the flow of scheduling a container group onto a worker node includes the following steps.
Step S211: the scheduling management component obtains the resource request information of the container group.
In some embodiments, the scheduling management component queries the database through the interface service component to obtain the resource request information of the container group. Illustratively, in the container orchestration tool Kubernetes, the scheduling management component is specifically the Kube Scheduler component, the interface service component is specifically the Kube API Server, the database is specifically the distributed key-value database ETCD, and the queried container group information is Pod information.
The database contains the information of multiple container groups, and the information of each container group includes its resource request information. In some embodiments, the resource request information of a container group includes at least one of the CPU resource request information and the storage resource request information of the container group.
Step S212: the scheduling management component determines, from multiple worker nodes in the cluster, the worker node to be allocated to the container group according to the resource request information of the container group and the available resource information of the multiple worker nodes in the NUMA dimension.
In some embodiments, the available resource information of a worker node in the NUMA dimension includes at least one of available CPU resource information and available storage resource information.
In some embodiments, after obtaining the resource request information of the container group and the available resource information of the multiple worker nodes in the NUMA dimension, the scheduling management component selects, from the multiple worker nodes, worker nodes that can satisfy the resource request of the container group while achieving container affinity in the NUMA dimension. When only one worker node satisfying the above conditions is selected, the scheduling management component takes that worker node as the worker node allocated to the container group; when multiple worker nodes satisfying the above conditions are selected, the scheduling management component may further apply other container scheduling policies to select the worker node allocated to the container group, or take any one of the selected worker nodes as the worker node allocated to the container group.
For example, suppose the container group requests 2 CPU cores and 100 GB of storage, and the candidate worker nodes are worker node 1 and worker node 2. The available resources of worker node 1 are 32 available CPU cores and 200 GB of available storage in NUMA0, and 32 available CPU cores and 100 GB of available storage in NUMA1; the available resources of worker node 2 are 64 available CPU cores and 500 GB of available storage in NUMA1, and 32 available CPU cores in NUMA2. By comparing the resources requested by the container group with the available resources on the candidate worker nodes, it is determined that both worker node 1 and worker node 2 can satisfy the resource request of the container group while achieving container affinity in the NUMA dimension. In this case, the scheduling management component may further apply other scheduling policies to select the worker node allocated to the container group from worker nodes 1 and 2.
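A minimal Go sketch of this per-node check is given below; the data layout is an assumption made for this example, and only the NUMA-affinity condition described above is modeled.

```go
package main

import "fmt"

// numaFree lists the free CPU cores and storage (in GB) of one NUMA node; a
// worker node is represented as a slice of these entries. Names are illustrative.
type numaFree struct {
	Cores     int
	StorageGB int
}

// fitsOnSomeNUMA reports whether at least one NUMA node of the worker can
// satisfy both the CPU and the storage request, which is the NUMA-affinity
// condition the scheduling management component checks for each candidate node.
func fitsOnSomeNUMA(worker []numaFree, cpuReq, storageReq int) bool {
	for _, n := range worker {
		if n.Cores >= cpuReq && n.StorageGB >= storageReq {
			return true
		}
	}
	return false
}

func main() {
	// Worker node 1 and worker node 2 from the example above; the container
	// group requests 2 CPU cores and 100 GB of storage, so both nodes pass.
	worker1 := []numaFree{{32, 200}, {32, 100}}
	worker2 := []numaFree{{64, 500}, {32, 0}}
	fmt.Println(fitsOnSomeNUMA(worker1, 2, 100), fitsOnSomeNUMA(worker2, 2, 100))
}
```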
In some embodiments, the flow of scheduling the container group onto a worker node further includes: before determining the worker node allocated to the container group from the multiple worker nodes, the scheduling management component filters out, from the multiple worker nodes, worker nodes that have not yet completed resource allocation for container groups already allocated to them.
For example, for each of the multiple worker nodes, the scheduling management component determines whether any container group has been allocated to the worker node, i.e., whether the worker node has been bound to a container group. If a container group has been allocated to the worker node, it further determines whether the worker node has completed resource allocation for the allocated container group. If the worker node has not yet completed resource allocation for the allocated container group, the worker node is filtered out; if no container group has been allocated to the worker node, or the worker node has completed resource allocation for the allocated container groups, the worker node is retained.
In some embodiments, the scheduling management component determines whether a container group has been allocated to a worker node as follows: it checks whether there is container group information containing the identifier of the worker node; if such container group information exists, it determines that a container group has been allocated to the worker node; otherwise, it determines that no container group has been allocated to the worker node.
In some embodiments, the scheduling management component determines whether a worker node has completed resource allocation for the allocated container groups as follows: for each container group allocated to the worker node, it checks whether the NUMA identifier in the container group information is empty; if the NUMA identifier is empty, the worker node has not yet completed resource allocation for the allocated container group; if the NUMA identifier in the allocated container group information is not empty, the worker node has completed resource allocation for the allocated container group.
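The following Go sketch illustrates the filtering logic described above, treating an empty NUMA identifier in a bound container group's information as "resource allocation still in progress". The structure and field names are assumptions made for this example.

```go
package main

import "fmt"

// PodInfo is an illustrative view of the container group information stored in
// the database; only the fields needed for the filter are shown.
type PodInfo struct {
	Name     string
	NodeName string // identifier of the worker node the Pod is bound to ("" if unbound)
	NUMAID   string // identifier of the matched NUMA node ("" until allocation finishes)
}

// hasPendingAllocation reports whether any Pod already bound to the given
// worker node has not yet been assigned a NUMA node, i.e. resource allocation
// for that Pod is still in progress.
func hasPendingAllocation(node string, pods []PodInfo) bool {
	for _, p := range pods {
		if p.NodeName == node && p.NUMAID == "" {
			return true
		}
	}
	return false
}

// filterNodes removes worker nodes that still have pending allocations, so the
// scheduler does not place new container groups on them.
func filterNodes(candidates []string, pods []PodInfo) []string {
	var kept []string
	for _, n := range candidates {
		if !hasPendingAllocation(n, pods) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	pods := []PodInfo{
		{Name: "pod-a", NodeName: "worker-1", NUMAID: "NUMA0"}, // allocation finished
		{Name: "pod-b", NodeName: "worker-2", NUMAID: ""},      // allocation still pending
	}
	fmt.Println(filterNodes([]string{"worker-1", "worker-2"}, pods)) // [worker-1]
}
```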
In the embodiments of the present disclosure, the scheduling management component determines whether the available resources of a worker node in the NUMA dimension satisfy the resource request of the container group before scheduling the container group onto that worker node. On the one hand, this optimizes the container scheduling effect, prevents container groups from being scheduled onto worker nodes with insufficient resources and thus remaining in the creating state for a long time, and improves container deployment efficiency; on the other hand, it facilitates the subsequent allocation of resources on the worker node to the container group and improves the resource allocation effect. Furthermore, by checking whether a worker node still has unfinished resource allocation logic and, if so, not scheduling the container group onto that worker node, the problem that the scheduling management component miscalculates due to delayed updates of the worker node's available resources and thus wrongly admits container groups can be solved.
Step S213: the scheduling management component schedules the container group onto the worker node allocated to it.
In some embodiments, the scheduling management component sets, in the information of the container group, the information of the worker node allocated to the container group. The information of the worker node allocated to the container group includes the identifier of the worker node. For example, the scheduling management component adds, through the interface service component, the identifier of the allocated worker node to the container group information stored in the database.
In the embodiments of the present disclosure, the above steps enable the scheduling management component to perceive the available resources of worker nodes in the NUMA dimension and schedule containers accordingly, which not only improves the container scheduling effect but also helps improve the subsequent resource allocation effect.
FIG. 4 is a schematic flowchart of allocating resources on a worker node to a container group according to some embodiments of the present disclosure. As shown in FIG. 4, the flow of allocating resources on a worker node to a container group in the embodiments of the present disclosure includes the following steps.
Step S221: the container management component obtains the information of the container groups allocated to the worker node where it resides.
Here, a container group is a set of closely related containers. For example, in the container orchestration tool Kubernetes, a container group is specifically a Pod. A Pod is the basic unit of Kubernetes scheduling; the containers within a Pod share the network and file system and can be combined to deliver a service in a simple and efficient way through inter-process communication and file sharing.
In some embodiments, the container management component queries the database through the interface service component to obtain the information of the container groups on the worker node where it resides. In a specific implementation, when the resource allocation system is built on the container orchestration tool Kubernetes, the container management component may be built on the Kubelet component, the interface service component may be built on the Kube API Server, and the database may be built on the distributed key-value database ETCD.
For example, if the worker node where the container management component resides is worker node 1 in the cluster, the information of the container groups on worker node 1 is queried from the database through the interface service component. A worker node may be a physical machine or a virtual machine hosted on a physical machine.
The information of a container group allocated to the worker node includes the identifier of the allocated worker node, and the CPU resource request information and storage resource request information of the container group.
In some embodiments, before step S221, the flow further includes steps A to C.
Step A: disk planning is performed on the worker node to determine the storage resources of the worker node under at least one NUMA node.
In some embodiments, the storage resource of the worker node is AEP. An AEP-type storage device is used on the host in the same way as a hard disk. For example, it is determined that the storage resources of the worker node under at least one NUMA node are: AEP0 of 100 GB under NUMA0, and AEP1 of 200 GB under NUMA1.
Step B: the startup command of the container management component is configured, and the component is started based on the startup command.
For example, when the container management component is built on the kubelet component, the startup command of the kubelet component may be configured by adding the following command parameters to it:
--cpu-manager-policy=static --feature-gates=CPUManager=true --topology-manager-policy=single-numa-node, which enables the affinity policy that places the resources used by a container on a single NUMA node.
Step C: after the container management component is started, the available resources of the worker node where it resides under at least one NUMA node are initialized.
Step S222: the container management component determines, from at least one NUMA node, the NUMA node matching the container group according to the CPU resource and storage resource request information of the container group and the available resource information of the worker node under the at least one NUMA node.
Here, the NUMA node matching the container group is a NUMA node that can satisfy both the CPU resource request and the storage resource request of the container group.
In some embodiments, step S222 includes: the container management component selects a first NUMA node from the at least one NUMA node according to the CPU resource request information of the container group and the available CPU resource information of the worker node under the at least one NUMA node, where the first NUMA node is a NUMA node that can satisfy the CPU resource request of the container group; and the container management component selects a second NUMA node from the first NUMA nodes according to the storage resource request information of the container group and the available storage resource information of the worker node under the first NUMA nodes, and takes the second NUMA node as the NUMA node of the worker node that matches the container group.
In some embodiments, the storage resource requested by the container group is AEP. AEP is a new type of non-volatile (Optane memory) device, also known as Apache Pass and commonly referred to as AEP. Earlier similar devices are known as non-volatile dual in-line memory modules (NVDIMM) or persistent memory (PMEM).
In some embodiments, when the container management component is built on the Kubelet component, the topology computation logic of the existing CPU Manager may be extended to obtain the processing logic of step S222, thereby reducing the amount of code modification.
Step S223: the container management component allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group.
Illustratively, assuming that the NUMA node of the worker node that matches the container group is determined to be NUMA1, the available CPU resources and available storage resources under NUMA1 are allocated to the container group according to the needs of the container group.
Step S224: the container management component updates the information of the container group according to the information of the NUMA node matching the container group.
In some embodiments, the container management component sets the information of the NUMA node matching the container group in the information of the container group. The information of the NUMA node matching the container group includes the identifier of the NUMA node. For example, the container management component adds, through the interface service component, the identifier of the NUMA node matching the container group to the container group information stored in the database.
In some embodiments, when the resource allocation system is built on the container orchestration tool Kubernetes and the container management component is built on the kubelet component, the identifier of the NUMA node matching the container group is updated into the Pod information stored in the database in a patch manner, where a patch submits modifications of certain fields of an object to Kubernetes.
In some embodiments, two attributes are added to the Pod information: one is the identifier of the NUMA node allocated to the container group, and the other is the identifier of the storage resource allocated to the container group. For example, when the NUMA node allocated to the container group is NUMA0 on worker node 1, the identifier of the allocated NUMA node in the Pod information is set to dev1/NUMA0; when the storage resource allocated to the container group is AEP0, the identifier of the allocated storage resource in the Pod information is set to vg:AEP0.
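As an illustration of recording these two attributes, the Go sketch below builds a merge-patch body that could be submitted to the API server; the annotation keys are hypothetical, since the description only specifies the values (for example dev1/NUMA0 and vg:AEP0), not the key names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildNUMAPatch builds a merge-patch body that records the matched NUMA node
// and the allocated storage resource in the Pod's annotations. The annotation
// keys used here are placeholders introduced for this sketch.
func buildNUMAPatch(numaID, storageID string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				"example.com/numa-node":   numaID,    // e.g. "dev1/NUMA0"
				"example.com/aep-storage": storageID, // e.g. "vg:AEP0"
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildNUMAPatch("dev1/NUMA0", "vg:AEP0")
	if err != nil {
		panic(err)
	}
	// The resulting body could be submitted to the API server as a patch request.
	fmt.Println(string(body))
}
```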
In the embodiments of the present disclosure, the above flow enables the resources on the worker node to be allocated to the container group, so that the CPU resources and storage resources allocated to the container group are aligned in the NUMA dimension, thereby achieving better container affinity, effectively improving the effect of resource allocation for containers, and improving the performance of containerized applications.
FIG. 5 is a schematic flowchart of updating the available resources of a worker node and the information of a container group according to some embodiments of the present disclosure. As shown in FIG. 5, the flow of updating the available resources of a worker node and the information of a container group in the embodiments of the present disclosure includes the following steps.
Step S231: the topology resource management component monitors the information of the container group.
In some embodiments, the topology resource management component is deployed on a master node in the cluster.
In some embodiments, the topology resource management component monitors the container group information stored in the database through the interface service component.
Step S232: after detecting that the information of the container group contains the information of the NUMA node matching the container group, the topology resource management component updates the available resource information of the worker node under at least one NUMA node.
In some embodiments, after detecting that the information of the container group contains NUMA node information such as the identifier of the NUMA node, the topology resource management component updates the available resource information of the worker node in the NUMA dimension.
For example, suppose the original available resource information of the worker node in the NUMA dimension is: 32 available CPU cores and 200 GB of available storage in NUMA0, and 32 available CPU cores and 100 GB of available storage in NUMA1. After the container management component allocates 2 CPU cores and 200 GB of storage in NUMA0 to a container group, the NUMA node identifier "NUMA0" is added to the information of that container group. After detecting that the container group information contains the NUMA node identifier "NUMA0", the topology resource management component updates the available resource information of the worker node in the NUMA dimension. The updated available resource information of the worker node in the NUMA dimension is: 30 available CPU cores and 0 GB of available storage in NUMA0, and 32 available CPU cores and 100 GB of available storage in NUMA1.
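The following Go sketch reproduces this bookkeeping with the numbers from the example; the data structure is an assumption made for this sketch.

```go
package main

import "fmt"

// NUMAAvailable tracks the free CPU cores and storage (in GB) of one NUMA node
// on a worker node; the structure is an illustrative assumption.
type NUMAAvailable struct {
	FreeCPUCores  int
	FreeStorageGB int
}

// deductAllocation updates the worker node's per-NUMA available resources after
// the topology resource management component observes that a container group
// has been bound to the given NUMA node.
func deductAllocation(avail map[string]NUMAAvailable, numaID string, cpu, storageGB int) {
	a := avail[numaID]
	a.FreeCPUCores -= cpu
	a.FreeStorageGB -= storageGB
	avail[numaID] = a
}

func main() {
	// Before allocation: NUMA0 has 32 cores / 200 GB, NUMA1 has 32 cores / 100 GB.
	avail := map[string]NUMAAvailable{
		"NUMA0": {FreeCPUCores: 32, FreeStorageGB: 200},
		"NUMA1": {FreeCPUCores: 32, FreeStorageGB: 100},
	}
	// A container group takes 2 cores and 200 GB from NUMA0.
	deductAllocation(avail, "NUMA0", 2, 200)
	fmt.Println(avail["NUMA0"]) // {30 0}, matching the updated values in the text
	fmt.Println(avail["NUMA1"]) // {32 100}
}
```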
Step S233: the topology resource management component sets a resource allocation completion flag in the information of the container group.
In some embodiments, the topology resource management component sets the resource allocation completion flag, through the interface service component, in the container group information stored in the database.
In the embodiments of the present disclosure, the above steps enable the container group information and the available resources of the worker node to be updated in a timely manner, which in turn helps improve the accuracy of container scheduling and resource allocation.
FIG. 6 is a schematic structural diagram of a container management component according to some embodiments of the present disclosure. As shown in FIG. 6, the container management component 600 of the embodiments of the present disclosure includes an obtaining module 610, a determining module 620, and an allocating module 630.
The obtaining module 610 is configured to obtain the information of a container group scheduled onto a worker node.
Here, a container group is a set of closely related containers. For example, in the container orchestration tool Kubernetes, a container group is specifically a Pod. A Pod is the basic unit of Kubernetes scheduling; the containers within a Pod share the network and file system and can be combined to deliver a service in a simple and efficient way through inter-process communication and file sharing.
In some embodiments, the obtaining module 610 queries the database through the interface service component to obtain the information of the container groups on the worker node where it resides. For example, if the worker node where the container management component resides is worker node 1 in the cluster, the obtaining module 610 queries the information of the container groups on worker node 1 from the database through the interface service component. A worker node may be a physical machine or a virtual machine hosted on a physical machine.
The information of a container group allocated to the worker node includes the identifier of the allocated worker node, and the CPU resource and storage resource request information of the container group.
The determining module 620 is configured to determine the NUMA node matching the container group according to the CPU resource and storage resource request information of the container group and the available resource information of the worker node under at least one NUMA node.
Here, the NUMA node matching the container group is a NUMA node that can satisfy both the CPU resource request and the storage resource request of the container group.
In some embodiments, the determining module 620 determines the NUMA node matching the container group as follows.
The determining module 620 selects a first NUMA node from the at least one NUMA node according to the CPU resource request information of the container group and the available CPU resource information of the worker node under the at least one NUMA node. Here, the first NUMA node is a NUMA node that can satisfy the CPU resource request of the container group.
For example, suppose the CPU resource request information of the container group specifies 2 CPU cores, and the available resource information of the worker node where the container group resides is: 32 available CPU cores and 200 GB of available storage in NUMA0, and 32 available CPU cores and 100 GB of available storage in NUMA1. It is then determined that both NUMA0 and NUMA1 can satisfy the CPU resource request of the container group, so NUMA0 and NUMA1 are taken as first NUMA nodes.
The determining module 620 selects a second NUMA node from the first NUMA nodes according to the storage resource request information of the container group and the available storage resource information of the worker node under the first NUMA nodes, and takes the second NUMA node as the NUMA node matching the container group.
For example, suppose the storage resource request information of the container group specifies 200 GB of storage, and the available storage resource information under the first NUMA nodes is: 200 GB of available storage in NUMA0 and 100 GB of available storage in NUMA1. NUMA0 is then determined to be the second NUMA node, i.e., the NUMA node that can satisfy both the CPU resource request and the storage resource request of the container group.
The allocating module 630 is configured to allocate the CPU resources and storage resources under the NUMA node of the worker node that matches the container group to the container group.
Illustratively, assuming that the NUMA node of the worker node that matches the container group is determined to be NUMA0, the CPU resources and storage resources under NUMA0 are allocated to the container group.
In some embodiments, the container management component 600 further includes an updating module configured to, after the allocating module 630 allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group, update the information of the container group according to the information of the NUMA node matching the container group, for example by adding the identifier of the matching NUMA node to the container group information.
In the embodiments of the present disclosure, the above apparatus enables the CPU resources and storage resources allocated to a container group to be aligned in the NUMA dimension, achieves better container affinity, effectively improves the effect of resource allocation for containers, and improves the performance of containerized applications.
FIG. 7 is a schematic structural diagram of a resource allocation system according to other embodiments of the present disclosure. As shown in FIG. 7, the resource allocation system 700 of the embodiments of the present disclosure includes a scheduling management component 710, a container management component 720, and a topology resource management component 730.
The scheduling management component 710 is configured to schedule container groups onto worker nodes.
In some embodiments, the scheduling management component 710 is deployed on a master node in the cluster. The cluster includes master nodes and worker nodes, each of which may be a physical machine or a virtual machine hosted on a physical machine.
In some embodiments, the scheduling management component 710 is configured to schedule container groups onto worker nodes through the flow shown in FIG. 3.
The container management component 720 is configured to allocate resources to the container groups scheduled onto the worker node.
In some embodiments, the container management component is configured to allocate resources to the container groups scheduled onto the worker node where it resides through the flow shown in FIG. 4.
The topology resource management component 730 is configured to update the available resource information of the worker node and the information of the container group.
In some embodiments, the topology resource management component 730 is deployed on a master node in the cluster.
In some embodiments, the topology resource management component 730 is configured to update the available resource information of the worker node and the information of the container group through the flow shown in FIG. 5.
In the embodiments of the present disclosure, the above resource allocation system enables the CPU resources and storage resources allocated to a container group to be aligned in the NUMA dimension, achieves better container affinity, effectively improves the effect of resource allocation for containers, and improves the performance of containerized applications.
FIG. 8 is a schematic structural diagram of a resource allocation system according to still other embodiments of the present disclosure. FIG. 8 takes a resource allocation system built on the container orchestration tool Kubernetes as an example. As shown in FIG. 8, the resource allocation system of the embodiments of the present disclosure includes a Kube scheduling component (Scheduler) 810, a Kube API service component (Server) 820, an ETCD 830, a Kubelet component 840, and a topology resource management component 850.
The Kube scheduling component 810, the Kube API service component 820, the ETCD 830, and the topology resource management component 850 are located on a master node in the cluster, and the Kubelet component 840 is located on a worker node in the cluster, such as worker node 1 shown in FIG. 8.
The Kube API service component 820 is configured to store the information of a Pod into the ETCD 830 after receiving a request to create the Pod. ETCD is a consistent and highly available key-value database.
The Kube scheduling component 810 is configured to obtain the NUMA-dimension resource information of multiple worker nodes in the cluster and the resource request information of the Pod to be scheduled, determine, according to the NUMA-dimension resource information of the multiple worker nodes and the resource request information of the Pod to be scheduled, which worker node the Pod is to be allocated to, and then update the Pod information.
In some embodiments, the Kube scheduling component 810 is further configured to: determine whether a worker node has other Pods whose resource allocation logic has not yet been completed; if so, not allocate the Pod to be scheduled to that worker node; if not, allocate a worker node to the Pod to be scheduled according to the NUMA-dimension resource information of the worker node and the resource information required by the Pod to be scheduled.
The Kubelet component 840 is configured to obtain, through the Kube API service component 820, the Pod information relevant to it, i.e., the Pods allocated to the host where it resides; determine, according to the CPU resources requested by the relevant Pods and the available CPU resources of the host in the NUMA dimension, under which NUMA node the CPU is to be allocated; then, according to the storage resources requested by the relevant Pods and the available storage resources of the host in the NUMA dimension, select, from the NUMA nodes filtered by CPU resources, a NUMA node that can satisfy the storage resource needs of the Pod; and update the NUMA information in the Pod according to the finally selected NUMA node.
The topology resource management component 850 is configured to, after detecting that the Pod information contains NUMA node information, update the available resource information of the worker node under at least one NUMA node, and set a resource allocation completion flag in the Pod information.
In the embodiments of the present disclosure, the above system enables the CPU resources and storage resources allocated to a container group to be aligned in the NUMA dimension with relatively few modifications to the native Kubernetes components, achieves better container affinity, effectively improves the effect of resource allocation for containers, and improves the performance of containerized applications.
FIG. 9 is a schematic structural diagram of an electronic device according to other embodiments of the present disclosure. As shown in FIG. 9, the electronic device 900 includes a memory 910 and a processor 920 coupled to the memory 910. The memory 910 is used to store instructions corresponding to embodiments of the resource allocation method. The processor 920 is configured to execute the resource allocation method of any of the embodiments of the present disclosure based on the instructions stored in the memory 910.
FIG. 10 is a schematic structural diagram of a computer system according to some embodiments of the present disclosure. As shown in FIG. 10, the computer system 1000 may take the form of a general-purpose computing device. The computer system 1000 includes a memory 1010, a processor 1020, and a bus 1030 connecting the different system components.
The memory 1010 may include, for example, a system memory and a non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs. The system memory may include a volatile storage medium, such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions corresponding to embodiments of at least one of the resource allocation methods. Non-volatile storage media include, but are not limited to, disk storage, optical storage, flash memory, and the like.
The processor 1020 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Accordingly, each module such as the obtaining module, the determining module, and the allocating module may be implemented by a central processing unit (CPU) executing instructions in a memory that perform the corresponding steps, or by a dedicated circuit that performs the corresponding steps.
The bus 1030 may use any of a variety of bus structures. For example, bus structures include, but are not limited to, an industry standard architecture (ISA) bus, a micro channel architecture (MCA) bus, and a peripheral component interconnect (PCI) bus.
In the computer system 1000, the interfaces 1040, 1050, and 1060, as well as the memory 1010 and the processor 1020, may be connected via the bus 1030. The input/output interface 1040 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 1050 provides a connection interface for various networked devices. The storage interface 1060 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses, and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable apparatus to produce a machine, such that the instructions, when executed by the processor, produce means for implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams.
These computer-readable program instructions may also be stored in a computer-readable memory; these instructions cause a computer to operate in a particular manner, thereby producing an article of manufacture including instructions that implement the functions specified in one or more blocks of the flowcharts and/or block diagrams.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
Through the resource allocation method, the container management component, and the resource allocation system in the above embodiments, the CPU resources and storage resources used by a container group can be aligned in the NUMA dimension.
The resource allocation method, the container management component, and the resource allocation system according to the present disclosure have thus been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concept of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.

Claims (20)

  1. A resource allocation method, comprising:
    obtaining, by a container management component, information of a container group scheduled onto a worker node, wherein the information of the container group comprises CPU resource request information and storage resource request information;
    determining, by the container management component, from at least one non-uniform memory access (NUMA) node, a NUMA node matching the container group according to the CPU resource request information and the storage resource request information of the container group and available resource information of the worker node under the at least one NUMA node, wherein the NUMA node matching the container group is a NUMA node capable of satisfying both the CPU resource request and the storage resource request of the container group; and
    allocating, by the container management component, CPU resources and storage resources under the NUMA node matching the container group to the container group.
  2. The resource allocation method according to claim 1, wherein the determining, by the container management component, from the at least one NUMA node, the NUMA node matching the container group according to the CPU resource request information and the storage resource request information of the container group and the available resource information of the worker node under the at least one NUMA node comprises:
    selecting, by the container management component, a first NUMA node from the at least one NUMA node according to the CPU resource request information of the container group and available CPU resource information of the worker node under the at least one NUMA node, wherein the first NUMA node is a NUMA node capable of satisfying the CPU resource request of the container group; and
    selecting, by the container management component, a second NUMA node from the first NUMA node according to the storage resource request information of the container group and available storage resource information of the worker node under the first NUMA node, and taking the second NUMA node as the NUMA node matching the container group.
  3. The resource allocation method according to claim 1 or 2, further comprising:
    after the allocating of the CPU resources and storage resources under the NUMA node matching the container group to the container group, updating, by the container management component, the information of the container group according to information of the NUMA node matching the container group.
  4. The resource allocation method according to claim 3, further comprising:
    after it is detected that the information of the container group contains the information of the NUMA node matching the container group, updating, by a topology resource management component, the available resource information of the worker node under the at least one NUMA node.
  5. The resource allocation method according to claim 4, further comprising:
    after it is detected that the information of the container group contains the information of the NUMA node matching the container group, setting, by the topology resource management component, a resource allocation completion flag in the information of the container group.
  6. The resource allocation method according to any one of claims 1 to 5, further comprising:
    before the container management component obtains the information of the container group allocated to the worker node, scheduling, by a scheduling management component, a container group to be scheduled onto the worker node.
  7. The resource allocation method according to claim 6, wherein the scheduling, by the scheduling management component, of the container group to be scheduled onto the worker node comprises:
    obtaining resource request information of the container group to be scheduled;
    determining, from multiple worker nodes in a cluster, a worker node allocated to the container group to be scheduled according to the resource request information of the container group to be scheduled and available resource information of the multiple worker nodes in the NUMA dimension; and
    scheduling the container group to be scheduled onto the worker node allocated to the container group to be scheduled.
  8. The resource allocation method according to claim 7, wherein the scheduling of the container group to be scheduled onto the worker node allocated to the container group to be scheduled comprises:
    updating information of the container group to be scheduled in a database according to information of the worker node allocated to the container group to be scheduled.
  9. The resource allocation method according to claim 7 or 8, wherein the scheduling, by the scheduling management component, of the container group to be scheduled onto the worker node further comprises:
    before determining, from the multiple worker nodes, the worker node allocated to the container group to be scheduled, filtering out, from the multiple worker nodes, worker nodes that have not yet completed resource allocation for container groups already allocated to them.
  10. A container management component, comprising:
    an obtaining module configured to obtain information of a container group scheduled onto a worker node, wherein the information of the container group comprises CPU resource request information and storage resource request information;
    a determining module configured to determine, from at least one non-uniform memory access (NUMA) node, a NUMA node matching the container group according to the CPU resource request information and the storage resource request information of the container group and available resource information of the worker node under the at least one NUMA node, wherein the NUMA node matching the container group is a NUMA node capable of satisfying both the CPU resource request and the storage resource request of the container group; and
    an allocating module configured to allocate CPU resources and storage resources under the NUMA node matching the container group to the container group.
  11. The container management component according to claim 10, wherein the determining module is configured to:
    select a first NUMA node from the at least one NUMA node according to the CPU resource request information of the container group and available CPU resource information of the worker node under the at least one NUMA node, wherein the first NUMA node is a NUMA node capable of satisfying the CPU resource request of the container group; and
    select a second NUMA node from the first NUMA node according to the storage resource request information of the container group and available storage resource information of the worker node under the first NUMA node, and take the second NUMA node as the NUMA node matching the container group.
  12. The container management component according to claim 10 or 11, further comprising:
    an updating module configured to, after the allocating module allocates the CPU resources and storage resources under the NUMA node matching the container group to the container group, update the information of the container group according to information of the NUMA node matching the container group.
  13. A resource allocation system, comprising:
    the container management component according to any one of claims 10 to 12; and
    a topology resource management component configured to, after detecting that the information of the container group contains the information of the NUMA node matching the container group, update the available resource information of the worker node under the at least one NUMA node.
  14. The resource allocation system according to claim 13, wherein the topology resource management component is further configured to:
    after detecting that the information of the container group contains the information of the NUMA node matching the container group, set a resource allocation completion flag in the information of the container group.
  15. The resource allocation system according to claim 13 or 14, further comprising:
    a scheduling management component configured to, before the container management component obtains the information of the container group allocated to the worker node, schedule a container group to be scheduled onto the worker node.
  16. The resource allocation system according to claim 15, wherein the scheduling, by the scheduling management component, of the container group to be scheduled onto the worker node comprises:
    obtaining resource request information of the container group to be scheduled;
    determining, from multiple worker nodes in the cluster, a worker node allocated to the container group to be scheduled according to the resource request information of the container group to be scheduled and available resource information of the multiple worker nodes in the NUMA dimension; and
    scheduling the container group to be scheduled onto the worker node allocated to the container group to be scheduled.
  17. The resource allocation system according to claim 16, wherein the scheduling, by the scheduling management component, of the container group to be scheduled onto the worker node further comprises:
    before determining, from the multiple worker nodes, the worker node allocated to the container group to be scheduled, filtering out, from the multiple worker nodes, worker nodes that have not yet completed resource allocation for container groups already allocated to them.
  18. An electronic device, comprising:
    a memory; and
    a processor coupled to the memory, wherein the processor is configured to execute the resource allocation method according to any one of claims 1 to 9 based on instructions stored in the memory.
  19. A computer-readable storage medium having computer program instructions stored thereon, wherein the instructions, when executed by a processor, implement the resource allocation method according to any one of claims 1 to 9.
  20. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to perform the resource allocation method according to any one of claims 1 to 9.
PCT/CN2023/089017 2022-10-19 2023-04-18 Resource allocation method, container management component and resource allocation system WO2024082584A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211278945.0A CN115525434A (zh) 2022-10-19 2022-10-19 Resource allocation method, container management component and resource allocation system
CN202211278945.0 2022-10-19

Publications (1)

Publication Number Publication Date
WO2024082584A1 true WO2024082584A1 (zh) 2024-04-25

Family

ID=84704241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089017 WO2024082584A1 (zh) 2023-04-18 2022-10-19 Resource allocation method, container management component and resource allocation system

Country Status (2)

Country Link
CN (1) CN115525434A (zh)
WO (1) WO2024082584A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525434A (zh) * 2022-10-19 2022-12-27 京东科技信息技术有限公司 资源分配方法、容器管理组件和资源分配系统
CN116483547A (zh) * 2023-06-21 2023-07-25 之江实验室 资源调度方法、装置、计算机设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210149703A1 (en) * 2019-11-20 2021-05-20 Sap Se Numa-aware resource allocation and placement of database containers
CN114510321A (zh) 2022-01-30 2022-05-17 阿里巴巴(中国)有限公司 Resource scheduling method, related apparatus and medium
CN114610497A (zh) 2022-03-21 2022-06-10 中国电信股份有限公司 Container scheduling method, cluster system, apparatus, electronic device and storage medium
CN114721824A (zh) 2022-04-06 2022-07-08 中国科学院计算技术研究所 Resource allocation method, medium and electronic device
CN115525434A (zh) 2022-10-19 2022-12-27 京东科技信息技术有限公司 Resource allocation method, container management component and resource allocation system

Also Published As

Publication number Publication date
CN115525434A (zh) 2022-12-27
