CN116755829A - Method for generating host PCIe topological structure and method for distributing container resources

Info

Publication number
CN116755829A
CN116755829A (application CN202310531388.7A)
Authority
CN
China
Prior art keywords
container
network card
virtual
numa node
cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310531388.7A
Other languages
Chinese (zh)
Inventor
郑孟蕾
李唯杰
冯飞
付斌章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202310531388.7A
Publication of CN116755829A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application relates to a method for generating a host PCIe topology and a method for allocating container resources. The generating method comprises: respectively acquiring a first topology and a second topology; acquiring the association relationship between a PCIe switching unit and a NUMA node; and associating the first topology with the second topology based on that relationship to obtain the host PCIe topology. The application addresses the technical problem that container orchestration systems such as K8s have difficulty obtaining the topological relationships of a host's computing, network, and other components, so that containers incur unnecessary data transfers, increased latency, and wasted resources when executing computing tasks. It largely solves the affinity allocation problem of resources within a container, avoids unnecessary data transfers, and improves container computing performance.

Description

Method for generating host PCIe topological structure and method for distributing container resources
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for generating a host PCIe topology structure and a method for allocating container resources.
Background
In recent years, artificial intelligence theory and technology have matured and their fields of application have kept expanding, gradually shifting from one-way product supply to deep two-way co-construction with various industries, and cloud service platforms have become an important channel through which enterprises acquire and apply AI capability. Cloud-native architecture offers elastic scaling, agile delivery, efficiency, ease of use, and broad compatibility; it helps users reduce the cost of moving to the cloud and improves cloud service availability and quality, and has gained general acceptance in the industry. Container orchestration technology, represented by K8s, enables containers such as Docker to call basic resources in a lighter, more agile, and more efficient way than virtual machines, and has become an important foundational technology of cloud native computing.
Virtualization can map a physical network card, graphics processing unit (Graphics Processing Unit, GPU), and the like into one or more virtual functions, and supports finer-grained allocation inside containers through virtualized instances, thereby improving resource utilization. However, in current resource pools, because container orchestration systems such as K8s have difficulty obtaining the topological relationships of the computing, network, and storage components in a host, such as the central processing unit (Central Processing Unit, CPU), memory, network card, and GPU, the resources allocated to a container may suffer from affinity problems, so that unnecessary data transfers occur when the container executes computing tasks, causing latency and wasted resources. Because virtualization produces finer-grained resources, this problem is more pronounced than with the allocation of traditional physical resources.
For the above-described problems, no effective solution has been proposed.
Disclosure of Invention
The application provides a method for generating a host PCIe topology and a method for allocating container resources, to at least solve the technical problem in the related art that container orchestration systems such as K8s have difficulty obtaining the topological relationships of host computing, network, and other components, so that the allocated resources may have affinity problems, unnecessary data transfers occur when the container executes computing tasks, and latency increases and resources are wasted.
According to one aspect of the embodiments of the present application, there is provided a method for generating a host PCIe topology, including: respectively acquiring a first topology and a second topology, the first topology representing the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor that the host associates under a PCIe switching unit, and the second topology representing the topology formed by the CPU interface unit, CPU core, and memory that the host associates under a non-uniform memory access (NUMA) node; acquiring the association relationship between the PCIe switching unit and the NUMA node; and associating the first topology with the second topology based on the association relationship to obtain the host PCIe topology.
According to another aspect of the embodiments of the present application, there is provided a method for allocating container resources, including: acquiring the resources to be allocated to a container; and, when the allocatable resources of a host meet the resources to be allocated, allocating resources to the container based on the PCIe topology of the host, where the PCIe topology of the host is obtained by the method of any one of the above.
According to another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor, and a memory storing a program, wherein the program comprises instructions that when executed by the processor cause the processor to perform the method of any of the above.
According to another aspect of an embodiment of the present application, there is provided a non-transitory machine-readable medium storing computer instructions for causing a computer to perform the method of any one of the above.
In the embodiments of the present application, a first topology and a second topology are respectively acquired, the first topology representing the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor that the host associates under a PCIe switching unit, and the second topology representing the topology formed by the CPU interface unit, CPU core, and memory that the host associates under a NUMA node; the association relationship between the PCIe switching unit and the NUMA node is acquired; and the first topology is associated with the second topology based on that relationship to obtain the host PCIe topology. That is, by constructing a multi-layer host resource topology spanning the CPU interface units, NUMA nodes, and PCIe switching units, the embodiments achieve layer-by-layer nearby matching of the CPU, memory, network card, GPU, and their virtualized instances. This solves the technical problem that allocated resources may suffer affinity problems, causing unnecessary data transfers, increased latency, and wasted resources when the container executes computing tasks; it largely resolves the affinity allocation of resources within the container, avoids unnecessary data transfers, and improves the computing performance of the container.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a state of randomly allocated container resources within a host according to an alternative embodiment of the present application;
FIG. 2 is a flowchart of a method for generating a host PCIe topology according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a host PCIe topology and a nearby allocation of resources provided by an alternative embodiment of the present application;
FIG. 4 is a schematic diagram of a host PCIe topology generation process provided by an alternative embodiment of the present application;
FIG. 5 is a flowchart of a method for allocating container resources according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a container resource allocation process provided by an alternative embodiment of the present application;
FIG. 7 is a schematic diagram of a host PCIe topology and a nearby allocation of resources under a single NUMA node according to an alternative embodiment of the application;
Fig. 8 shows a block diagram of a hardware structure of an electronic device.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments are illustrated in the drawings, it should be understood that the application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding. It should be understood that the drawings and embodiments are presented for purposes of illustration only and are not intended to limit the scope of protection.
First, some of the terms or terminology appearing in the description of the embodiments of the application are explained as follows:
k8s: kubernetes (also known as k8s or "kube") is an open-source container orchestration platform that can automatically perform many of the manual operations involved in deploying, managing, and expanding containerized applications.
Kubelet: the service running on the K8s node may read the container manifest (container manifest) to ensure that the specified container is started and running.
A container: a container is a form of operating system virtualization that can use one container to run all content from a small micro-service or software process to a large application; unlike server or computer virtualization methods, containers do not contain operating system images, so they are more lightweight and portable, and multiple containers can be deployed as one or more container clusters in a large application deployment.
And (3) virtualization: the computer virtualization technology is a resource management technology, and is characterized in that various entity resources (CPU, GPU, memory, network card and the like) of a computer are abstractly converted into resource instances which can be segmented and recombined, so that the physical structure obstacle is broken, and a user can apply the computer resources in a better mode than the original configuration.
SR-IOV: single Root I/O Virtualization (Single Root I/O Virtualization) is an extension of the PCIe specification, allowing devices (e.g., network adapters) to split access to their resources among various PCIe hardware functions.
VF: the virtual function (Virtual Functions), which is a lightweight PCIe function on a network adapter supporting single root I/O virtualization (SR-IOV), the VF is associated with a PCIe Physical Function (PF) on the network adapter, representing a virtualized instance of the network adapter.
PCIe: PCI-Express (Peripheral Component Interconnect Express) is a high-speed serial computer expansion bus standard, and belongs to high-speed serial point-to-point dual-channel high-bandwidth transmission.
Topology: topology is a discipline for studying the properties of geometric figures or spaces that remain unchanged after continuously changing shapes, and it only considers the positional relationships between objects, regardless of their shape and size, and computer topology refers to the abstract discussion of the method, form and geometry of interconnecting various components in a computer system by describing them with two most basic graphical elements, points and lines, in geometry.
PCIe Switch: is a PCIe to PCIe bridge that provides expansion or aggregation capabilities and allows more devices to connect to one PCle port.
NUMA/Numa node: non-uniform memory access (Non Uniform Memory Access) is a computer memory design for multiple processors, where the processor accesses its local memory faster than non-local memory under a NUMA node.
CPU Socket: a CPU socket is an interface for mounting a CPU, and is generally composed of metal pins and sockets for connecting the CPU and a motherboard.
RDMA: remote memory direct access (Remote Direct Memory Access) is a technique that implements zero copy and kernel bypass techniques through hardware to provide high performance network data access.
Optionally, in the default resource allocation scheme of K8s, the Device Plugin registers with the kubelet at startup and reports each resource on the node, and the kubelet then reports the quantity of each resource to the K8s scheduler. When a container is created, K8s selects a node that meets the resource requirement and sends the quantities of the resources to be allocated to the corresponding kubelet; the kubelet then simply selects the corresponding number of resources, in order, from its recorded device list. Because of the randomness of resource reporting and container creation order, the actual effect of the default scheme is equivalent to randomly allocating each resource.
Under this random allocation policy, the affinity state of resources such as the CPU, memory (Memory), network card (NIC, VF), and graphics card (GPU, vGPU) within a container cannot be determined, and many outcomes are possible. For example, some possible allocation scenarios in a virtualized environment are shown in FIG. 1:
(1) The VF and vGPU of the container belong to the same PCIe Switch, and the CPU and memory belong to the same NUMA node, as shown by the gray blocks in FIG. 1. This conforms to the affinity characteristics: unnecessary data transfers are minimized, the computational efficiency of tasks within the container improves, and costs are reduced.
(2) The VF and vGPU of the container belong to different PCIe Switches under the same NUMA node, as shown by the left-diagonal blocks in FIG. 1. Unnecessary data transfers across the PCIe Switch occur between the VF and vGPU, reducing performance.
(3) The VF and vGPU of the container belong to different NUMA nodes under the same CPU Socket, as shown by the right-diagonal blocks in FIG. 1. Unnecessary data transfers across NUMA nodes occur between the VF and vGPU, degrading performance further.
(4) The VF and vGPU of the container belong to different CPU Sockets, as shown by the cross-hatched blocks in FIG. 1, or the CPU, memory, and VF of the container belong to different CPU Sockets, as shown by the dotted blocks in FIG. 1. Unnecessary data transfers across CPU Sockets occur between the VF and vGPU, or between them and the CPU and memory, affecting performance even more.
In virtualized scenarios, because resources are finer grained, the affinity problem under random allocation is more prominent than in traditional physical resource scenarios. On one hand, the increase in resource instances reduces the probability that a random strategy yields an allocation conforming to the affinity characteristics. On the other hand, a single virtualized instance holds relatively few resources, so the bandwidth contention and performance degradation caused by the extra transfers are more severe.
Current K8s implementations do provide a mechanism for affinity-aware resource allocation by NUMA node. However, because affinity support is limited to the NUMA node layer, the strategy is completely ineffective on single-NUMA-node machines; moreover, it cannot achieve affinity at the PCIe Switch layer, nor at the CPU Socket layer when resources cannot all fit within one NUMA node. Since virtualized scenarios have many VFs and vGPUs, each holding few resources, finer-grained nearby allocation is needed, and the strategy is therefore severely limited.
To overcome the above drawbacks, an embodiment of the present application provides a method for generating a host PCIe topology. FIG. 2 is a flowchart of the method; as shown in FIG. 2, it includes the following steps:
Step S202, respectively acquiring a first topology and a second topology; the first topology represents the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor that the host associates under a PCIe switching unit; the second topology represents the topology formed by the CPU interface unit, CPU core, and memory that the host associates under a NUMA node;
Here, the CPU core is also called the CPU Core; a VF is a virtualized instance of a network card; a vGPU is a virtualized instance of a graphics processor. In implementation, the PCIe switching unit connects one PCIe bus to a plurality of PCIe devices to enable data exchange and communication among them, and includes, but is not limited to, a PCIe Switch; the CPU interface unit connects the CPU to the motherboard, and includes, but is not limited to, a CPU Socket.
Step S204, acquiring the association relationship between the PCIe switching unit and the NUMA node;
Step S206, associating the first topology with the second topology based on the association relationship to obtain the host PCIe topology.
The host PCIe topology is also called the host resource multi-layer topology and describes the topological relationships of the host's computing, network, storage, and other resources.
In the embodiments of the present application, constructing a multi-layer host resource topology spanning the CPU interface units, NUMA nodes, and PCIe switching units achieves layer-by-layer nearby matching of the CPU, memory, network card, GPU, and their virtualized instances. This solves the technical problem that allocated resources may suffer affinity problems, causing unnecessary data transfers, increased latency, and wasted resources when the container executes computing tasks; it largely resolves the affinity allocation of resources within the container, avoids unnecessary data transfers, and improves the computing performance of the container.
As an alternative embodiment, in a specific implementation, acquiring the first topology includes the following steps:
step 21, acquiring the PCIe path of the network card, the NUMA node to which the network card belongs, the network card virtual function, and the mapping relationship between the network card and the network card virtual function;
step 22, acquiring the PCIe path of the graphics processor, the NUMA node to which the graphics processor belongs, the virtual graphics processor, and the mapping relationship between the graphics processor and the virtual graphics processor;
step 23, identifying the network card and the graphics processor under the same PCIe switching unit based on the PCIe path of the network card and the PCIe path of the graphics processor;
step 24, acquiring the first topology based on the NUMA node to which the network card belongs, the network card virtual function, the mapping relationship between the network card and the network card virtual function, the NUMA node to which the graphics processor belongs, the virtual graphics processor, the mapping relationship between the graphics processor and the virtual graphics processor, and the association relationship between the network card and the graphics processor under the same PCIe switching unit.
A PCIe path refers to a data transmission path on the PCI-Express bus. The PCIe bus is a high-speed serial bus that transfers data over multiple lanes; each lane has a pair of differential signal lines carrying data and clock signals. A PCIe path typically consists of one or more transmission lanes, each with an independent data path, and data can be striped across lanes to increase overall bandwidth. The PCIe path also identifies the devices and interfaces connected along the bus, through which data passes from one segment to the next. For example, a PCIe path may be "pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:04.0", where "pci0000:17" indicates the root with PCI bus number 17; "0000:17:00.0" indicates the device with device number 0, function number 0 on PCI bus 17; "0000:18:00.0" indicates that the next-level device sits on PCI bus 18 with device number 0 and function number 0; and "0000:19:04.0" indicates that the lowest-level device sits on PCI bus 19 with device number 4 and function number 0.
PCIe paths use Bus/Device/Function (BDF) numbers to describe PCI and PCIe devices, comprising a hexadecimal bus number, device number, and function number.
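As a concrete illustration of how such a path and its NUMA attachment can be read in practice, the following Python sketch walks the Linux sysfs tree (the sysfs layout is standard Linux; the function names are illustrative, not part of the scheme):

```python
import os

def pcie_path_of(netdev: str) -> list:
    """Resolve a network interface to its sysfs PCIe path and return
    the chain of BDF (bus:device.function) segments, upstream first."""
    real = os.path.realpath(f"/sys/class/net/{netdev}/device")
    # e.g. .../pci0000:17/0000:17:00.0/0000:18:00.0/0000:19:04.0
    return [seg for seg in real.split("/") if seg.count(":") == 2]

def numa_node_of(bdf: str) -> int:
    """Read the NUMA node a PCIe function is attached to (-1 if unknown)."""
    with open(f"/sys/bus/pci/devices/{bdf}/numa_node") as f:
        return int(f.read().strip())

def same_pcie_switch(path_a: list, path_b: list) -> bool:
    """Two endpoints hang off the same PCIe Switch when their paths
    share the same upstream (next-to-last) bridge segment."""
    return len(path_a) > 1 and len(path_b) > 1 and path_a[-2] == path_b[-2]
```

Comparing the upstream segments of two such paths is one way to perform the same-switch identification of step 23.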
Further, step 24 includes the following implementation process:
step 2401, acquiring a first sub-topology based on the association relationship between the network card and the graphics processor under the same PCIe switching unit;
step 2402, associating the network card virtual function to the first sub-topology based on the mapping relationship between the network card and the network card virtual function, and associating the NUMA node to which the network card belongs to the first sub-topology, to obtain a first target topology;
step 2403, associating the virtual graphics processor to the first sub-topology based on the mapping relationship between the graphics processor and the virtual graphics processor, and associating the NUMA node to which the graphics processor belongs to the first sub-topology, to obtain a second target topology;
step 2404, associating the first target topology with the second target topology to obtain the first topology.
In the embodiments of the present application, the connection mode and topology of the computer hardware (the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor associated under a PCIe switching unit) are obtained accurately, enabling better performance optimization and resource management for containers.
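For instance, the mapping between a network card and its virtual functions used in step 2402 can be recovered on Linux from the SR-IOV virtfn* symlinks that sysfs exposes under the physical function; a minimal Python sketch (the helper name is our own):

```python
import glob, os

def vf_map_of(pf_bdf: str) -> dict:
    """Map each SR-IOV VF (by BDF) to its owning physical function,
    using the virtfn* symlinks under the PF's sysfs directory."""
    mapping = {}
    for link in glob.glob(f"/sys/bus/pci/devices/{pf_bdf}/virtfn*"):
        vf_bdf = os.path.basename(os.path.realpath(link))
        mapping[vf_bdf] = pf_bdf  # VF BDF -> owning PF BDF
    return mapping
```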
As an alternative embodiment, in implementation, acquiring the second topology includes the following steps:
step 31, acquiring the CPU interface unit, the NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA node to which the CPU core belongs, the memory, and the NUMA node to which the memory belongs;
step 32, acquiring the second topology based on the association relationships among the CPU interface unit, the NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA node to which the CPU core belongs, the memory, and the NUMA node to which the memory belongs.
Further, step 32 includes the following specific implementation process:
step 3201, acquiring a second sub-topology based on the association relationship between the CPU interface unit and the NUMA nodes contained in it;
step 3202, associating the CPU core and the memory to the second sub-topology based on the NUMA node to which the CPU core belongs and the NUMA node to which the memory belongs, respectively, to obtain the second topology.
In the embodiments of the present application, the connection mode and topology of the computer hardware (the topology formed by the CPU interface unit, CPU core, and memory associated under the NUMA nodes) are obtained accurately, enabling better performance optimization and resource management for containers.
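On Linux, the inputs of steps 31 and 32 are likewise available from sysfs; below is a minimal Python sketch, assuming the standard sysfs layout and skipping CPU-less (memory-only) nodes, that builds the CPU Socket to NUMA node to CPU core layering:

```python
import glob, os

def second_topology() -> dict:
    """Build the socket -> NUMA node -> CPU core layering from sysfs."""
    topo = {}
    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = int(os.path.basename(node_dir)[4:])
        cpus = sorted(int(os.path.basename(c)[3:])
                      for c in glob.glob(f"{node_dir}/cpu[0-9]*"))
        if not cpus:  # skip memory-only nodes
            continue
        # A core's CPU Socket is its physical package id.
        with open(f"/sys/devices/system/cpu/cpu{cpus[0]}"
                  "/topology/physical_package_id") as f:
            socket = int(f.read().strip())
        topo.setdefault(socket, {})[node] = {"cpu_cores": cpus}
    return topo
```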
In an alternative embodiment of the present application, the process of generating the PCIe topology of the resources in the host is shown in FIG. 4, and the main steps include:
step 301: read the network card information through the host PCIe structure, or read the PCIe path from the network card's file, and obtain the NUMA node to which the network card belongs; on that basis, read the mapping relationship between the network card and its VFs and establish the association between them;
step 302: read the GPU information through the host PCIe structure, or read the PCIe path from the GPU's file, and obtain the NUMA node to which the GPU belongs; on that basis, read the mapping relationship between the GPU and its vGPUs and establish the association between them;
step 303: on the basis of the above steps, identify the network cards and GPUs under the same PCIe Switch through their PCIe paths, and construct the sub-topology of each PCIe Switch with its associated network cards, GPUs, and their virtualized instances;
step 304: read the CPU and memory information of the host, including the CPU Core, CPU Socket, memory, and NUMA node data, construct the topology of the CPU Sockets and NUMA nodes in the host as the top layer of the host PCIe topology, and associate the CPU cores and memory to it;
step 305: by matching the NUMA node information, associate the PCIe Switch sub-topologies generated in step 303 with the CPU Socket and NUMA node top-layer topology generated in step 304 to produce the complete topology.
It should be noted that the host PCIe structure includes, but is not limited to, a CPU Socket layer, a NUMA node layer, and a PCIe Switch layer; each layer contains one or more nodes, each node represents a component at that layer, and the connections between nodes represent the association relationships between the corresponding components.
A schematic diagram of the host PCIe topology obtained by this scheme is shown in FIG. 3; in practice, the topology can also be represented in structured forms such as JSON.
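For example, the generated topology could be serialized roughly as follows (a Python sketch; the key names and example values are illustrative, not prescribed by the scheme):

```python
import json

# CPU Socket -> NUMA node -> PCIe Switch, with CPU cores and memory at the
# NUMA layer and NICs/VFs, GPUs/vGPUs at the PCIe Switch layer.
topology = {
    "socket0": {
        "numa0": {
            "cpu_cores": [0, 1, 2, 3],
            "memory_mb": 65536,
            "pcie_switch0": {
                "nics": {"0000:18:00.0": {"vfs": ["0000:19:04.0"]}},
                "gpus": {"0000:1a:00.0": {"vgpus": ["vgpu0", "vgpu1"]}},
            },
        },
    },
}
print(json.dumps(topology, indent=2))
```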
Aimed at the lack of referenceable information in K8s container resource allocation, this scheme first generates a multi-layer host PCIe topology, resolving the ambiguity about what to base allocation optimization on. On that basis, for the affinity problems that the default random scheduling strategy of K8s may cause, the scheme performs layer-by-layer nearby matching of network, computing, and other resources over the generated PCIe topology, avoiding to the greatest extent the unnecessary transfers caused by container resource affinity problems, thereby improving efficiency and reducing communication cost. Compared with the NUMA-node-based scheduling strategy of newer K8s versions, this scheme not only adapts to hosts with a single NUMA node, but also achieves finer-grained nearby allocation at the PCIe Switch and CPU Socket layers in multi-NUMA-node environments, yielding a better affinity effect.
In virtualized scenarios with more and finer-grained resource instances, this scheme greatly increases the probability of affinity-conformant allocation compared with the K8s default strategy; by avoiding extra transfers it concentrates the limited resources on actual tasks, so the improvement in container performance is even more significant.
According to another aspect of the embodiments of the present application, there is provided a method for allocating container resources. FIG. 5 is a flowchart of the method; as shown in FIG. 5, the method includes the following steps:
Step S502, acquiring the resources to be allocated to the container;
Such resources to be allocated include, but are not limited to, the CPU, memory, network card virtual functions, and virtual graphics processors required by the container.
Step S504, when the allocatable resources of the host meet the resources to be allocated, allocating resources to the container based on the PCIe topology of the host, where the PCIe topology of the host is obtained by the method of any one of the above.
The allocatable resources include, but are not limited to, the remaining CPU, memory, network card virtual functions, and virtual graphics processors in each NUMA node.
In the embodiments of the present application, constructing a multi-layer host resource topology spanning the CPU interface units, NUMA nodes, and PCIe switching units achieves layer-by-layer nearby matching of the CPU, memory, network card, GPU, and their virtualized instances. This solves the technical problem that allocated resources may suffer affinity problems, causing unnecessary data transfers, increased latency, and wasted resources when the container executes computing tasks; it largely resolves the affinity allocation of resources within the container, avoids unnecessary data transfers, and improves the computing performance of the container.
As an alternative embodiment, in implementation, allocating resources to the container based on the PCIe topology of the host includes the following steps:
step 41, determining a NUMA node whose remaining resources are greater than the preset remaining resources as the target NUMA node;
Optionally, the PCIe topology of the host includes a plurality of NUMA nodes; the NUMA node with the most remaining resources may be screened out from them and used as the target NUMA node.
step 42, allocating the allocatable resources to the container based on the target NUMA node.
In the embodiments of the application, resources are allocated to the container using a NUMA node whose remaining resources are greater than the preset remaining resources.
As an alternative embodiment, when the allocatable resource is the CPU, step 42 includes the following implementation process:
step 4201, judging whether the target NUMA node has idle CPU cores and whether their number meets the number of CPU cores required by the container;
step 4202, if the target NUMA node has idle CPU cores and their number meets the number required by the container, allocating those idle CPU cores to the container;
step 4203, if the target NUMA node has no idle CPU cores, or their number does not meet the number required by the container, allocating to the container CPU cores under the same CPU interface unit as the target NUMA node.
Further, allocating CPU cores under the same CPU interface unit as the target NUMA node to the container includes the following steps:
step 4204, judging whether the number of idle CPU cores under the same CPU interface unit meets the number required by the container;
step 4205, if the number of CPU cores under the same CPU interface unit meets the number required by the container, allocating CPU cores under the same CPU interface unit to the container;
step 4206, if the number of CPU cores under the same CPU interface unit does not meet the number required by the container, allocating to the container CPU cores under a CPU interface unit different from that of the target NUMA node.
In the embodiments of the application, reasonable and effective allocation of CPU resources can be achieved.
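The layered fallback of steps 4201 to 4206 can be sketched as follows (Python; the free-core bookkeeping structure is an assumption of the sketch, and the memory allocation of steps 4207 to 4212 below follows the same pattern):

```python
from typing import Dict, List

# Hypothetical bookkeeping of idle cores: socket -> NUMA node -> core ids.
FreeCores = Dict[int, Dict[int, List[int]]]

def pick_cpu_cores(free: FreeCores, socket: int, numa: int,
                   need: int) -> List[int]:
    """Layered search: the target NUMA node first, then the rest of its
    CPU Socket, then the remaining sockets (steps 4201-4206)."""
    tiers = [
        list(free[socket].get(numa, [])),                          # same NUMA
        [c for n, cs in free[socket].items() if n != numa for c in cs],
        [c for s, ns in free.items() if s != socket
         for cs in ns.values() for c in cs],                       # cross socket
    ]
    picked: List[int] = []
    for cores in tiers:
        picked += cores[: need - len(picked)]
        if len(picked) == need:
            return picked
    raise RuntimeError("not enough idle CPU cores on this host")

# pick_cpu_cores({0: {0: [0, 1], 1: [2]}, 1: {2: [4, 5]}}, 0, 0, 3)
# -> [0, 1, 2]: two cores from the target node, one topped up from NUMA 1.
```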
As an alternative embodiment, when the allocatable resource is memory, step 42 further includes the following implementation process:
step 4207, judging whether the remaining memory of the target NUMA node meets the memory required by the container;
step 4208, if the remaining memory of the target NUMA node meets the memory required by the container, allocating the remaining memory of the target NUMA node to the container;
step 4209, if the remaining memory of the target NUMA node does not meet the memory required by the container, allocating to the container the remaining memory of other NUMA nodes under the same CPU interface unit.
Further, allocating the remaining memory of other NUMA nodes under the same CPU interface unit to the container includes the following steps:
step 4210, judging whether the remaining memory of the other NUMA nodes under the same CPU interface unit meets the memory required by the container;
step 4211, if the remaining memory of the other NUMA nodes meets the memory required by the container, allocating that remaining memory to the container;
step 4212, if the remaining memory of the other NUMA nodes does not meet the memory required by the container, allocating to the container memory under a CPU interface unit different from that of the target NUMA node.
In the embodiments of the application, reasonable and effective allocation of memory resources can be achieved.
As an alternative embodiment, when the allocatable resources are network card virtual functions and virtual graphics processors, before allocating the allocatable resources to the container based on the target NUMA node, the method further includes:
step 52, judging whether the network card virtual functions and virtual graphics processors under the same PCIe switching unit meet the network card virtual functions and virtual graphics processors required by the container;
step 54, if the network card virtual functions and virtual graphics processors under the same PCIe switching unit meet the container's requirements, allocating them to the container.
It should be noted that if the network card virtual functions and virtual graphics processors under the same PCIe switching unit do not meet the container's requirements, the network card virtual functions and virtual graphics processors of the target NUMA node are allocated to the container, i.e., the content of step 42 is executed.
As an alternative embodiment, when the allocatable resources are network card virtual functions and virtual graphics processors, step 42 further includes the following steps:
step 4213, judging whether the network card virtual functions and virtual graphics processors of the target NUMA node meet those required by the container;
step 4214, if the network card virtual functions and virtual graphics processors of the target NUMA node meet the container's requirements, allocating them to the container;
step 4215, if they do not, allocating to the container the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node.
Further, allocating the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node to the container includes the following specific implementation process:
step 4216, judging whether the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node meet those required by the container;
step 4217, if they do, allocating the network card virtual functions and virtual graphics processors under the same CPU interface unit to the container;
step 4218, if they do not, allocating to the container network card virtual functions and virtual graphics processors under a different CPU interface unit.
The network card virtual functions and virtual graphics processors required by the container can be specified in various respects such as quantity, size, and type.
In the embodiments of the application, reasonable and effective allocation of VF and vGPU resources can be achieved.
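A sketch of this PCIe-Switch-first matching (Python; the per-switch free-resource view is an assumption of the sketch), where returning None triggers the wider searches of steps 4213 to 4218:

```python
from typing import Dict, List, Optional, Tuple

# switch id -> free device ids, e.g. {"sw0": {"vf": [...], "vgpu": [...]}}.
Switches = Dict[str, Dict[str, List[str]]]

def pick_vf_vgpu(switches: Switches, need_vf: int,
                 need_vgpu: int) -> Optional[Tuple[List[str], List[str]]]:
    """Prefer one PCIe Switch that satisfies both demands, so VF-vGPU
    traffic never has to cross a switch; None means fall back wider."""
    for free in switches.values():
        if len(free["vf"]) >= need_vf and len(free["vgpu"]) >= need_vgpu:
            return free["vf"][:need_vf], free["vgpu"][:need_vgpu]
    return None
```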
As an alternative embodiment, step 41 further includes the following implementation process:
step 4101, acquiring the remaining resources of a NUMA node and the weight coefficient corresponding to each remaining resource;
step 4102, calculating a remaining-resource evaluation value of the NUMA node based on the remaining resources and their corresponding weight coefficients;
step 4103, acquiring the preset remaining-resource evaluation value corresponding to the preset remaining resources;
step 4104, when the remaining-resource evaluation value is greater than the preset remaining-resource evaluation value, determining that the remaining resources of the NUMA node are greater than the preset remaining resources, and taking that NUMA node as the target NUMA node.
That is, the remaining resources and the preset remaining resources can both be evaluated by remaining-resource evaluation values.
The remaining resources, i.e., the allocatable resources of a NUMA node, include, but are not limited to, the node's remaining CPU, memory, network card virtual functions, and virtual graphics processors. In a specific implementation, a weight coefficient can be set for each remaining resource; for example, with a CPU weight of 0.3, a memory weight of 0.3, a network card virtual function weight of 0.2, and a virtual graphics processor weight of 0.2, the remaining-resource evaluation value of a NUMA node is 0.3 × CPU count + 0.3 × memory size + 0.2 × network card virtual function count + 0.2 × virtual graphics processor count. The weight coefficients can be set flexibly according to the requirements of the application scenario.
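As a worked instance of this formula with the example weights (the memory unit here is an assumption; any consistent unit works):

```python
def residual_score(cpus: int, memory_gb: float, vfs: int, vgpus: int,
                   weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Weighted remaining-resource evaluation value of one NUMA node."""
    w_cpu, w_mem, w_vf, w_vgpu = weights
    return w_cpu * cpus + w_mem * memory_gb + w_vf * vfs + w_vgpu * vgpus

# A node with 8 idle cores, 32 GB free, 4 VFs and 2 vGPUs scores
# 0.3*8 + 0.3*32 + 0.2*4 + 0.2*2 = 13.2.
```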
In the embodiments of the application, the target NUMA node is selected more accurately by calculating and comparing remaining-resource evaluation values.
As an optional embodiment, if the remaining resources of a plurality of NUMA nodes are greater than the preset remaining resources, the NUMA node with the largest remaining-resource evaluation value is taken as the target NUMA node.
It should be noted that the remaining-resource evaluation value is an index computed comprehensively from the quantity and quality of a NUMA node's remaining CPU, memory, storage, and other resources; the higher the value, the more ample the node's remaining resources and the better it can satisfy task requirements. Selecting the NUMA node with the largest evaluation value as the target node therefore utilizes system resources to the greatest extent and improves task execution efficiency and performance.
In an alternative embodiment of the present application, nearby allocation of container resources can be performed based on the generated host PCIe topology. The process is shown in FIG. 6, and the main steps include:
step 601: after the kubelet receives the quantities of resources to be allocated sent by K8s, judge whether the host's existing allocatable resources meet the requirement; if any resource is insufficient, report an error directly and return;
It should be noted that under the K8s node scheduling mechanism, the kubelet component periodically reports the node's resource status to the application programming interface server (API Server); by the time the node receives the resource allocation request, K8s has already determined in advance that the node's total resources can meet the requirement.
step 602: select the NUMA node with the most remaining resources for pre-allocation;
step 603: the distribution CPU: firstly selecting spare CPU cores in the same NUMA node, namely, insufficient quantity, and sequentially selecting CPU cores of the same CPU Socket and the cross CPU Socket;
step 604: memory allocation: if the NUMA node has enough memory, selecting, otherwise, sequentially trying to select the memory with and across the CPU Socket;
step 605: VF and vGPU are allocated: traversing each PCIe Switch in the NUMA node, if the VF and vGPU are enough, selecting, otherwise, sequentially trying to select the same NUMA node, the same CPU Socket and the cross-CPU Socket;
step 606: kubelet obtains the selected Device specific information from the Device plug in and allocates container resources.
In the resource allocation process, the above steps 603, 604 and 605 may be performed simultaneously or separately.
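Putting steps 601 to 606 together (a sketch reusing the helpers sketched earlier; the host bookkeeping object and its methods are hypothetical glue, not an interface stated by the scheme):

```python
def allocate_container(req, host):
    """End-to-end nearby allocation over the generated PCIe topology."""
    # Step 601: fail fast if any resource kind is short on this host.
    if not host.satisfies(req):
        raise RuntimeError("insufficient allocatable resources")
    # Step 602: pre-select the NUMA node with the highest residual score.
    numa = max(host.numa_nodes, key=lambda n: residual_score(*n.free))
    # Steps 603-605 are independent and may run concurrently.
    cpus = pick_cpu_cores(host.free_cores, numa.socket, numa.id, req.cpus)
    mem = host.pick_memory(numa, req.memory)        # same tiered fallback
    devs = (pick_vf_vgpu(numa.switches, req.vfs, req.vgpus)
            or host.pick_devices_wider(numa, req.vfs, req.vgpus))
    # Step 606: kubelet fetches device details from the Device Plugin
    # and binds the chosen resources to the container.
    return cpus, mem, devs
```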
Alternatively, for a single-NUMA-node host, i.e., one where all CPU Sockets (one or more) are logically treated as the same NUMA node, the NUMA node layer in this scheme can be skipped and a PCIe topology consisting of a CPU Socket layer and a PCIe Switch layer constructed, as shown in FIG. 7. During the allocation of CPU, memory, network card, and GPU resources, the search across different NUMA nodes can likewise be skipped on the basis of the original technical scheme: for example, when the same PCIe Switch has no allocatable resources, matching is attempted directly within the same CPU Socket, completing nearby allocation of container resources at the PCIe Switch and CPU Socket layers.
In RDMA virtualization scenarios, this method addresses the need for the K8s container orchestration system to allocate host resources nearby when constructing containers: by reading the PCIe information of components such as the CPU, network card, and GPU and their mapping relationships to virtualized instances, it generates a PCIe topology comprising the CPU Socket, NUMA node, and PCIe Switch hierarchy; on that basis it realizes nearby scheduling of the CPU, memory, VFs, and vGPUs, solving the affinity problem of network, computing, and other resources within the container. Compared with existing random scheduling, the scheme improves computational efficiency and reduces communication cost; compared with affinity scheduling based only on NUMA nodes, it achieves nearby allocation at the CPU Socket and PCIe Switch layers, adapts to single-NUMA-node hosts, and shows even more significant performance advantages in virtualized environments.
An embodiment of the present application provides an apparatus for generating a host PCIe topology, including:
a first acquisition module, configured to respectively acquire a first topology and a second topology, the first topology representing the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor that the host associates under a PCIe switching unit, and the second topology representing the topology formed by the CPU interface unit, CPU core, and memory that the host associates under a non-uniform memory access (NUMA) node;
a second acquisition module, configured to acquire the association relationship between the PCIe switching unit and the NUMA node;
an association module, configured to associate the first topology with the second topology based on the association relationship to obtain the host PCIe topology.
In the embodiments of the application, the apparatus for generating a host PCIe topology constructs a multi-layer host resource topology spanning the CPU interface units, NUMA nodes, and PCIe switching units, achieving layer-by-layer nearby matching of the CPU, memory, network card, GPU, and their virtualized instances. This solves the technical problem that, because container orchestration systems such as K8s have difficulty obtaining the topological relationships of host computing, network, and other components, allocated resources may suffer affinity problems, causing unnecessary data transfers, increased latency, and wasted resources when the container executes computing tasks; it largely resolves the affinity allocation of resources within the container, avoids unnecessary data transfers, and improves the computing performance of the container.
It should be noted that the first acquisition module, the second acquisition module, and the association module correspond to steps S202 to S206 in the method embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of the method embodiment. The above modules may run, as part of the apparatus, in the electronic device provided by the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is provided an apparatus for allocating container resources, including:
a third acquisition module, configured to acquire the resources to be allocated to a container;
an allocation module, configured to allocate resources to the container based on the PCIe topology of the host when the allocatable resources of the host meet the resources to be allocated, where the PCIe topology of the host is obtained by the method of any one of the above.
In the embodiments of the application, the container resource allocation apparatus constructs a multi-layer host resource topology spanning the CPU interface units, NUMA nodes, and PCIe switching units, achieving layer-by-layer nearby matching of the CPU, memory, network card, GPU, and their virtualized instances. This solves the technical problem that, because container orchestration systems such as K8s have difficulty obtaining the topological relationships of host computing, network, and other components, allocated resources may suffer affinity problems, causing unnecessary data transfers, increased latency, and wasted resources when the container executes computing tasks; it largely resolves the affinity allocation of resources within the container, avoids unnecessary data transfers, and improves the computing performance of the container.
Here, it should be noted that the third acquisition module and the allocation module correspond to steps S502 to S504 in the method embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of the method embodiment. The above modules may run, as part of the apparatus, in the electronic device provided by the embodiments of the present application.
An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor; when executed by the at least one processor, the computer program causes the electronic device to perform a method of an embodiment of the application.
Optionally, in this embodiment, the above electronic device may be located in at least one of a plurality of network devices of a computer network. Such electronic devices include, but are not limited to, computer devices.
Optionally, in this embodiment, the computer device may include: a memory and a processor, the memory storing a computer program; a processor for executing a computer program stored in the memory, the computer program when run causing the processor to perform the method steps of any one of the above.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for generating a host PCIe topology in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the method for generating a host PCIe topology described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located relative to the processor, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Optionally, the processor may call the information and application programs stored in the memory through the transmission device to perform the following steps: respectively acquiring a first topology and a second topology, the first topology representing the topology formed by the network card, network card virtual function, graphics processor, and virtual graphics processor that the host associates under a PCIe switching unit, and the second topology representing the topology formed by the CPU interface unit, CPU core, and memory that the host associates under a NUMA node; acquiring the association relationship between the PCIe switching unit and the NUMA node; and associating the first topology with the second topology based on the association relationship to obtain the host PCIe topology.
Optionally, in this embodiment, the above processor may further execute program code for: acquiring the first topological structure, comprising: acquiring the PCIe path of the network card, the NUMA node to which the network card belongs, the network card virtual function, and the mapping relation between the network card and the network card virtual function; acquiring the PCIe path of the graphics processor, the NUMA node to which the graphics processor belongs, the virtual graphics processor, and the mapping relation between the graphics processor and the virtual graphics processor; identifying, based on the PCIe path of the network card and the PCIe path of the graphics processor, the network card and the graphics processor under the same PCIe switching unit; and acquiring the first topological structure based on the NUMA node to which the network card belongs, the network card virtual function, the mapping relation between the network card and the network card virtual function, the NUMA node to which the graphics processor belongs, the virtual graphics processor, the mapping relation between the graphics processor and the virtual graphics processor, and the association relation between the network card and the graphics processor under the same PCIe switching unit.
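On a Linux host, the inputs listed above are typically visible through sysfs: a device's resolved /sys/bus/pci/devices path encodes its chain of upstream bridges (so devices sharing a path prefix sit under the same PCIe switching unit), numa_node gives its NUMA node, and SR-IOV virtual functions appear as virtfn* symlinks. The following is one plausible way to collect this, with error handling omitted; the helper names are assumptions:

```python
import os
from pathlib import Path

def pcie_path(bdf: str) -> str:
    # Resolving the sysfs symlink yields the full bridge chain, e.g.
    # /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ; two devices under
    # the same PCIe switching unit share a common path prefix.
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")

def numa_node_of(bdf: str) -> int:
    # -1 means the platform did not report a NUMA node for this device.
    return int(Path(f"/sys/bus/pci/devices/{bdf}/numa_node").read_text())

def virtual_functions_of(bdf: str) -> list[str]:
    # SR-IOV virtual functions of a physical function appear as
    # virtfn0, virtfn1, ... symlinks in the device directory.
    dev = Path(f"/sys/bus/pci/devices/{bdf}")
    return sorted(os.path.basename(os.readlink(p)) for p in dev.glob("virtfn*"))
```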
Optionally, in this embodiment, the above processor may further execute program code for: based on a NUMA node to which the network card belongs, a network card virtual function, a mapping relation between the network card and the network card virtual function, a NUMA node to which the graphics processor belongs, a virtual graphics processor, a mapping relation between the graphics processor and the virtual graphics processor, and an association relation between the network card and the graphics processor under the same PCIe switching unit, obtaining a first topological structure comprises: acquiring a first sub-topology structure based on the association relationship between a network card and a graphics processor under the same PCIe switching unit; associating the network card virtual function to a first sub-topology structure based on a mapping relation between the network card and the network card virtual function, and associating NUMA nodes to which the network card belongs to the first sub-topology structure to obtain a first target topology structure; associating the virtual graphics processor to a first sub-topology structure based on a mapping relation between the graphics processor and the virtual graphics processor, and associating NUMA nodes to which the graphics processor belongs to the first sub-topology structure to obtain a second target topology structure; and correlating the first target topological structure with the second target topological structure to obtain the first topological structure.
Optionally, in this embodiment, the above processor may further execute program code for: acquiring a second topology, comprising: acquiring a CPU interface unit, NUMA nodes contained in the CPU interface unit, a CPU core, NUMA nodes to which the CPU core belongs, a memory and NUMA nodes to which the memory belongs; and acquiring a second topological structure based on the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA nodes to which the CPU core belongs, the memory and the association relation of the NUMA nodes to which the memory belongs.
Optionally, in this embodiment, the above processor may further execute program code for: based on the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA nodes to which the CPU core belongs, the memory and the association relation of the NUMA nodes to which the memory belongs, the method for acquiring the second topological structure comprises the following steps: acquiring a second sub-topology structure based on the association relationship between the CPU interface unit and NUMA nodes contained in the CPU interface unit; and respectively associating the CPU core and the memory to a second sub-topological structure based on the NUMA node to which the CPU core belongs and the NUMA node to which the memory belongs, so as to obtain the second topological structure.
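For reference, the per-NUMA-node CPU and memory information needed for the second topology can likewise be read from sysfs on Linux. A minimal sketch under that assumption:

```python
from pathlib import Path

def parse_cpulist(text: str) -> list[int]:
    # Expands a kernel cpulist such as "0-3,8,10-11" into individual core ids.
    cores = []
    for part in text.strip().split(","):
        lo, _, hi = part.partition("-")
        cores.extend(range(int(lo), int(hi or lo) + 1))
    return cores

def read_numa_nodes() -> dict[int, dict]:
    nodes = {}
    for node_dir in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        cores = parse_cpulist((node_dir / "cpulist").read_text())
        # meminfo lines look like "Node 0 MemTotal:  131072000 kB"
        mem_kb = next(int(line.split()[3])
                      for line in (node_dir / "meminfo").read_text().splitlines()
                      if "MemTotal" in line)
        nodes[node_id] = {"cpu_cores": cores, "memory_kb": mem_kb}
    return nodes
```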
In addition, the memory may be used to store software programs and modules, such as program instructions/modules corresponding to the container resource allocation method and apparatus in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby performing various functional applications and data processing, that is, implementing the container resource allocation method described above.
Optionally, the processor may call the information and the application program stored in the memory through the transmission device to execute the following steps: obtaining a resource to be allocated of a container; and under the condition that the allocatable resources of the host meet the resources to be allocated, allocating the resources to the container based on the PCIe topology structure of the host, wherein the PCIe topology structure of the host is obtained based on the method of any one of the above.
Optionally, in this embodiment, the above processor may further execute program code for: performing resource allocation on the container based on the PCIe topology structure of the host, comprising: determining a NUMA node whose residual resources are greater than preset residual resources as the target NUMA node; and allocating the allocatable resources to the container based on the target NUMA node.
Optionally, in this embodiment, the above processor may further execute program code for: when the allocatable resource is a CPU, allocating the allocatable resource to the container based on the target NUMA node, comprising: judging whether the target NUMA node has idle CPU cores and whether the number of the idle CPU cores meets the number of CPU cores required by the container; if the target NUMA node has idle CPU cores and the number of the idle CPU cores meets the number of CPU cores required by the container, allocating the idle CPU cores to the container; and if the target NUMA node does not have idle CPU cores or the number of the idle CPU cores does not meet the number of CPU cores required by the container, allocating CPU cores under the same CPU interface unit as the target NUMA node to the container.
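As an illustration of this nearest-first fallback, the sketch below allocates CPU cores first from the target NUMA node, then from sibling nodes on the same CPU interface unit (socket), and finally across sockets. The topology object and its idle_cores/socket_of methods are assumptions made for the example, not the embodiment's API:

```python
def allocate_cpus(topology, target_numa: int, want: int) -> list[int] | None:
    # Tier 1: idle cores on the target NUMA node itself.
    tiers = [topology.idle_cores(numa=target_numa)]
    # Tier 2: idle cores anywhere under the same CPU interface unit (socket).
    tiers.append(topology.idle_cores(socket=topology.socket_of(target_numa)))
    # Tier 3: any idle cores on the host, i.e. across CPU interface units.
    tiers.append(topology.idle_cores())
    for candidates in tiers:
        if len(candidates) >= want:
            return candidates[:want]   # nearest tier that satisfies the request
    return None                        # the host cannot satisfy the request
```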
Optionally, in this embodiment, the above processor may further execute program code for: allocating CPU cores under the same CPU interface unit as the target NUMA node to the container, comprising: judging whether the number of CPU cores under the same CPU interface unit meets the number of CPU cores required by the container; if the number of CPU cores under the same CPU interface unit meets the number of CPU cores required by the container, allocating the CPU cores under the same CPU interface unit to the container; and if the number of CPU cores under the same CPU interface unit does not meet the number of CPU cores required by the container, allocating CPU cores under a CPU interface unit different from that of the target NUMA node to the container.
Optionally, in this embodiment, the above processor may further execute program code for: when the allocatable resource is memory, allocating the allocatable resource to the container based on the target NUMA node, comprising: judging whether the residual memory of the target NUMA node meets the memory required by the container; if the residual memory of the target NUMA node meets the memory required by the container, allocating the residual memory of the target NUMA node to the container; and if the residual memory of the target NUMA node does not meet the memory required by the container, allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container.
Optionally, in this embodiment, the above processor may further execute program code for: allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container, comprising: judging whether the residual memory of the other NUMA nodes under the same CPU interface unit meets the memory required by the container; if the residual memory of the other NUMA nodes meets the memory required by the container, allocating the residual memory of the other NUMA nodes to the container; and if the residual memory of the other NUMA nodes does not meet the memory required by the container, allocating memory under a CPU interface unit different from that of the target NUMA node to the container.
Optionally, in this embodiment, the above processor may further execute program code for: in the case where the allocatable resources are network card virtual functions and virtual graphics processors, before allocating the allocatable resources to the container based on the target NUMA node, the method further comprises: judging whether the network card virtual functions and virtual graphics processors of the same PCIe switching unit meet the network card virtual functions and virtual graphics processors required by the container; and if the network card virtual functions and virtual graphics processors of the same PCIe switching unit meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors of the same PCIe switching unit to the container.
Optionally, in this embodiment, the above processor may further execute program code for: when the allocatable resources are network card virtual functions and virtual graphics processors, allocating the allocatable resources to the container based on the target NUMA node, comprising: judging whether the network card virtual functions and virtual graphics processors of the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container; if the network card virtual functions and virtual graphics processors of the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors of the target NUMA node to the container; and if the network card virtual functions and virtual graphics processors of the target NUMA node do not meet the network card virtual functions and virtual graphics processors required by the container, allocating network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node to the container.
Optionally, in this embodiment, the above processor may further execute program code for: allocating network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node to the container, comprising: judging whether the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container; if the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors under the same CPU interface unit to the container; and if the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node do not meet the network card virtual functions and virtual graphics processors required by the container, allocating network card virtual functions and virtual graphics processors under different CPU interface units to the container.
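Putting the three preceding paragraphs together, the search order for a paired network card virtual function and virtual graphics processor is: same PCIe switching unit, then same NUMA node, then same CPU interface unit, then anywhere on the host. A minimal sketch of that ordering, in which the scope helper names are assumptions made for the example:

```python
def allocate_vf_and_vgpu(topology, target_numa: int, need_vf: int, need_vgpu: int):
    # Candidate scopes from nearest to farthest; each scope reports the free
    # NIC virtual functions and virtual GPUs visible within it.
    scopes = [
        topology.same_pcie_switch(target_numa),  # best: one switch, shortest data path
        topology.same_numa_node(target_numa),    # same NUMA node, different switches
        topology.same_socket(target_numa),       # same CPU interface unit
        topology.whole_host(),                   # last resort: across sockets
    ]
    for scope in scopes:
        vfs, vgpus = scope.free_vfs(), scope.free_vgpus()
        if len(vfs) >= need_vf and len(vgpus) >= need_vgpu:
            return vfs[:need_vf], vgpus[:need_vgpu]
    return None
```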
Optionally, in this embodiment, the above processor may further execute program code for: determining a NUMA node whose residual resources are greater than preset residual resources as the target NUMA node, comprising: acquiring the residual resources of the NUMA node and the weight coefficients corresponding to the residual resources; calculating a residual resource evaluation value of the NUMA node based on the residual resources and the corresponding weight coefficients; acquiring a preset residual resource evaluation value corresponding to the preset residual resources; and when the residual resource evaluation value is greater than the preset residual resource evaluation value, determining that the residual resources of the NUMA node are greater than the preset residual resources, and taking the NUMA node as the target NUMA node.
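The evaluation value is thus a weighted sum of the remaining resource amounts. A minimal sketch under that reading follows (it also anticipates the tie-break of the next paragraph); the weight values are assumptions for the example, not values from the embodiment:

```python
# Hypothetical weights: how much each resource type contributes to the score.
WEIGHTS = {"cpu_cores": 1.0, "memory_gb": 0.5, "nic_vfs": 2.0, "vgpus": 4.0}

def evaluation_value(remaining: dict[str, float]) -> float:
    # Weighted sum of the NUMA node's residual resources.
    return sum(WEIGHTS[k] * v for k, v in remaining.items())

def pick_target_numa(nodes: dict[int, dict[str, float]], threshold: float) -> int | None:
    # Keep nodes whose evaluation value exceeds the preset value and, if several
    # qualify, take the one with the largest evaluation value.
    scores = {n: evaluation_value(r) for n, r in nodes.items()}
    qualified = {n: s for n, s in scores.items() if s > threshold}
    return max(qualified, key=qualified.get) if qualified else None
```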
Optionally, in this embodiment, the above processor may further execute program code for: if the residual resources of a plurality of NUMA nodes are greater than the preset residual resources, taking the NUMA node with the largest residual resource evaluation value as the target NUMA node.
The embodiments of the present application also provide a non-transitory machine-readable medium storing a computer program, wherein the computer program is configured to cause a computer to perform the method of the embodiments of the present application when executed by a processor of the computer.
Optionally, in this embodiment, the above storage medium may be located in any computer terminal of a group of computer terminals in a computer network, or in any mobile terminal of a group of mobile terminals.
Optionally, in this embodiment, the storage medium may be configured to store program code corresponding to the method for generating a host PCIe topology provided in the foregoing embodiment, and when the program code is executed by the processor, control the processor to execute the method for generating a host PCIe topology.
Optionally, in the present embodiment, the above storage medium is configured to store program code for performing the following steps: respectively acquiring a first topological structure and a second topological structure, wherein the first topological structure is used for representing a topological structure formed by the network card, the network card virtual function, the graphics processor and the virtual graphics processor which are associated by the host under the PCIe switching unit, and the second topological structure is used for representing a topological structure formed by the CPU interface unit, the CPU core and the memory based on the association of the host and the NUMA node; acquiring the association relationship between the PCIe switching unit and the NUMA node; and associating the first topological structure with the second topological structure based on the association relationship to obtain the PCIe topological structure of the host.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: acquiring the first topological structure, comprising: acquiring the PCIe path of the network card, the NUMA node to which the network card belongs, the network card virtual function, and the mapping relation between the network card and the network card virtual function; acquiring the PCIe path of the graphics processor, the NUMA node to which the graphics processor belongs, the virtual graphics processor, and the mapping relation between the graphics processor and the virtual graphics processor; identifying, based on the PCIe path of the network card and the PCIe path of the graphics processor, the network card and the graphics processor under the same PCIe switching unit; and acquiring the first topological structure based on the NUMA node to which the network card belongs, the network card virtual function, the mapping relation between the network card and the network card virtual function, the NUMA node to which the graphics processor belongs, the virtual graphics processor, the mapping relation between the graphics processor and the virtual graphics processor, and the association relation between the network card and the graphics processor under the same PCIe switching unit.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: based on a NUMA node to which the network card belongs, a network card virtual function, a mapping relation between the network card and the network card virtual function, a NUMA node to which the graphics processor belongs, a virtual graphics processor, a mapping relation between the graphics processor and the virtual graphics processor, and an association relation between the network card and the graphics processor under the same PCIe switching unit, obtaining a first topological structure comprises: acquiring a first sub-topology structure based on the association relationship between a network card and a graphics processor under the same PCIe switching unit; associating the network card virtual function to a first sub-topology structure based on a mapping relation between the network card and the network card virtual function, and associating NUMA nodes to which the network card belongs to the first sub-topology structure to obtain a first target topology structure; associating the virtual graphics processor to a first sub-topology structure based on a mapping relation between the graphics processor and the virtual graphics processor, and associating NUMA nodes to which the graphics processor belongs to the first sub-topology structure to obtain a second target topology structure; and correlating the first target topological structure with the second target topological structure to obtain the first topological structure.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: acquiring a second topology, comprising: acquiring a CPU interface unit, NUMA nodes contained in the CPU interface unit, a CPU core, NUMA nodes to which the CPU core belongs, a memory and NUMA nodes to which the memory belongs; and acquiring a second topological structure based on the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA nodes to which the CPU core belongs, the memory and the association relation of the NUMA nodes to which the memory belongs.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: based on the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, the NUMA nodes to which the CPU core belongs, the memory and the association relation of the NUMA nodes to which the memory belongs, the method for acquiring the second topological structure comprises the following steps: acquiring a second sub-topology structure based on the association relationship between the CPU interface unit and NUMA nodes contained in the CPU interface unit; and respectively associating the CPU core and the memory to a second sub-topological structure based on the NUMA node to which the CPU core belongs and the NUMA node to which the memory belongs, so as to obtain the second topological structure.
Optionally, in this embodiment, the storage medium may be used to store the program code corresponding to the container resource allocation method provided in the foregoing embodiment, and when the program code is executed by the processor, it controls the processor to execute the container resource allocation method.
Alternatively, in the present embodiment, the above-described storage medium is configured to store program code for performing the steps of: obtaining a resource to be allocated of a container; and under the condition that the allocatable resources of the host meet the resources to be allocated, allocating the resources to the container based on the PCIe topology structure of the host, wherein the PCIe topology structure of the host is obtained based on the method of any one of the above.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: performing resource allocation on the container based on the PCIe topology structure of the host, comprising: determining a NUMA node whose residual resources are greater than preset residual resources as the target NUMA node; and allocating the allocatable resources to the container based on the target NUMA node.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: when the allocatable resource is a CPU, allocating the allocatable resource to the container based on the target NUMA node, comprising: judging whether the target NUMA node has idle CPU cores and whether the number of the idle CPU cores meets the number of CPU cores required by the container; if the target NUMA node has idle CPU cores and the number of the idle CPU cores meets the number of CPU cores required by the container, allocating the idle CPU cores to the container; and if the target NUMA node does not have idle CPU cores or the number of the idle CPU cores does not meet the number of CPU cores required by the container, allocating CPU cores under the same CPU interface unit as the target NUMA node to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: allocating CPU cores under the same CPU interface unit as the target NUMA node to the container, comprising: judging whether the number of CPU cores under the same CPU interface unit meets the number of CPU cores required by the container; if the number of CPU cores under the same CPU interface unit meets the number of CPU cores required by the container, allocating the CPU cores under the same CPU interface unit to the container; and if the number of CPU cores under the same CPU interface unit does not meet the number of CPU cores required by the container, allocating CPU cores under a CPU interface unit different from that of the target NUMA node to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: when the allocatable resource is memory, allocating the allocatable resource to the container based on the target NUMA node, comprising: judging whether the residual memory of the target NUMA node meets the memory required by the container; if the residual memory of the target NUMA node meets the memory required by the container, allocating the residual memory of the target NUMA node to the container; and if the residual memory of the target NUMA node does not meet the memory required by the container, allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container, comprising: judging whether the residual memory of the other NUMA nodes under the same CPU interface unit meets the memory required by the container; if the residual memory of the other NUMA nodes meets the memory required by the container, allocating the residual memory of the other NUMA nodes to the container; and if the residual memory of the other NUMA nodes does not meet the memory required by the container, allocating memory under a CPU interface unit different from that of the target NUMA node to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: in the case where the allocatable resources are network card virtual functions and virtual graphics processors, before allocating the allocatable resources to the container based on the target NUMA node, the method further comprises: judging whether the network card virtual functions and virtual graphics processors of the same PCIe switching unit meet the network card virtual functions and virtual graphics processors required by the container; and if the network card virtual functions and virtual graphics processors of the same PCIe switching unit meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors of the same PCIe switching unit to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: when the allocatable resources are network card virtual functions and virtual graphics processors, allocating the allocatable resources to the container based on the target NUMA node, comprising: judging whether the network card virtual functions and virtual graphics processors of the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container; if the network card virtual functions and virtual graphics processors of the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors of the target NUMA node to the container; and if the network card virtual functions and virtual graphics processors of the target NUMA node do not meet the network card virtual functions and virtual graphics processors required by the container, allocating network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: allocating network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node to the container, comprising: judging whether the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container; if the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node meet the network card virtual functions and virtual graphics processors required by the container, allocating the network card virtual functions and virtual graphics processors under the same CPU interface unit to the container; and if the network card virtual functions and virtual graphics processors under the same CPU interface unit as the target NUMA node do not meet the network card virtual functions and virtual graphics processors required by the container, allocating network card virtual functions and virtual graphics processors under different CPU interface units to the container.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: determining a NUMA node whose residual resources are greater than preset residual resources as the target NUMA node, comprising: acquiring the residual resources of the NUMA node and the weight coefficients corresponding to the residual resources; calculating a residual resource evaluation value of the NUMA node based on the residual resources and the corresponding weight coefficients; acquiring a preset residual resource evaluation value corresponding to the preset residual resources; and when the residual resource evaluation value is greater than the preset residual resource evaluation value, determining that the residual resources of the NUMA node are greater than the preset residual resources, and taking the NUMA node as the target NUMA node.
Optionally, in this embodiment, the storage medium is configured to store program code for further performing the steps of: if the residual resources of a plurality of NUMA nodes are greater than the preset residual resources, taking the NUMA node with the largest residual resource evaluation value as the target NUMA node.
The embodiments of the present application also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform the method of the embodiments of the present application.
With reference to fig. 8, a block diagram of an electronic device that may be a server or a client of an embodiment of the present application will now be described; it is an example of a hardware device that may be applied to aspects of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 8, the electronic device includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in the electronic device are connected to the I/O interface 805, including: an input unit 806, an output unit 807, a storage unit 808, and a communication unit 809. The input unit 806 may be any type of device capable of inputting information to the electronic device; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 807 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 808 may include, but is not limited to, magnetic disks and optical disks. The communication unit 809 allows the electronic device to exchange information/data with other devices over computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 801 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU, a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above. For example, in some embodiments, the method embodiments of the present application may be implemented as a computer program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 802 and/or the communication unit 809. In some embodiments, the computing unit 801 may be configured to perform the methods described above by any other suitable means (e.g., by means of firmware).
A computer program for implementing the methods of embodiments of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the embodiments of the present application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the term "comprising" and its variants as used in the embodiments of the present application are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". References to "a" or "an" in the embodiments of the present application are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be interpreted as "one or more" unless the context clearly indicates otherwise.
User information (including but not limited to user equipment information, user personal information, and the like) and data (including but not limited to data for analysis, stored data, presented data, and the like) involved in the embodiments of the present application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entrances are provided for users to choose to authorize or refuse.
The steps described in the method embodiments provided in the embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the application is not limited in this respect.
The term "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. The various embodiments in this specification are described in a related manner, with identical and similar parts being referred to each other. In particular, for apparatus, devices, system embodiments, the description is relatively simple as it is substantially similar to method embodiments, see for relevant part of the description of method embodiments.
The above examples merely represent several embodiments of the present application; their descriptions are specific and detailed, but they are not to be construed as limiting the scope of the patent claims. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (14)

1. A method for generating a host PCIe topology structure, comprising:
respectively acquiring a first topological structure and a second topological structure; the first topological structure is used for representing a topological structure formed by a network card, a network card virtual function, a graphics processor and a virtual graphics processor which are associated by the host under a PCIe switching unit; the second topological structure is used for representing a topological structure formed by a CPU interface unit, a CPU core and a memory based on the association of the host and a non-uniform memory access (NUMA) node;
acquiring an association relationship between the PCIe transit unit and the NUMA node;
and associating the first topological structure with the second topological structure based on the association relation to obtain the PCIe topological structure of the host.
2. The method of claim 1, wherein acquiring the first topology comprises:
acquiring a PCIe path of the network card, NUMA nodes to which the network card belongs, the network card virtual function and a mapping relation between the network card and the network card virtual function;
acquiring a PCIe path of the graphics processor, NUMA nodes to which the graphics processor belongs, the virtual graphics processor and a mapping relation between the graphics processor and the virtual graphics processor;
based on the PCIe path of the network card and the PCIe path of the graphics processor, identifying the network card and the graphics processor under the same PCIe switching unit;
and acquiring the first topological structure based on the NUMA node to which the network card belongs, the network card virtual function, the mapping relation between the network card and the network card virtual function, the NUMA node to which the graphics processor belongs, the virtual graphics processor, the mapping relation between the graphics processor and the virtual graphics processor and the association relation between the network card and the graphics processor under the same PCIe switching unit.
3. The method of claim 2, wherein obtaining the first topology based on the NUMA node to which the network card belongs, the network card virtual function, a mapping relationship between the network card and the network card virtual function, the NUMA node to which the graphics processor belongs, the virtual graphics processor, a mapping relationship between the graphics processor and the virtual graphics processor, and an association relationship between the network card and the graphics processor under the same PCIe switch unit comprises:
acquiring a first sub-topology structure based on the association relationship between the network card and the graphics processor under the same PCIe switching unit;
associating the network card virtual function to the first sub-topology structure based on a mapping relation between the network card and the network card virtual function, and associating a NUMA node to which the network card belongs to the first sub-topology structure to obtain a first target topology structure;
associating the virtual graphics processor to the first sub-topology structure based on a mapping relation between the graphics processor and the virtual graphics processor, and associating NUMA nodes to which the graphics processor belongs to the first sub-topology structure to obtain a second target topology structure;
and correlating the first target topological structure with the second target topological structure to obtain the first topological structure.
4. The method of claim 1, wherein acquiring the second topology comprises:
acquiring the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, NUMA nodes to which the CPU core belongs, the memory and NUMA nodes to which the memory belongs;
and acquiring the second topological structure based on the CPU interface unit, NUMA nodes contained in the CPU interface unit, the CPU core, NUMA nodes to which the CPU core belongs, the memory and the association relation of the NUMA nodes to which the memory belongs.
5. The method of claim 4, wherein obtaining the second topology based on the CPU interface unit, the NUMA node included in the CPU interface unit, the CPU core, the NUMA node to which the CPU core belongs, the memory, and an association of the NUMA node to which the memory belongs, comprises:
acquiring a second sub-topology structure based on the association relation between the CPU interface unit and NUMA nodes contained in the CPU interface unit;
and respectively associating the CPU core and the memory to the second sub-topology structure based on the NUMA node to which the CPU core belongs and the NUMA node to which the memory belongs, so as to obtain the second topology structure.
6. A method of container resource allocation, comprising:
obtaining a resource to be allocated of a container;
and in case that the allocatable resources of the host meet the resources to be allocated, allocating the resources to the container based on a PCIe topology of the host, wherein the PCIe topology of the host is obtained based on the method of any one of claims 1 to 5.
7. The method of claim 6, wherein allocating resources to the container based on the PCIe topology of the host comprises:
determining a NUMA node whose residual resources are greater than preset residual resources as a target NUMA node;
and allocating the allocatable resources to the container based on the target NUMA node.
8. The method of claim 7, wherein when the allocatable resource is a CPU, allocating the allocatable resource to the container based on the target NUMA node comprises:
judging whether the target NUMA node has idle CPU cores or not and whether the number of the idle CPU cores meets the number of the CPU cores required by the container or not;
if the target NUMA node has idle CPU cores and the number of the idle CPU cores meets the number of CPU cores required by the container, allocating the idle CPU cores to the container;
and if the target NUMA node does not have idle CPU cores or the number of the idle CPU cores does not meet the number of CPU cores required by the container, allocating CPU cores under the same CPU interface unit as the target NUMA node to the container.
9. The method of claim 8, wherein allocating CPU cores under the same CPU interface unit as the target NUMA node to the container comprises:
judging whether the number of CPU cores under the same CPU interface unit meets the number of CPU cores required by the container;
if the number of the CPU cores under the same CPU interface unit meets the number of CPU cores required by the container, allocating the CPU cores under the same CPU interface unit to the container;
and if the number of the CPU cores under the same CPU interface unit does not meet the number of CPU cores required by the container, allocating CPU cores under a CPU interface unit different from that of the target NUMA node to the container.
10. The method of claim 7, wherein when the allocatable resource is memory, allocating the allocatable resource to the container based on the target NUMA node comprises:
judging whether the residual memory of the target NUMA node meets the memory required by the container;
if the residual memory of the target NUMA node meets the memory required by the container, allocating the residual memory of the target NUMA node to the container;
and if the residual memory of the target NUMA node does not meet the memory required by the container, allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container.
11. The method of claim 10, wherein allocating the residual memory of other NUMA nodes under the same CPU interface unit to the container comprises:
judging whether the residual memories of other NUMA nodes under the same CPU interface unit meet the memory required by the container;
if the residual memory of the other NUMA nodes meets the memory required by the container, allocating the residual memory of the other NUMA nodes to the container;
and if the residual memory of the other NUMA nodes does not meet the memory required by the container, allocating memory under a CPU interface unit different from that of the target NUMA node to the container.
12. The method of claim 7, wherein when the allocatable resources are network card virtual functions and virtual graphics processors, prior to allocating the allocatable resources to the container based on the target NUMA node, the method further comprises:
judging whether the network card virtual function and the virtual graphics processor of the same PCIe switching unit meet the network card virtual function and the virtual graphics processor required by the container;
and if the network card virtual function and the virtual graphics processor of the same PCIe switching unit meet the network card virtual function and the virtual graphics processor required by the container, allocating the network card virtual function and the virtual graphics processor of the same PCIe switching unit to the container.
13. The method of claim 7, wherein, when the allocatable resources are network card virtual functions and virtual graphics processors, allocating the allocatable resources to the container based on the target NUMA node comprises:
judging whether the network card virtual function and the virtual graphics processor of the target NUMA node meet the network card virtual function and the virtual graphics processor required by the container;
if the network card virtual function and the virtual graphics processor of the target NUMA node meet the network card virtual function and the virtual graphics processor required by the container, allocating the network card virtual function and the virtual graphics processor of the target NUMA node to the container;
and if the network card virtual function and the virtual graphics processor of the target NUMA node do not meet the network card virtual function and the virtual graphics processor required by the container, allocating a network card virtual function and a virtual graphics processor under the same CPU interface unit as the target NUMA node to the container.
14. The method of claim 13, wherein allocating a network card virtual function and a virtual graphics processor under the same CPU interface unit as the target NUMA node to the container comprises:
judging whether the network card virtual function and the virtual graphics processor under the same CPU interface unit as the target NUMA node meet the network card virtual function and the virtual graphics processor required by the container;
if the network card virtual function and the virtual graphics processor under the same CPU interface unit as the target NUMA node meet the network card virtual function and the virtual graphics processor required by the container, allocating the network card virtual function and the virtual graphics processor under the same CPU interface unit to the container;
and if the network card virtual function and the virtual graphics processor under the same CPU interface unit as the target NUMA node do not meet the network card virtual function and the virtual graphics processor required by the container, allocating a network card virtual function and a virtual graphics processor under a different CPU interface unit to the container.
CN202310531388.7A 2023-05-11 2023-05-11 Method for generating host PCIe topological structure and method for distributing container resources Pending CN116755829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310531388.7A CN116755829A (en) 2023-05-11 2023-05-11 Method for generating host PCIe topological structure and method for distributing container resources


Publications (1)

Publication Number Publication Date
CN116755829A true CN116755829A (en) 2023-09-15

Family

ID=87956060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310531388.7A Pending CN116755829A (en) 2023-05-11 2023-05-11 Method for generating host PCIe topological structure and method for distributing container resources

Country Status (1)

Country Link
CN (1) CN116755829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118069568A (en) * 2024-04-18 2024-05-24 深圳云豹智能有限公司 External device topology configuration method, data processor, device and program product


Similar Documents

Publication Publication Date Title
US10325343B1 (en) Topology aware grouping and provisioning of GPU resources in GPU-as-a-Service platform
CN111966500B (en) Resource scheduling method and device, electronic equipment and storage medium
US11669372B2 (en) Flexible allocation of compute resources
US20220391260A1 (en) Method and Apparatus for Creating Container, Device, Medium, and Program Product
CN113377540A (en) Cluster resource scheduling method and device, electronic equipment and storage medium
US20210011768A1 (en) Thread associated memory allocation and memory architecture aware allocation
CN108519917A (en) A kind of resource pool distribution method and device
US20210258265A1 (en) Resource management for components of a virtualized execution environment
CN104050043A (en) Share cache perception-based virtual machine scheduling method and device
CN109417488A (en) The method and apparatus of virtual network function resource management
CN114416352A (en) Computing resource allocation method and device, electronic equipment and storage medium
CN113946431B (en) Resource scheduling method, system, medium and computing device
US20220086226A1 (en) Virtual device portability
CN116755829A (en) Method for generating host PCIe topological structure and method for distributing container resources
JP2023511467A (en) Task scheduling for machine learning workloads
CN115686836A (en) Unloading card provided with accelerator
CN116881009A (en) GPU resource scheduling method and device, electronic equipment and readable storage medium
WO2022100365A1 (en) Method, system, and device for managing artificial intelligence application task, and storage medium
CN112433844B (en) Resource allocation method, system, equipment and computer readable storage medium
DE112016007292T5 (en) TECHNOLOGIES FOR PARAVIRTUALIZED NETWORK DEVICES AND MEMORY MANAGEMENT
CN115840649B (en) Method and device for partitioning capacity block type virtual resource allocation, storage medium and terminal
US20160188434A1 (en) Method and device for determining program performance interference model
CN107590000B (en) Secondary random resource management method/system, computer storage medium and device
CN116360994A (en) Scheduling method, device, server and storage medium of distributed heterogeneous resource pool
EP4030284A1 (en) Virtual device portability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination