CN114510321A - Resource scheduling method, related device and medium


Info

Publication number: CN114510321A
Application number: CN202210114098.8A
Authority: CN (China)
Prior art keywords: persistent storage, container group, target, persistent, resource scheduling
Legal status: Pending
Original language: Chinese (zh)
Inventors: 陈嘉园, 施俊智, 姜志峰, 朱国云, 杨成虎
Current assignee: Alibaba China Co Ltd
Original assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202210114098.8A
Publication of CN114510321A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources

Abstract

Provided are a resource scheduling method, a related apparatus, and a medium. The resource scheduling method comprises the following steps: obtaining a resource scheduling request for a container group, the resource scheduling request including the target storage space size of the persistent memory requested by the container group; obtaining device information of at least one persistent storage on a working node where the container group is deployed, the persistent storage comprising a plurality of persistent storage devices; determining, based on a preset allocation condition and the target storage space size, allocation information for allocating a target persistent storage to the container group in the working node, the target persistent storage and the processor used by the container group being located in the same non-uniform memory access architecture region; and storing the allocation information in the configuration information of the container group, so that the working node obtains the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information. Embodiments of the present disclosure improve the data access efficiency of the processor.

Description

Resource scheduling method, related device and medium
Technical Field
The present disclosure relates to the field of chips, and in particular, to a resource scheduling method, a related apparatus, and a medium.
Background
Cloud technology is a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize the computation, storage, processing, and sharing of data. Cloud computing systems often employ containers to isolate applications from one another; each container carries an operating environment and the application that runs in it. A physical machine (PM) on which containers are deployed may be divided into one or more Non-Uniform Memory Access (NUMA) architecture regions, where each region includes part or all of the physical machine's resources, such as processing cores, memory, or network bandwidth. When a container group requests a persistent storage volume (Persistent Volume), a CSI plug-in (Container Storage Interface, a storage driver that supports creating and deleting storage volumes and mounting and unmounting them) can monitor the container group's demand for the storage volume, create a corresponding persistent storage volume, and mount it to the container group for its use. However, the CSI plug-in often mounts to the container group a persistent storage volume that belongs to a different non-uniform memory access architecture region than the processor used by the container group, so that the processor must access a persistent storage volume located in another region. Under a NUMA architecture, a processor's latency for accessing memory depends on the memory's location, so this increases the latency of the processor's accesses to the persistent storage volume and reduces the data access efficiency of the processor.
Disclosure of Invention
In view of the above, it is an object of the present disclosure to reduce latency of a processor accessing a persistent memory of a container group, thereby improving data access efficiency of the processor.
In a first aspect, an embodiment of the present disclosure provides a resource scheduling method, including:
obtaining a resource scheduling request for a container group, wherein the resource scheduling request comprises a target storage space size of a persistent memory requested by the container group;
obtaining device information of at least one persistent storage on a working node deploying the container group, the persistent storage comprising a plurality of persistent storage devices;
determining allocation information for allocating a target persistent storage for the container group in the working node based on a preset allocation condition and the size of the target storage space, wherein the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region;
and storing the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
Optionally, the device information includes a device name and a device identifier of the persistent storage device, and an identifier of the non-uniform memory access architecture region to which the persistent storage device belongs, and before the device information of the at least one persistent storage on the working node where the container group is deployed is acquired, the resource scheduling method further includes:
dividing the at least one persistent storage into a plurality of persistent storage devices respectively, wherein each persistent storage device has a certain storage space size;
after the obtaining the device information of the at least one persistent storage on the working node deploying the container group, the resource scheduling method further includes:
registering device information of the at least one persistent memory with a control node.
Optionally, the preset allocation condition includes:
the size of the free storage space of the target persistent storage is not smaller than the size of the target storage space;
the target persistent storage has the same non-uniform memory access architecture region identifier as the processor used by the container group.
Optionally, the storage spaces of the plurality of persistent storage devices are equal in size, and the determining, based on a preset allocation condition and the size of the target storage space, allocation information for allocating a target persistent storage for the container group in the work node includes:
determining the target number of the requested persistent storage devices according to the size of the target storage space and the size of the storage space of the persistent storage devices;
acquiring a target number of first persistent storage devices having the same non-uniform memory access architecture region identifier as the processor used by the container group;
and, based on the one-to-one correspondence between device identifiers of persistent storage devices and persistent storages, when the target number of first persistent storage devices are located in the same persistent storage, taking that persistent storage as the target persistent storage.
Optionally, after storing the allocation information in the configuration information of the container group, the resource scheduling method further includes:
emptying the configuration information of the container group in case the container group is destroyed.
In a second aspect, an embodiment of the present disclosure provides a resource scheduling apparatus, including:
a scheduling request obtaining unit, configured to obtain a resource scheduling request for a container group, where the resource scheduling request includes a target storage space size of a persistent storage requested by the container group;
a device information obtaining unit, configured to obtain device information of at least one persistent storage on a working node where the container group is deployed, where the persistent storage includes a plurality of persistent storage devices;
an allocation information determining unit, configured to determine, based on a preset allocation condition and the size of the target storage space, allocation information for allocating a target persistent storage to the container group in the work node, where the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region;
and the allocation information storage unit is used for storing the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
In a third aspect, an embodiment of the present disclosure provides a persistent memory device plug-in, including:
a monitoring unit, configured to obtain a resource scheduling request for a container group, where the resource scheduling request includes a target storage space size of a persistent storage requested by the container group;
the device detection unit is used for acquiring device information of at least one persistent storage on a working node for deploying the container group, and the persistent storage comprises a plurality of persistent storage devices;
the device assigning unit is used for determining the assignment information of assigning a target persistent storage for the container group in the working node based on a preset assignment condition and the size of the target storage space, wherein the target persistent storage and a processor used by the container group are positioned in the same non-uniform memory access architecture region;
the device specifying unit is further configured to store the allocation information into configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
In a fourth aspect, an embodiment of the present disclosure provides a computing apparatus, including:
a processor;
a persistent memory;
the above persistent storage device plug-in is configured to dispatch a target persistent storage to a container group, where the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region.
In a fifth aspect, an embodiment of the present disclosure provides a system on a chip, including:
a processor;
a persistent memory;
the above persistent storage device plug-in is configured to dispatch a target persistent storage to a container group, where the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region.
In a sixth aspect, an embodiment of the present disclosure provides a computing device, including:
a memory for storing computer executable code;
a processor configured to execute the computer executable code such that the processor performs any of the resource scheduling methods described above.
In a seventh aspect, an embodiment of the present disclosure provides a computer storage medium having computer executable codes stored thereon, where the computer executable codes, when executed by a processor, implement the resource scheduling method described in any one of the above.
In the embodiments of the present disclosure, each persistent storage on the working node where the container group is deployed includes a plurality of persistent storage devices, which is equivalent to logically virtualizing each persistent storage as a plurality of persistent storage devices. Thus, based on a preset allocation condition and the target storage space size of the persistent memory requested by the container group, allocation information for allocating a target persistent storage to the container group in the working node can be determined and stored in the configuration information of the container group, so that the working node can obtain the allocation information from the configuration information and schedule the target persistent storage to the container group based on it. This skillfully leverages the container orchestration tool's mechanism for managing device resources: the container group's application for a persistent storage volume becomes an application for persistent storage devices, and a target persistent storage located in the same non-uniform memory access architecture region as the processor used by the container group is allocated to the container group. In this way, the processor used by the container group accesses persistent storage in the same non-uniform memory access architecture region, which reduces the latency of the processor's accesses to the persistent storage and improves the data access efficiency of the processor.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 is a block diagram of a data center to which one embodiment of the present disclosure is applied;
FIG. 2 is a block diagram of a data center to which one embodiment of the present disclosure is applied;
FIG. 3 is an internal block diagram of a computing device according to one embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a resource scheduling method according to an embodiment of the present disclosure;
FIG. 5 is an interaction diagram of a worker node and a control node according to one embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a resource scheduling apparatus according to an embodiment of the present disclosure.
Detailed Description
The present disclosure is described below based on examples, but it is not limited to these examples. In the following detailed description, some specific details are set forth. It will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. Well-known methods, processes, and procedures have not been described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
The following terms are used herein.
A computing device: a device with computing or processing capability. It may be embodied in the form of a terminal, such as an Internet of Things device, a mobile terminal, a desktop computer, or a laptop computer, or it may be embodied as a server or a cluster of servers. In the context of the data center to which the present disclosure applies, the computing device is a server in the data center.
A processor: the operation core and control core of a computing device; its main function is to execute computer instructions and process data in computer software.
A memory: a physical structure within a computing device for storing information. Depending on usage, storage may be divided into main storage (also referred to as internal storage, or simply memory/main memory) and secondary storage (also referred to as external storage, or simply secondary/external memory). The main memory is used to store instruction information and/or data information represented by data signals, such as data provided by the processor, and can also be used to exchange information between the processor and the external memory. Since information provided by external memory must be brought into main memory before the processor can access it, references herein to memory generally mean main memory, and references to storage generally mean external memory.
Persistent Memory (PMEM): a new type of non-volatile memory (NVM) that combines characteristics of traditional storage devices (such as mechanical hard disks and solid state drives) and memory (such as dynamic random access memory). Persistent memory can be plugged into a motherboard slot like a dynamic random access memory and, unlike dynamic random access memory, retains data after power loss. Persistent memory has very large capacity, can provide faster read and write speeds than flash memory, and costs less than dynamic random access memory. Currently, persistent memory supports the following usage modes: 1. Memory mode, in which the persistent memory is used as a volatile storage medium and ordinary memory serves as its cache; this is managed by firmware on the memory controller and is transparent to the user and the operating system. 2. Storage mode, in which the persistent memory is used as a non-volatile storage medium exposed as a block device, equivalent to a faster SSD (solid state drive); accesses go through the page cache, which is transparent to the user but requires operating system (driver) support. 3. AD (App Direct) mode, in which the persistent memory is used as a volatile medium and as a block device, but is not accessed via the page cache.
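To make the direct-access behavior of AD mode more concrete, the following is a minimal sketch (not taken from the patent) of how a program might access persistent memory by memory-mapping a file on a DAX-mounted filesystem; the mount point /mnt/pmem0 and the file name are assumptions, and the sketch presumes a Linux system with such a filesystem available.

    // Minimal sketch: map a file on an assumed DAX-mounted persistent-memory
    // filesystem and write to it; with DAX there is no page-cache copy in between.
    package main

    import (
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        // Hypothetical mount point, e.g. created by: mount -o dax /dev/pmem0 /mnt/pmem0
        f, err := os.OpenFile("/mnt/pmem0/example.dat", os.O_RDWR|os.O_CREATE, 0o644)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        const size = 1 << 20 // 1 MiB region
        if err := f.Truncate(size); err != nil {
            panic(err)
        }

        // MAP_SHARED writes go to the backing medium rather than a private copy.
        data, err := syscall.Mmap(int(f.Fd()), 0, size,
            syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(data)

        copy(data, []byte("written through the mapping"))
        fmt.Println(string(data[:7]))
    }

Loads and stores on the mapped region reach the persistent medium directly, which is the property AD mode trades the page cache for.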
Container (Container): a lightweight virtualization technology; a container is a group of processes that are resource-constrained and isolated from one another. Container technology creates independent operating environments for different applications, providing resource isolation, configuration, and security guarantees; it can allocate resources to applications on demand while ensuring the isolation and availability of the applications. An application hosted in a container is referred to as a container instance, or a containerized application. To meet the requirements of large-scale applications, in practice many containers are deployed in a computer cluster for unified management and external service, which calls for a container orchestration management tool. A container orchestration tool consumes container services and orchestrates them, deciding how containers interact; it extends lifecycle management to complex, multi-container workloads deployed across large computer clusters and provides an abstraction layer for developers and infrastructure teams to handle large-scale containerized deployments. An example of a container orchestration tool is K8s (short for Kubernetes, a system that runs and coordinates containerized applications), an open-source system for automatically deploying, scaling, and managing containerized applications. Docker is an open-source application container engine and a tool for starting and stopping containers; it enables developers to package their applications and dependencies into a portable container and then release the container to any popular machine, and it can also realize virtualization. Containers fully use a sandbox mechanism and have no interfaces with each other.
Container group (Pod): is the basic unit of operation for a container orchestration tool (e.g., K8s), and is the smallest unit of deployment that can be created, debugged, and managed. All containers in the same container group share the same IP address, IPC, host name and other resources. The group of containers abstracts the network and storage from the underlying containers, making it easier to move containers within the cluster. Each container group can be packaged with one or more containers for carrying application programs. The containers of the container group are dispatched as a whole to a work node for execution.
Storage Volume (Volume): a logical abstraction of where data that requires persistent storage is placed in a container. It can be extended to different storage media (such as cloud disks and NAS) through plug-ins, and can be shared among multiple containers in the same container group.
Application environment of the present disclosure
The embodiments of the present disclosure provide a resource scheduling scheme. The scheme as a whole is fairly general and can be used for various hardware devices that deploy persistent memory and use a container orchestration tool for container scheduling management, such as data centers, AI (artificial intelligence) acceleration units, GPUs (graphics processing units), IoT (Internet of Things) devices capable of executing deep learning models, embedded devices, and the like. The resource scheduling scheme is independent of the hardware on which the computing device that executes the scheme is ultimately deployed. For exemplary purposes, however, the following description mainly takes a data center as the application scenario. Those skilled in the art will appreciate that the disclosed embodiments are also applicable to other application scenarios.
Data center
A data center is a globally collaborative network of devices used to transmit, accelerate, present, compute, and store data over an Internet network infrastructure. In future development, data centers will become an asset that enterprises compete for. With the popularization of data center applications, artificial intelligence and related technologies are increasingly applied in data centers. Neural networks, an important artificial intelligence technology, are widely applied to big data analysis and computation in data centers.
In a conventional large data center, the network structure is generally as shown in FIG. 1, i.e., a hierarchical internetworking model. This model contains the following parts:
the server 140: each server 140 is a processing and storage entity of a data center in which the processing and storage of large amounts of data is performed by the servers 140. In some embodiments, in the data center, a virtualization technology may be used to construct one or more virtual machines in the server 140, and the multiple virtual machines share physical resources of the same physical host, such as a processor, a memory, a disk, a network device, and the like, so that the physical resources of one physical host can be shared by multiple tenants using the virtual machines as a granularity, the multiple tenants can conveniently and flexibly use the physical resources on the premise of security isolation, and the utilization rate of the physical resources is greatly improved.
In some embodiments, the data center may serve various application scenarios, such as a Content Delivery Network (CDN), e-commerce, games, audio/video, the Internet of Things, logistics, industrial brains, and city brains, and provide computing services for end users in these scenarios. Specifically, for each application scenario, applications that provide the corresponding computing services may be deployed on servers of the data center. Considering that a large number of applications may be deployed in a data center, container technology may be adopted: one or more containers are constructed on the server 140, the applications are carried by the containers, deployment is then performed in container units, and by running these container instances deployed on the server 140, corresponding computing services can be provided for end users. In some embodiments, persistent storage (not shown) may be deployed on the server 140. The persistent storage on the server 140 may operate in AD (App Direct) mode, in which multiple servers 140 that use the persistent storage as memory may be interconnected to form a memory resource pool (also referred to as an in-memory database, for example, Tair). As an example, a container carrying an in-memory database application may be deployed on the server 140, and by running these container instances, operations such as viewing, deleting, modifying, and adding data in the persistent storage can be provided to end users. In some embodiments, the server 140 may be partitioned into one or more non-uniform memory access architecture regions, each of which includes some or all of the resources of the physical machine, such as processing cores, memory, persistent storage, and network bandwidth.
The access switch 130: a switch used to connect the servers 140 to the data center. One access switch 130 connects multiple servers 140. Because the access switches 130 are typically located at the top of the rack, they are also called Top of Rack (ToR) switches; they physically connect the servers.
Aggregation switch 120: each aggregation switch 120 connects multiple access switches 130 while providing other services such as firewalls, intrusion detection, network analysis, and the like.
The core switch 110: core switches 110 provide high-speed forwarding of packets to and from the data center and connectivity for aggregation switches 120. The entire data center network is divided into an L3 layer routing network and an L2 layer routing network, and the core switch 110 provides a flexible L3 layer routing network for the entire data center network.
Typically, the aggregation switch 120 is the demarcation point between the L2 and L3 routing networks, with L2 networks below the aggregation switch 120 and L3 networks above it. Each group of aggregation switches manages a Point of Delivery (POD), and each POD is a separate VLAN network. Servers migrating within a POD do not need to modify their IP addresses or default gateways, because one POD corresponds to one L2 broadcast domain.
A Spanning Tree Protocol (STP) is typically used between the aggregation switches 120 and the access switches 130. With STP, only one aggregation switch 120 is available to a VLAN network at a time, and the other aggregation switches 120 are used only in the event of a failure (dashed lines in FIG. 1). That is, at the aggregation switch 120 level there is no horizontal scaling, since only one switch is working even if multiple aggregation switches 120 are added.
Fig. 2 is a block diagram of a data center to which one embodiment of the present disclosure is applied. For convenience of description, only the control node 210 and the working node 220 are shown in FIG. 2. In implementation, the control node 210 or the working node 220 may be a server 140 or a virtual machine on a server 140. It should be understood that although only a limited number of working nodes 220 are shown in FIG. 2, the disclosure is not so limited.
In some embodiments, a plurality of container groups may run on the working node 220. A container orchestration tool (e.g., K8s) may run on the control node 210 to orchestrate and manage the container instances in the data center, where the orchestration management of container instances includes at least one of: creation of container instances, elastic scaling, rolling updates, reconstruction, migration, shutdown, and the like. In some embodiments, in addition to container orchestration management, the control node 210 may be responsible for other management operations in the data center, such as monitoring and management of operations, logs, and network states. Commands may be sent from the control node 210 to each working node 220. Briefly, the control node 210 is the manager and the working node 220 is the managed party. Background services running on the control node 210 may generally include an Application Programming Interface server (API server) 211 and a Scheduler 212. The API server 211 is the front-end interface of the container orchestration tool (e.g., K8s); various client tools, as well as other components of the container orchestration tool, may manage the resources of the data center cluster through the API server 211. The scheduler 212 may decide on which working node a container group is placed to run and which resources are allocated to the container group.
In some embodiments, as shown in FIG. 2, a client may send a resource scheduling request (e.g., a processor scheduling request or a persistent memory scheduling request) for a container group to the control node 210. In some embodiments, the API server 211 may receive a processor scheduling request, and based on it the scheduler 212 may schedule the container group to run on a working node 220 and allocate a processor on the target working node 220 for the container group to use. It should be noted that the scheduler 212 may also allocate processing cores on the working node 220 to the container group in units of processing cores. In some embodiments, the API server 211 may receive a persistent memory scheduling request, and based on it the scheduler 212 may allocate to the container group, according to the allocation information of the target persistent storage, a target persistent storage located in the same non-uniform memory access architecture region as the processor used by the container group. Thus, the processor used by the container group can access persistent storage in the same non-uniform memory access architecture region, which reduces the latency of the processor's accesses to the persistent storage and improves the data access efficiency of the processor. The specific process of scheduling the target persistent storage to the container group is described in detail below and is not repeated here.
Computing device
Since the server 140 is the real processing device of the data center, fig. 3 shows an internal structure diagram of the server 140 (the computing device 141 or the system on chip 142 or the working node 220) according to an embodiment of the disclosure. In some embodiments, using virtualization technology, one or more virtual machines may be built at computing device 141, and in this case, the virtual machines may be worker nodes 220 of a data center. Computing device 141 may include multiple processors 32. As an example, as shown in fig. 3, computing device 141 may include processor 0, processor 1, processor 2, and processor 3, although it should be understood that the number of processors 32 should not be limited thereto.
As shown in fig. 3, computing device 141 may also include memory 33. The memory 33 in the computing apparatus 141 may be a main memory (referred to as a main memory or an internal memory) for storing instruction information and/or data information represented by data signals, such as data provided by the processor 32 (e.g., operation results), and may also be used for implementing data exchange between the processor 32 and an external storage device 37 (or referred to as an auxiliary memory or an external memory). The memory 33 is, for example, a Dynamic Random Access Memory (DRAM).
In some cases, processor 32 may need to access memory 33 to retrieve data in memory 33 or to make modifications to data in memory 33. To alleviate the speed gap between the processor 32 and the memory 33 due to the slow access speed of the memory 33, the computing device 141 further includes a cache memory 38 coupled to the bus 31, wherein the cache memory 38 is used for caching some data in the memory 33, such as program data or message data, which may be called repeatedly. The cache Memory 38 is implemented by a storage device such as a Static Random Access Memory (SRAM). The Cache 38 may have a multi-level structure, such as a three-level Cache structure having a first-level Cache (L1 Cache), a second-level Cache (L2 Cache), and a third-level Cache (L3 Cache), or may have a Cache structure with more than three levels or other types of Cache structures. In some embodiments, a portion of cache memory 38 (e.g., a level one cache, or a level one cache and a level two cache) may be integrated within processor 32 or in the same system on a chip as processor 32.
The information exchange between the memory 33 and the cache 38 is typically organized in blocks. In some embodiments, the cache 38 and the memory 33 may be divided into data blocks according to the same spatial size, and the data blocks may be the minimum unit of data exchange (including one or more data of a preset length) between the cache 38 and the memory 33. For the sake of brevity and clarity, each data block in the cache memory 38 will be referred to below simply as a cache block (which may be referred to as a cacheline or cache line), and different cache blocks have different cache block addresses; each data block in the memory 33 is referred to as a memory block, and different memory blocks have different memory block addresses. The cache block address comprises, for example, a physical address tag for locating the data block.
Due to space and resource constraints, the cache memory 38 cannot cache the entire contents of the memory 33, i.e. the storage capacity of the cache memory 38 is generally smaller than that of the memory 33, and the cache block addresses provided by the cache memory 38 cannot correspond to the entire memory block addresses provided by the memory 33. When the processor 32 needs to access the memory, firstly, the processor 32 accesses the cache memory 38 through the bus 31 to determine whether the content to be accessed is stored in the cache memory 38, if so, the cache memory 38 hits, and at this time, the processor 32 directly calls the content to be accessed from the cache memory 38; if the content that the processor 32 needs to access is not in the cache 38, the processor 32 needs to access the memory 33 via the bus 31 to look up the corresponding information in the memory 33. Because the access rate of the cache memory 38 is very fast, the efficiency of the processor 32 can be significantly improved when the cache memory 38 hits, thereby also improving the performance and efficiency of the overall computing device 141.
As shown, the processor 32, cache 38, and memory 33 are packaged in a system on chip (SoC) 301. The designer may configure the SoC architecture so that communications between various elements in computing device 141 are secure.
In some embodiments, as shown in FIG. 3, the computing device 141 may also include persistent memory 35. In some embodiments, the persistent memory 35 may operate in AD (App Direct) mode, in which multiple computing devices 141 that use the persistent memory as memory may be interconnected to form a memory resource pool (i.e., an in-memory database, such as Tair). The persistent memory 35 is, for example, AEP (Apache Pass), Optane, or the like. In some embodiments, the computing device 141 may be partitioned into one or more non-uniform memory access architecture regions 36 according to a non-uniform memory access architecture (i.e., a multi-processor computer architecture), where each processor 32 is equipped with a memory 33 and a persistent memory 35. In addition to accessing its own memory 33 and persistent memory 35, each processor 32 may also access the memory 33 and persistent memory 35 of the other processors. In some embodiments, when the computing device 141 starts up, the memory 33 and persistent memory 35 closest to a processor 32 are set as its local memory (i.e., the memory 33 and persistent memory 35 provided for the processor 32 in its non-uniform memory access architecture region 36), and the memory 33 and persistent memory 35 farther from the processor 32 are set as remote memory, according to their distance from the processor 32. Since local memory is closer to the processor 32 and has a faster access speed, it can be set as the memory to be accessed preferentially, which improves the data access efficiency of the processor. As an example, FIG. 3 shows non-uniform memory access architecture region 0, region 1, region 2, and region 3. Taking region 2 as an example, processor 2 is equipped with memory 2 and persistent memory 2; in addition to accessing its own memory 2 and persistent memory 2, processor 2 can also access memory 0, memory 1, and memory 3, as well as persistent memory 0, persistent memory 1, and persistent memory 3. Processor 2 preferentially accesses memory 2 and persistent memory 2.
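As a concrete illustration of the topology described above, the following Go sketch lists each NUMA node and the CPUs local to it; it assumes the standard Linux sysfs layout and is not part of the patent, but the same kind of information is what lets software pair a processor with its local memory and persistent memory.

    // Sketch: enumerate NUMA nodes and their local CPUs from Linux sysfs.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strings"
    )

    func main() {
        nodes, err := filepath.Glob("/sys/devices/system/node/node[0-9]*")
        if err != nil {
            panic(err)
        }
        for _, node := range nodes {
            cpus, err := os.ReadFile(filepath.Join(node, "cpulist"))
            if err != nil {
                continue // node directory without a cpulist entry
            }
            fmt.Printf("%s: CPUs %s\n", filepath.Base(node), strings.TrimSpace(string(cpus)))
        }
    }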
In this example, the computing device 141 may also include various software. In some embodiments, as shown in FIG. 3, an operating system 306, container support 307, and container groups 303 are provided above the underlying hardware (i.e., the system on a chip 301). The operating system 306 is, for example, a UNIX or Linux operating system suitable for servers. The container support 307 comprises the various underlying implementations required to support the containers running on top of it. For example, for Docker, the container support 307 needs to implement two technologies: cgroup (control groups), which implements resource quotas, and namespace, which implements resource isolation. Docker allows developers to package their applications and dependent operating environments into a portable container and then release them to the computing device 141. As shown in FIG. 3, a plurality of container groups 303 may run on the container support 307; each container of a container group 303 runs an operating environment and an application, and container technology isolates the applications from one another so that they do not affect each other. The applications may include, without limitation, programs for controlling or responding to external devices (e.g., biometric sensors, printers, microphones, speakers, flow valves, or other I/O components, sensors, actuators, or devices), programs for various I/O tasks, security programs, attestation programs, various computing modules, communication programs, communication support protocols, or other programs, or combinations thereof. As one example, the application may be one that controls or responds to an in-memory database (i.e., the persistent memory 35).
In some embodiments, as shown in fig. 3, a persistent memory Device plug-in (Device plug-in) 302 and a service agent (Kubelet)304 of a container orchestration tool (e.g., K8s) on the computing Device 141 are also provided on top of the underlying hardware (i.e., the system on a chip 301). In implementation, the persistent memory device plug-in 302 and the service agent (Kubelet)304 may be program modules of software, or may be hardware, for example, implemented based on FPGA or CPLD. In some embodiments, the persistent storage device plug-in 302 is configured to dispatch to the container group a target persistent storage located in the same non-uniform memory access architecture region as the processor used by the container group, so that the latency of the processor accessing the persistent storage of the container group can be reduced, thereby improving the data access efficiency of the processor. Since the specific process of scheduling the target persistent storage to the container group will be described in detail below, it is not described here in detail. In some embodiments, service agent 304 may receive and execute instructions from control node 210 to manage the group of containers and the containers in the group of containers. Service agent 304 may register information about the computing device 141 on application programming interface server 211 of control node 210, periodically report resource usage of the computing device 141 to control node 210, and monitor resources of computing device 141 and containers. In some embodiments, as shown in FIG. 3, a Garbage Collection (GC) module 305 may also be provided on top of the underlying hardware (i.e., system on a chip 301). In terms of implementation, the Garbage Collection (GC) module 305 may be a program module of software, or may be implemented in hardware, for example, based on FPGA or CPLD. In some embodiments, the Garbage Collection (GC) module 305 may establish a garbage collection process for each container and clear the configuration information for the group of containers if the group of containers is destroyed.
Further, the computing device 141 may also include hardware such as a storage device 37, a display device (not shown), an audio device (not shown), and input/output devices (not shown). The storage device 37 is a device for information access, such as a solid state drive (SSD), a hard disk drive (HDD), an optical disc, or a Universal Serial Bus (USB) flash disk, coupled to the bus 31 through a corresponding interface. The input/output devices may be, for example, text, audio, and video input/output devices. The display device is coupled to the bus 31, for example via a corresponding graphics card, and displays according to display signals provided by the bus 31. The computing device 141 also typically includes a communication device (not shown) and may therefore communicate with a network or other devices in a variety of ways. The communication device may include, for example, one or more communication modules; by way of example, it may include a wireless communication module adapted to a particular wireless communication protocol. For example, the communication device may include a WLAN module for Wi-Fi communications in compliance with the 802.11 standard established by the Institute of Electrical and Electronics Engineers (IEEE); it may also include a WWAN module for wireless wide-area communication conforming to a cellular or other wireless wide-area protocol; it may also include communication modules using other protocols, such as a Bluetooth module, or other custom communication modules; or it may be a port for serial transmission of data.
Of course, the structure of different computer systems may vary depending on the motherboard, operating system, and instruction set architecture.
Resource scheduling method according to the embodiment of the disclosure
According to one embodiment of the present disclosure, a resource scheduling method is provided. The method may be performed by the persistent memory device plug-in 302. In the case where the computing apparatus 141 is a single computer, the persistent memory device plug-in 302 is a part of that computer, and the resource scheduling method is executed by that part. In the case where the computing apparatus 141 is a set of computers, the persistent memory device plug-in 302 is one of those computers, and the resource scheduling method is executed by that computer. In the case where the computing apparatus 141 takes the form of a cloud, the persistent memory device plug-in 302 is a series of computers in the cloud, or parts thereof, and the resource scheduling method is executed by those computers or parts.
As shown in fig. 4, a resource scheduling method according to an embodiment of the present disclosure includes: step S410, obtaining a resource scheduling request aiming at a container group, wherein the resource scheduling request comprises the size of a target storage space of a persistent memory requested by the container group; step S420, acquiring device information of at least one persistent storage on a working node where the container group is deployed, wherein the persistent storage comprises a plurality of persistent storage devices; step S430, determining allocation information for allocating a target persistent storage for the container group in the working node based on a preset allocation condition and the size of the target storage space, wherein the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region; step S440, storing the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information, and schedules the target persistent storage to the container group based on the allocation information.
The above steps are described in detail below, respectively.
In step S410, a resource scheduling request for a container group is obtained, the resource scheduling request including a target storage space size of a persistent memory requested by the container group.
In some embodiments, a plurality of container groups run on a working node (also referred to as a computing device); each container of a container group runs an operating environment and an application, and container technology keeps the applications isolated and free from interfering with one another. As one example, the application running in a container may be one that controls or responds to an in-memory database (i.e., persistent memory). In some embodiments, by running these container instances deployed on the working node, operations such as viewing, deleting, modifying, and adding data in the persistent memory can be provided to the end user. To this end, the client may send a resource scheduling request, e.g., a persistent memory scheduling request, for a container group to the control node. The application program interface server of the control node may send the resource scheduling request to the service agent of the working node, and the service agent may forward it to the persistent memory device plug-in of the working node, so that the plug-in obtains the target storage space size of the persistent memory requested by the container group.
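As an illustration only, such a request could be expressed in Kubernetes terms as an extended-resource limit on the container group; the resource name example.com/pmem-device and the assumption of one device per 1G of requested space are hypothetical choices for this sketch, not something specified by the patent.

    // Sketch: a Pod that asks for 20 persistent-memory devices (about a 20G target
    // storage space if each device is 1G), using assumed names throughout.
    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        pod := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "memdb-worker"},
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{
                    Name:  "memdb",
                    Image: "example/in-memory-db:latest",
                    Resources: corev1.ResourceRequirements{
                        Limits: corev1.ResourceList{
                            // 20 devices of 1G each as an extended resource.
                            corev1.ResourceName("example.com/pmem-device"): resource.MustParse("20"),
                        },
                    },
                }},
            },
        }
        fmt.Println(pod.Spec.Containers[0].Resources.Limits)
    }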
In step S420, device information of at least one persistent storage on a working node deploying the container group is acquired, the persistent storage including a plurality of persistent storage devices.
In some embodiments, when the persistent storage device plug-in on the working node starts, it may determine whether the working node is equipped with persistent storage. If not, it may report to the control node that the working node does not support persistent storage (for example, through the service agent to the application program interface server of the control node). If so, it may read the number of persistent storages on the working node, the storage space size of each persistent storage, and the non-uniform memory access architecture region to which each belongs. The persistent storage device plug-in may then logically divide the at least one persistent storage on the working node into a plurality of persistent storage devices and assign each persistent storage device a device identification (ID); each persistent storage device has a certain storage space size. In some embodiments, the at least one persistent storage is logically divided into a plurality of persistent storage devices of equal storage space size. The device information of a persistent storage device may include a device name (that is, an identifier of the persistent storage to which the persistent storage device corresponds), a device identification, and an identifier of the non-uniform memory access architecture region to which the device belongs. In some embodiments, the persistent storage device plug-in may also store the one-to-one correspondence between device identifications of persistent storage devices and persistent storages. The non-uniform memory access architecture region identifier of a persistent storage device is the identifier of the region to which its corresponding persistent storage belongs. For example, taking persistent storage 0 and persistent storage 1 as an example, persistent storage 0 is located in non-uniform memory access architecture region 0, persistent storage 1 is located in non-uniform memory access architecture region 1, and the storage space size of each is 100G; persistent storage 0 and persistent storage 1 may then be logically divided into 200 persistent storage devices, each with a storage space size of 1G and a unique device identification (ID), 100 of which belong to persistent storage 0 and the other 100 to persistent storage 1. The device information of persistent storage 0 and persistent storage 1 may be represented as:
(The original publication presents this device-information listing as a figure.)
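For readers without access to that figure, the sketch below is a purely illustrative reconstruction of what such device information could look like; the struct, the device naming scheme pmem0-N/pmem1-N, and the field names are assumptions, not the patent's own representation.

    // Illustrative device-information layout for the 200 devices in the example above.
    package main

    import "fmt"

    // PMEMDevice is an assumed representation of one logical persistent storage device.
    type PMEMDevice struct {
        Name     string // persistent storage the device is carved from, e.g. "pmem0"
        DeviceID string // unique device identification
        NUMAID   int    // non-uniform memory access architecture region identifier
        SizeGiB  int
    }

    func exampleDevices() []PMEMDevice {
        var devs []PMEMDevice
        for i := 0; i < 100; i++ { // 100 devices of 1G from persistent storage 0 (region 0)
            devs = append(devs, PMEMDevice{Name: "pmem0", DeviceID: fmt.Sprintf("pmem0-%d", i), NUMAID: 0, SizeGiB: 1})
        }
        for i := 0; i < 100; i++ { // 100 devices of 1G from persistent storage 1 (region 1)
            devs = append(devs, PMEMDevice{Name: "pmem1", DeviceID: fmt.Sprintf("pmem1-%d", i), NUMAID: 1, SizeGiB: 1})
        }
        return devs
    }

    func main() {
        devs := exampleDevices()
        fmt.Printf("%d devices, first: %+v\n", len(devs), devs[0])
    }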
in some embodiments, after obtaining the device information of the at least one persistent storage on the working node that deploys the container group, the persistent storage device plug-in on the working node may also register the device information of the at least one persistent storage to the control node (e.g., by the service agent registering the device information of the at least one persistent storage to an application program interface server of the control node).
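One assumed way to wire up such registration is through the Kubernetes device plugin API; the sketch below uses types from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1, but the plug-in structure itself is hypothetical and is not code from the patent.

    // Sketch: advertise persistent storage devices, with their NUMA region, to the kubelet.
    package pmemplugin

    import (
        pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
    )

    type pmemPlugin struct {
        devices []*pluginapi.Device
    }

    // ListAndWatch streams the current device list to the kubelet, which in turn
    // makes the devices visible to the control node's API server.
    func (p *pmemPlugin) ListAndWatch(_ *pluginapi.Empty, stream pluginapi.DevicePlugin_ListAndWatchServer) error {
        return stream.Send(&pluginapi.ListAndWatchResponse{Devices: p.devices})
    }

    // newDevice builds one advertised device; reporting its NUMA node lets the
    // kubelet's Topology Manager align it with the processor used by the container group.
    func newDevice(id string, numaID int64) *pluginapi.Device {
        return &pluginapi.Device{
            ID:       id,
            Health:   pluginapi.Healthy,
            Topology: &pluginapi.TopologyInfo{Nodes: []*pluginapi.NUMANode{{ID: numaID}}},
        }
    }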
In step S430, based on a preset allocation condition and the size of the target storage space, determining allocation information for allocating a target persistent storage to the container group in the work node, where the target persistent storage and the processor used by the container group are located in the same non-uniform memory access architecture region.
In some embodiments, the preset allocation condition may include that the free storage space of the target persistent storage is not smaller than the target storage space size, and that the target persistent storage has the same non-uniform memory access architecture region identifier as the processor used by the container group. Based on the preset allocation condition and the target storage space size, the persistent storage device plug-in may determine the target persistent storage and its allocation information. The allocation information may include information related to allocating the target persistent storage to the container group, such as the storage directory of the target persistent storage and the storage directory of the container group. In some embodiments, the persistent storage device plug-in may determine the target number of persistent storage devices requested based on the target storage space size and the storage space size of a persistent storage device. For example, if the target storage space size is 20G and the storage space size of each persistent storage device is 1G, the target number of requested persistent storage devices is 20. The persistent storage device plug-in can then acquire a target number of first persistent storage devices having the same non-uniform memory access architecture region identifier as the processor used by the container group. In some embodiments, the container orchestration tool (e.g., K8s) on the computing device 141 has a topology management function; when this function is turned on, the persistent storage device plug-in may obtain a target number of first persistent storage devices that have the same non-uniform memory access architecture region identifier as the processor used by the container group and that are located in the same persistent storage. For example, if the container group uses processor 2 and the non-uniform memory access architecture region of processor 2 is identified as NUMA2, then the target number of first persistent storage devices are also identified as NUMA2 and are located in persistent storage 2. In some embodiments, the persistent storage device plug-in may then determine, based on the one-to-one correspondence between device identifiers of persistent storage devices and persistent storages, whether the target number of first persistent storage devices are located in the same persistent storage; if they are, that persistent storage is taken as the target persistent storage, and if they are not, the topology management function of the container orchestration tool (e.g., K8s) on the computing device 141 may be turned on to obtain the allocation information of the target persistent storage again.
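The allocation step can be summarized by the following sketch, written under stated assumptions (the field names and the 1G device size are illustrative; the patent does not publish source code): from the devices that share the processor's NUMA region, pick one persistent storage with enough free devices to cover the requested target storage space.

    // Sketch of the allocation step: ceil(target / device size) devices,
    // same NUMA region as the processor, all from one persistent storage.
    package pmemalloc

    import "fmt"

    type Device struct {
        ID       string // device identification
        PMEMName string // persistent storage the device belongs to
        NUMAID   int    // non-uniform memory access architecture region identifier
        Free     bool
    }

    type Allocation struct {
        TargetPMEM string   // target persistent storage
        DeviceIDs  []string // devices reserved for the container group
    }

    func Allocate(devices []Device, targetGiB, cpuNUMAID, deviceGiB int) (*Allocation, error) {
        // Target number of devices = ceil(target storage space / device storage space).
        need := (targetGiB + deviceGiB - 1) / deviceGiB

        // Preset allocation condition: same NUMA region identifier as the processor
        // and enough free space; group candidates by their parent persistent storage.
        byPMEM := map[string][]string{}
        for _, d := range devices {
            if d.Free && d.NUMAID == cpuNUMAID {
                byPMEM[d.PMEMName] = append(byPMEM[d.PMEMName], d.ID)
            }
        }
        for pmem, ids := range byPMEM {
            if len(ids) >= need {
                return &Allocation{TargetPMEM: pmem, DeviceIDs: ids[:need]}, nil
            }
        }
        return nil, fmt.Errorf("no persistent storage in NUMA region %d has %d free devices", cpuNUMAID, need)
    }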
In step S440, the allocation information is stored in the configuration information of the container group, so that the working node obtains the allocation information from the configuration information, and schedules the target persistent storage to the container group based on the allocation information.
In some embodiments, a file system may be provided on the working node where the container group is deployed, and allocation information of the target persistent storage and configuration information of the container group may be stored in the file system. The storage file of the allocation information of the target persistent storage may be mounted in the storage file of the configuration information of the container group. In this way, according to the storage path of the storage file of the allocation information of the target persistent storage, the working node may acquire the allocation information of the target persistent storage from the configuration information of the container group. In some embodiments, the scheduler of the control node may schedule the target persistent storage to the container group according to the allocation information of the target persistent storage, that is, the storage directory of the target persistent storage may be mounted under the storage directory of the container group. Based on this, the persistent storage device plug-in dispatches the target persistent storage located in the same non-uniform memory access architecture region as the processor used by the container group to the container group, so that the processor can access the target persistent storage belonging to the same non-uniform memory access architecture region, and the time delay of the processor for accessing the persistent storage of the container group is reduced, thereby improving the data access efficiency of the processor.
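As an illustrative sketch of storing the allocation information into the configuration information of the container group, the code below writes a small JSON record into a hypothetical per-container-group configuration directory; the field names and paths are assumptions, and the actual mounting of the target persistent storage's directory under the container group's directory is left to the scheduler and container runtime as described above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Allocation is an illustrative record of the allocation information: which
// persistent storage directory should be mounted into which container-group
// directory, and which devices were picked.
type Allocation struct {
	TargetStorageDir  string   `json:"targetStorageDir"`  // e.g. /mnt/pmem2
	ContainerGroupDir string   `json:"containerGroupDir"` // e.g. /var/lib/pods/<uid>/volumes/pmem
	DeviceIDs         []string `json:"deviceIDs"`
}

// storeAllocation writes the allocation information under the container
// group's configuration directory, so the working node can read it back by
// path and mount the target storage directory accordingly.
func storeAllocation(configDir string, a Allocation) (string, error) {
	if err := os.MkdirAll(configDir, 0o755); err != nil {
		return "", err
	}
	data, err := json.MarshalIndent(a, "", "  ")
	if err != nil {
		return "", err
	}
	path := filepath.Join(configDir, "pmem-allocation.json")
	return path, os.WriteFile(path, data, 0o644)
}

func main() {
	alloc := Allocation{
		TargetStorageDir:  "/mnt/pmem2",
		ContainerGroupDir: "/var/lib/pods/demo/volumes/pmem",
		DeviceIDs:         []string{"d1", "d2"},
	}
	path, err := storeAllocation("/tmp/demo-pod-config", alloc) // assumed config dir
	if err != nil {
		fmt.Fprintln(os.Stderr, "storing allocation information failed:", err)
		os.Exit(1)
	}
	fmt.Println("allocation information stored at", path)
}
```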
FIG. 5 is an interaction diagram of a working node and a control node according to one embodiment of the present disclosure. In some embodiments, as shown in FIG. 5, the interaction process between the working node 220 and the control node 210 includes the following steps:
In step S501, the persistent storage device plug-in 302 in the working node 220 is used to obtain the device information of at least one persistent storage on the working node where the container group is deployed, the persistent storage including a plurality of persistent storage devices.
In step S502, the persistent storage device plug-in 302 in the working node 220 is used to register the device information of the at least one persistent storage on the working node where the container group is deployed with the service agent 304, and the service agent 304 then registers the device information of the at least one persistent storage with the application program interface server 211 of the control node 210.
In step S503, the persistent storage device plug-in 302 in the working node 220 is used to listen for resource scheduling requests for the container group.
In step S504, the client sends a resource scheduling request for the container group to the application program interface server 211 of the control node 210, the resource scheduling request including the target storage space size of the requested persistent storage (a hypothetical Kubernetes-style form of such a request is sketched after step S507 below). The application program interface server 211 then sends the resource scheduling request for the container group to the service agent 304, which in turn forwards it to the persistent storage device plug-in 302.
In step S505, the persistent storage device plug-in 302 is used to determine, based on the preset allocation condition and the target storage space size, the allocation information of the target persistent storage allocated for the container group in the working node, where the target persistent storage and the processor used by the container group are located in the same non-uniform memory access architecture region.
In step S506, the persistent storage device plug-in 302 is used to store the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information.
In step S507, the scheduler 212 of the control node 210 schedules the target persistent memory to the container group based on the allocation information.
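A hypothetical Kubernetes-style form of the resource scheduling request from step S504 is sketched below: the container group (pod) asks for an extended resource advertised by the persistent storage device plug-in, and the requested quantity encodes the target storage space size. The resource name example.com/pmem-1g, the image name, and the one-device-per-1 GB convention are assumptions, not taken from the disclosure; the sketch uses the k8s.io/api and k8s.io/apimachinery Go modules.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The container group requests 20 units of an extended resource exposed
	// by the persistent storage device plug-in; at an assumed 1 GB per device
	// this corresponds to a 20 GB target storage space size.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "pmem-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.example.com/app:latest", // illustrative image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						corev1.ResourceName("example.com/pmem-1g"): resource.MustParse("20"),
					},
				},
			}},
		},
	}

	q := pod.Spec.Containers[0].Resources.Limits[corev1.ResourceName("example.com/pmem-1g")]
	fmt.Printf("container group %q requests %s persistent storage devices\n", pod.Name, q.String())
}
```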
Since the step of dispatching the target persistent storage belonging to the same non-uniform memory access architecture region as the processor of the container group to the container group has been described in detail in the above device embodiments and method embodiments, it is not repeated here.
Fig. 6 is a block diagram of a resource scheduling apparatus according to one embodiment of the present disclosure. As shown in fig. 6, the resource scheduling apparatus includes: a scheduling request acquisition unit 610, a device information acquisition unit 620, an allocation information determination unit 630, and an allocation information storage unit 640.
A scheduling request obtaining unit 610, configured to obtain a resource scheduling request for a container group, where the resource scheduling request includes a target storage space size of a persistent storage requested by the container group; a device information obtaining unit 620, configured to obtain device information of at least one persistent storage on a working node where the container group is deployed, where the persistent storage includes a plurality of persistent storage devices; an allocation information determining unit 630, configured to determine, based on a preset allocation condition and the size of the target storage space, allocation information for allocating a target persistent storage to the container group in the working node, where the target persistent storage and the processor used by the container group are located in the same non-uniform memory access architecture region; an allocation information storage unit 640, configured to store the allocation information into configuration information of the container group, so that the working node obtains the allocation information from the configuration information, and schedules the target persistent storage to the container group based on the allocation information.
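Viewed as an interface, the four units of the resource scheduling apparatus map onto four operations; the Go interface below merely restates that decomposition, and the method names and helper types are illustrative assumptions rather than part of the disclosure.

```go
package scheduler

// ResourceScheduler restates the four units of FIG. 6 as one interface:
// obtain the resource scheduling request, obtain device information,
// determine the allocation, and store it into the container group's
// configuration information.
type ResourceScheduler interface {
	ObtainSchedulingRequest(containerGroup string) (targetStorageGB int, err error)
	ObtainDeviceInfo(workingNode string) ([]DeviceInfo, error)
	DetermineAllocation(targetStorageGB int, devices []DeviceInfo) (Allocation, error)
	StoreAllocation(containerGroup string, a Allocation) error
}

// DeviceInfo and Allocation reuse the illustrative shapes from the earlier sketches.
type DeviceInfo struct {
	Name, ID, NUMANode string
}

type Allocation struct {
	TargetStorageDir  string
	ContainerGroupDir string
	DeviceIDs         []string
}
```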
Since the step of dispatching the target persistent storage belonging to the same non-uniform memory access architecture region as the processor of the container group to the container group has been described in detail in the above device embodiments and method embodiments, it is not repeated here.
Commercial value of the disclosed embodiments
In the computing device provided by the embodiments of the present disclosure, the target persistent storage that belongs to the same non-uniform memory access architecture region as the processor used by the container group is allocated to the container group, so that the processor accesses persistent storage within its own non-uniform memory access architecture region. This reduces the time delay of the processor accessing the persistent storage and improves the data access efficiency of the processor. In this scenario, the improved data access efficiency lowers the data computation cost of the computing device and thereby the operation cost of the whole data center, so the embodiments of the present disclosure have good commercial and economic value.
As will be appreciated by one skilled in the art, the present disclosure may be embodied as a system, a method, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and micro-code), or an embodiment combining software and hardware. Furthermore, in some embodiments, the present disclosure may also take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium is, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with a processing unit, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java and C++, and may also include conventional procedural programming languages such as C. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (11)

1. A resource scheduling method comprises the following steps:
obtaining a resource scheduling request for a container group, wherein the resource scheduling request comprises a target storage space size of a persistent memory requested by the container group;
obtaining device information of at least one persistent storage on a working node deploying the container group, the persistent storage comprising a plurality of persistent storage devices;
determining allocation information for allocating a target persistent storage for the container group in the working node based on a preset allocation condition and the size of the target storage space, wherein the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region;
and storing the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
2. The resource scheduling method according to claim 1, wherein the device information includes a device name and a device identifier of the persistent storage device and an identification of the non-uniform memory access architecture region to which the persistent storage device belongs, and before the device information of at least one persistent storage on the working node that deploys the container group is acquired, the resource scheduling method further includes:
dividing the at least one persistent storage into a plurality of persistent storage devices respectively, wherein each persistent storage device has a certain storage space size;
after the obtaining the device information of the at least one persistent storage on the working node deploying the container group, the resource scheduling method further includes:
registering device information of the at least one persistent memory with a control node.
3. The resource scheduling method according to claim 1, wherein the preset allocation condition comprises:
the size of the free storage space of the target persistent storage is not smaller than the size of the target storage space;
the target persistent storage has the same non-uniform memory access architecture region identification as the processor used by the set of containers.
4. The resource scheduling method according to claim 3, wherein the storage spaces of the plurality of persistent storage devices are equal in size, and the determining, based on a preset allocation condition and the target storage space size, allocation information for allocating a target persistent storage for the container group in the work node comprises:
determining the target number of the requested persistent storage devices according to the size of the target storage space and the size of the storage space of the persistent storage devices;
acquiring a target number of first persistent storage devices having the same non-uniform memory access architecture region identification as the processor used by the container group;
and based on a one-to-one correspondence between device identifiers of the persistent storage devices and the persistent storages, when the target number of first persistent storage devices are located in the same persistent storage, using the persistent storage as the target persistent storage.
5. The resource scheduling method according to claim 1, wherein after storing the allocation information into the configuration information of the container group, the resource scheduling method further comprises:
emptying the configuration information of the container group in case the container group is destroyed.
6. A resource scheduling apparatus, comprising:
a scheduling request obtaining unit, configured to obtain a resource scheduling request for a container group, where the resource scheduling request includes a target storage space size of a persistent storage requested by the container group;
a device information obtaining unit, configured to obtain device information of at least one persistent storage on a work node where the container group is deployed, where the persistent storage includes a plurality of persistent storage devices;
an allocation information determining unit, configured to determine, based on a preset allocation condition and the size of the target storage space, allocation information for allocating a target persistent storage to the container group in the work node, where the target persistent storage is located in the same non-uniform memory access architecture region as a processor used by the container group;
and the allocation information storage unit is used for storing the allocation information into the configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
7. A persistent storage device plug-in, comprising:
a monitoring unit, configured to obtain a resource scheduling request for a container group, where the resource scheduling request includes a target storage space size of a persistent storage requested by the container group;
the device detection unit is used for acquiring device information of at least one persistent storage on a working node for deploying the container group, and the persistent storage comprises a plurality of persistent storage devices;
the device allocation unit is used to determine, based on a preset allocation condition and the size of the target storage space, allocation information for allocating a target persistent storage for the container group in the working node, wherein the target persistent storage and a processor used by the container group are located in the same non-uniform memory access architecture region;
the device allocation unit is further configured to store the allocation information into configuration information of the container group, so that the working node acquires the allocation information from the configuration information and schedules the target persistent storage to the container group based on the allocation information.
8. A computing device, comprising:
a processor;
a persistent memory;
the persistent storage device plug-in of claim 7, the persistent storage device plug-in to schedule a target persistent storage to a container group, the target persistent storage being in the same non-uniform memory access architecture region as a processor used by the container group.
9. A system on a chip, comprising:
a processor;
a persistent memory;
the persistent storage device plug-in of claim 7, the persistent storage device plug-in to schedule a target persistent storage to a container group, the target persistent storage being in the same non-uniform memory access architecture region as a processor used by the container group.
10. A computing device, comprising:
a memory for storing computer executable code;
a processor for executing the computer executable code such that the processor performs the resource scheduling method of any of the preceding claims 1-5.
11. A computer storage medium having computer executable code stored thereon which, when executed by a processor, implements the resource scheduling method of any of claims 1-5 above.
CN202210114098.8A 2022-01-30 2022-01-30 Resource scheduling method, related device and medium Pending CN114510321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210114098.8A CN114510321A (en) 2022-01-30 2022-01-30 Resource scheduling method, related device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210114098.8A CN114510321A (en) 2022-01-30 2022-01-30 Resource scheduling method, related device and medium

Publications (1)

Publication Number Publication Date
CN114510321A true CN114510321A (en) 2022-05-17

Family

ID=81551228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210114098.8A Pending CN114510321A (en) 2022-01-30 2022-01-30 Resource scheduling method, related device and medium

Country Status (1)

Country Link
CN (1) CN114510321A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114911631A (en) * 2022-07-18 2022-08-16 深圳市泛联信息科技有限公司 Global resource management method and system based on persistent memory technology
CN114911631B (en) * 2022-07-18 2022-10-21 深圳市泛联信息科技有限公司 Global resource management method and system based on persistent memory technology
WO2024082584A1 (en) * 2022-10-19 2024-04-25 京东科技信息技术有限公司 Resource allocation method, container management assembly and resource allocation system
CN116302732A (en) * 2022-12-30 2023-06-23 国科础石(重庆)软件有限公司 Multiprocessor performance index acquisition method and device, electronic equipment and vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination