CN117593170A - Video memory allocation method, device, equipment and readable storage medium - Google Patents

Video memory allocation method, device, equipment and readable storage medium

Info

Publication number
CN117593170A
CN117593170A
Authority
CN
China
Prior art keywords
container
video memory
memory
gpu
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311601081.6A
Other languages
Chinese (zh)
Inventor
陈培 (Chen Pei)
刘慧兴 (Liu Huixing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN202311601081.6A priority Critical patent/CN117593170A/en
Publication of CN117593170A publication Critical patent/CN117593170A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video memory allocation method, apparatus, device and readable storage medium in the technical field of computer applications. The method comprises the following steps: acquiring a task to which video memory is to be allocated; acquiring the container process identifier of each process in a container, and querying the host process identifier of each process in the container according to a process identifier relation table; obtaining the video memory consumption of each process in the container based on its host process identifier; summing the video memory consumption of all processes in the container to obtain the container's total video memory consumption; and determining the remaining video memory usable by the container from the total video memory consumption and the container's video memory quota, then allocating video memory to the task based on the remaining video memory. The technical effects of the invention are: accurate container video memory consumption can be obtained without granting host permissions to users, and multi-card multi-task scenarios can be managed effectively through the process identifier relation table, so that video memory allocation is carried out effectively and in an orderly manner.

Description

Video memory allocation method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of computer applications, and in particular, to a method, an apparatus, a device, and a readable storage medium for allocating video memory.
Background
In a CPU+GPU computing architecture, the GPU (graphics processing unit) mainly carries computationally intensive tasks to speed up computation. Multiple tasks running on the same GPU may contend for GPU resources; if the remaining resources are insufficient to run a task, that task fails and exits. In other words, without effective and orderly management of GPU resources, tasks sharing a GPU interfere with each other chaotically, which increases the task failure rate.
At present, k8s (Kubernetes, an application for managing cluster resources) or a container management system is generally used to manage GPU resources, and the problem is addressed by pre-allocating video memory. The specific method is to hijack the video memory allocation API of the CUDA (Compute Unified Device Architecture, NVIDIA's framework for running parallel computation on general-purpose computing devices, i.e. GPUs) driver, and to judge, at allocation time, whether the free video memory in the container is large enough for the requested allocation before allocating the video memory.
However, when the video memory consumption of GPU processes in a container is counted, implicitly allocated video memory is often missed, so the processes in the container actually use more video memory than their quota, which affects task execution. In the multi-card multi-task case, managing the video memory changes of each GPU card in the container is even more complicated; in particular, when a user sets the CUDA_VISIBLE_DEVICES parameter while GPU tasks are running, the tasks' GPU indices change and the video memory control logic becomes confused. In addition, the nvidia-smi command shipped with the NVIDIA driver cannot display GPU process information inside a container; to check the status of their own tasks, users must log in to the host to run the command and then distinguish their own container's tasks from all tasks.
In summary, how to effectively solve the problems of video memory allocation and the like is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The object of the present invention is to provide a video memory allocation method, apparatus, device and readable storage medium, which can obtain accurate container video memory consumption, avoid granting host permissions to users, and effectively manage multi-card multi-task scenarios through a process identifier relation table, so that video memory allocation can be carried out effectively and in an orderly manner.
In order to solve the technical problems, the invention provides the following technical scheme:
a video memory allocation method comprises the following steps:
acquiring a task of a video memory to be allocated;
acquiring a container process identifier of each process in a container, and inquiring a host process identifier of each process in the container according to a process identifier relation table;
based on the host process identifier, obtaining the video memory consumption of each process in the container;
superposing the video memory consumption of all processes in the container to obtain the total video memory consumption of the container;
and determining the residual video memory of the container by using the total video memory consumption and the video memory quota of the container, and performing video memory allocation on the task based on the residual video memory.
Preferably, the process identification relation table is obtained, including:
acquiring a container process identifier of each process in the container;
calling an opening function to open the mounted custom character equipment;
calling the calling interface function of the custom character equipment, and inputting parameters to obtain a return result; the parameter is a predefined instruction and a container process identifier of each process in the container, and the returned result is a host process identifier of each process;
and writing the container process identifier and the host process identifier of the same process into the process identification relation table in pairs.
Preferably, the obtaining a container process identifier of each process in the container includes:
acquiring GPU card information in the container;
and acquiring the container process identifiers of all processes running on the GPU card by utilizing the GPU card information.
Preferably, obtaining the video memory usage of each process in the container based on the host process identifier includes:
acquiring GPU card information in the container;
acquiring the video memory information of all processes running on the GPU card by utilizing the GPU card information;
and finding out the video memory quantity corresponding to the host process identifier from the video memory information.
Preferably, the method further comprises:
reading the process identification relation table by using a GPU process display tool;
acquiring the video memory quota;
acquiring the quantity of the GPUs and the unique identifiers of the GPUs in the container;
based on the number of GPUs and the unique identifier of the GPU, circularly acquiring each process information corresponding to each GPU in the container;
matching the process identifier relation table with the GPU process information returned by calling a management library function, and recording the video memory consumption of each host process identifier;
based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function;
after parameter replacement is completed, obtaining the video memory use information of the container; the video memory usage information comprises the video memory consumption of each process, the video memory quota of the container and the residual capacity of the container;
based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function, and the method comprises the following steps:
replacing the host process identifier in the returned result with a corresponding container process identifier based on the process identifier relation table;
replacing the memory consumption in the container in the returned result with the sum of the memory consumption of all processes in the container;
and replacing the total amount of the video memory in the returned result with the video memory quota.
Preferably, the task of obtaining the video memory to be allocated includes:
hijacking a video memory allocation function to acquire the task.
Preferably, the allocating the video memory to the task based on the remaining video memory includes:
analyzing the task to obtain the size of the video memory which is requested to be allocated;
judging whether the residual video memory is larger than or equal to the video memory size;
if yes, calling a video memory allocation function, and allocating video memory to the task;
if not, determining that the memory overflows, and reporting the task by mistake and returning.
A video memory allocation apparatus comprising:
the task acquisition unit is used for acquiring tasks of the video memory to be allocated;
the identifier conversion unit is used for acquiring the container process identifier of each process in a container and querying the host process identifier of each process in the container according to a process identifier relation table;
the video memory amount obtaining unit is used for obtaining the video memory amount of each process in the container based on the host process identifier, superposing the video memory consumption of all processes in the container to obtain the total video memory consumption of the container, and determining the remaining video memory of the container by using the total video memory consumption and the video memory quota of the container;
and the video memory allocation unit is used for allocating video memory to the task based on the remaining video memory.
An electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the video memory allocation method when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the video memory allocation method described above.
By applying the method provided by the embodiment of the invention, a task to which video memory is to be allocated is acquired; the container process identifier of each process in the container is acquired, and the host process identifier of each process in the container is queried according to the process identifier relation table; the video memory consumption of each process in the container is obtained based on its host process identifier; the video memory consumption of all processes in the container is summed to obtain the container's total video memory consumption; and the remaining video memory usable by the container is determined from the total video memory consumption and the container's video memory quota, with video memory then being allocated to the task based on the remaining video memory.
After acquiring the task to be allocated with the video memory, firstly acquiring the container process identifiers of all the processes in the container, and then finding out the host process identifiers of all the processes based on the identifier relation table. The host process identifier can be used to obtain the memory usage of each process. The actual total memory consumption in the container can be obtained by accumulating the memory consumption of all processes in the container. Based on the total consumption of the video memory and the video memory quota, the residual video memory which can be used by the container can be determined. Based on the remaining memory, the task may be assigned memory.
The technical effects are as follows: the video memory usage of each process is obtained through the host process identifiers of all processes, and summing the usage of all processes yields the accurate container video memory consumption; when obtaining video memory consumption, users do not need to log in to the host to check, so host permissions need not be granted to users and potential security risks are avoided; and multi-card multi-task scenarios can be managed effectively through the process identifier relation table. That is, the present invention achieves effective and orderly video memory allocation.
Correspondingly, the embodiment of the invention also provides a video memory distribution device, a device and a readable storage medium corresponding to the video memory distribution method, which have the technical effects and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings required by the embodiments or the related technical descriptions are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart showing a method for allocating video memory according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a software system of a host+GPU;
FIG. 3 is a schematic diagram of a software system of a host+GPU according to an embodiment of the present invention;
FIG. 4 is a timing diagram illustrating an embodiment of a method for allocating video memory according to the present invention;
FIG. 5 is a schematic diagram of a memory allocation device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a specific structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a video memory allocation method according to an embodiment of the present invention. The method can be applied to a container in a GPU+CPU computing architecture and comprises the following steps:
S101, acquiring a task to which video memory is to be allocated.
When a user submits a task that requires the GPU, it can be determined that a task to which video memory is to be allocated has been received. For example, when a user submits a GPU task to the container, that task is taken as the task to which video memory is to be allocated.
In a specific embodiment of the present invention, acquiring the task to which video memory is to be allocated includes: hijacking the video memory allocation function to obtain the task. That is, the task may be obtained by hijacking the video memory allocation function. For the specific implementation of function hijacking, reference may be made to related hijacking methods, which are not described in detail here.
S102, acquiring a container process identifier of each process in the container, and inquiring a host process identifier of each process in the container according to the process identifier relation table.
The container is the one that currently receives the task to which video memory is to be allocated. In the GPU+CPU computing architecture, after the container receives the task requiring video memory allocation, the container process identifiers of all processes in the container can be obtained, and then the process identifier relation table is searched to find the host process identifier of each process on the host.
It should be noted that the same process has a unique identifier in the container, i.e. the container process identifier, denoted pid1, and a unique identifier on the host, i.e. the host process identifier, denoted pid2 (the host pid).
In the process identification relationship table, the container process identifier and the host process identifier of each process may be stored in advance.
By querying the process identifier relation table, the host process identifier of each process in the container can be obtained quickly. Compared with having the container communicate with the host to determine each process's host process identifier, this is more convenient and faster.
Specifically, the process identifier relation table may be built by having the container communicate with the host and storing the obtained host process identifiers. Alternatively, a custom character device may be developed and mounted, with the host process identifiers obtained and stored by invoking its corresponding interface.
In one embodiment of the present invention, obtaining a process identifier relationship table includes:
acquiring a container process identifier of each process in the container;
calling an opening function to open the mounted custom character equipment;
calling the interface function of the custom character device, and inputting parameters to obtain a return result; the parameters are a predefined instruction and the container process identifier of each process in the container, and the return result is the host process identifier of each process;
the container process identifier and the host process identifier of the same process are written into the process identification relation table in pairs.
For convenience of description, the steps described above are combined.
Typically, because a namespace isolation mechanism (Linux pid namespace) is employed within a container or pod (the smallest execution unit of K8s-managed resources), a process (application) inside the container cannot see its pid2 (its process identifier on the host). Based on this, in the embodiment of the present invention a character device may be customized whose main function is to return, inside the container, the host-side pid2 of a process according to an input instruction and the GPU process's pid1. This way of obtaining the host process identifier through a custom character device is simple to implement and use, can be conveniently integrated into the video memory allocation control function library, and, compared with real-time communication between host and container, is easy to deploy and performs well.
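The flow just described can be sketched in Python. The device path, ioctl request code, and binary layout below are illustrative assumptions, not details from the patent; the injectable `query` argument lets the table-building logic run without the real device.

```python
import fcntl
import struct

# Assumed ioctl request code for "translate container pid to host pid".
PID_TRANSLATE_CMD = 0xC0084B01

def query_host_pid_via_chardev(container_pid, dev_path="/dev/pid_translate"):
    """Ask the mounted custom character device for the host-side pid (pid2)."""
    with open(dev_path, "rb+", buffering=0) as dev:
        buf = bytearray(struct.pack("i", container_pid))
        fcntl.ioctl(dev, PID_TRANSLATE_CMD, buf)  # driver overwrites buf with pid2
        return struct.unpack("i", bytes(buf))[0]

def build_relation_table(container_pids, query=query_host_pid_via_chardev):
    """Map each container pid (pid1) to its host pid (pid2); `query` is
    injectable so the table-building logic can be tested without the device."""
    return {pid1: query(pid1) for pid1 in container_pids}
```

The pid1/pid2 pairs returned this way would then be written into the process identifier relation table in pairs, as the steps above describe.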
Wherein obtaining a container process identifier for each process within the container comprises:
obtaining GPU card information in a container;
and acquiring the container process identifiers of all processes running on the GPU card by utilizing the GPU card information.
Specifically, the GPU card information may be obtained through the cuda context (the context of the unified computing device architecture), the cuda uuid and the GPU uuid; then, based on the GPU card information, the container process identifiers of all processes running on the GPU card can be obtained.
In practical application, when a new task is created in the container, the process correspondence of the new task does not yet exist in the relation table, so the custom character device is called to create the relation and update the table. When a task process in the container finishes, that process's entry in the relation table must be deleted. This ensures that the relation information in the table corresponds to the current processes in real time.
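As a minimal illustration of keeping the table in step with the task lifecycle, the hypothetical helper below (class and method names are assumptions, not from the patent) adds an entry when a task is created and removes it when the task finishes:

```python
class RelationTable:
    """Keeps container-pid -> host-pid pairs in step with the processes
    that are actually alive in the container (illustrative sketch)."""

    def __init__(self):
        self._pid1_to_pid2 = {}

    def on_task_created(self, pid1, pid2):
        # A newly created GPU task has no entry yet, so one is added.
        self._pid1_to_pid2[pid1] = pid2

    def on_task_finished(self, pid1):
        # Stale entries are deleted so the table mirrors live processes.
        self._pid1_to_pid2.pop(pid1, None)

    def host_pid(self, pid1):
        return self._pid1_to_pid2.get(pid1)
```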
S103, based on the host process identifier, the video memory amount of each process in the container is obtained.
After the host process identifier is obtained, the actual video memory usage of the corresponding process can be obtained based on it. That is, based on the host process identifiers, the video memory amounts of all processes in the container can be obtained.
In one embodiment of the present invention, obtaining the video memory usage of each process in the container based on the host process identifier includes:
obtaining GPU card information in a container;
acquiring the video memory information of all processes running on the GPU card by utilizing the GPU card information;
and finding the video memory amount corresponding to the host process identifier from the video memory information.
The video memory information specifically indicates how much video memory is used; for example, a process may use 10 GB of video memory.
Specifically, when the container uses one or more GPU cards, the GPU card information may be obtained first, and then, based on it, the video memory information of all processes running on those GPU cards. The video memory amount corresponding to each host process identifier is then found in that information. That is, the video memory usage can be obtained through the relationship between the container and the GPU and each process's pid2.
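With the pynvml bindings, the per-card process query described above might look like the sketch below. `nvmlDeviceGetHandleByUUID` and `nvmlDeviceGetComputeRunningProcesses` are real NVML entry points; passing the module in as a parameter is an assumption of this sketch (made so the logic can be exercised without a GPU), not part of the patent's design.

```python
def used_memory_by_host_pid(nvml, device_uuid):
    """Return {host_pid: bytes_used} for all compute processes running on
    the GPU identified by device_uuid. `nvml` is expected to behave like
    the pynvml module (after nvmlInit() has been called)."""
    handle = nvml.nvmlDeviceGetHandleByUUID(device_uuid)
    procs = nvml.nvmlDeviceGetComputeRunningProcesses(handle)
    # Each entry carries the host-side pid (pid2) and its used video memory.
    return {p.pid: p.usedGpuMemory for p in procs}
```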
For example, assume that there are 2 application processes in container A, with pid1 values in the container of 2 and 3 respectively, and corresponding pid2 values on the host of 14456 and 14457 respectively. This relationship can be recorded in the following two-column table:

    pid1 (container)    pid2 (host)
    2                   14456
    3                   14457
Since NVML runs on the host, only the video memory information used by 14456 and 14457 can be obtained there. Inside container A, to find out how much video memory its own processes have used, pid2=14456 is first looked up from pid1=2, and the video memory value used by 14456 (i.e. its video memory amount) is then obtained through NVML. Likewise, pid1=3 is used to look up the video memory used by its process, so the total video memory used by all processes in container A can be obtained.
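The pid1-to-pid2 lookup followed by the host-side memory query can be folded into one small function; the dictionary shapes below mirror the example above and are illustrative:

```python
def container_total_memory(relation_table, host_usage):
    """Total video memory used by the container's own processes.

    relation_table: {pid1: pid2} for processes inside the container.
    host_usage: {pid2: bytes_used} as reported on the host (e.g. via NVML).
    Host processes absent from the relation table are ignored, so other
    containers' usage is not counted."""
    return sum(host_usage.get(pid2, 0) for pid2 in relation_table.values())
```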
S104, overlapping the video memory consumption of all processes in the container to obtain the total video memory consumption of the container.
Summing the video memory consumption of all processes in the container gives the total video memory actually used by the container.
It should be noted that, because the video memory usage of all processes in the container is summed, the result is the container's actual total video memory usage; the implicitly allocated video memory of GPU processes in the container is no longer overlooked.
S105, determining the residual video memory of the container by using the total consumption of the video memory and the video memory quota of the container, and distributing the video memory to the task based on the residual video memory.
The video memory quota may be the amount of video memory allocated to the container when it is created; for example, if the container's quota is 8 GB, the processes in the container may use at most 8 GB of video memory in total.
After the total video memory consumption is obtained, subtracting it from the container's video memory quota determines the remaining video memory the container can use.
Once the container's remaining video memory is known, video memory can be allocated to the task from it.
In one specific embodiment of the present invention, the task video memory allocation based on the remaining video memory includes:
analyzing the task to obtain the size of the video memory requested to be allocated;
judging whether the residual video memory is larger than or equal to the video memory size;
if yes, calling a video memory allocation function, and allocating video memory to the task;
if not, determining that the memory overflows, and reporting the task by mistake and returning.
That is, the size of the video memory the task requests is first determined and compared with the remaining video memory. If the remaining video memory is greater than or equal to the requested size, the current remaining video memory suffices, and the video memory allocation function is called to allocate video memory to the task. If the remaining video memory is smaller than the requested size, the allocation cannot be satisfied, a video memory overflow (OOM, out of memory) is determined, and an error is returned for the task.
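Under the assumption that the hijacked allocation function is reachable as a plain callable, the admission check reads roughly as follows (the names here are illustrative, not the patent's):

```python
class VideoMemoryOOM(Exception):
    """Raised when a request exceeds the container's remaining video memory."""

def try_allocate(requested, quota, total_used, do_alloc):
    """Admit or reject a video memory request against the container quota.

    requested:  bytes the task asks for
    quota:      the container's video memory quota in bytes
    total_used: summed usage of all processes in the container
    do_alloc:   stands in for the real (hijacked) allocation function
    """
    remaining = quota - total_used
    if remaining >= requested:
        return do_alloc(requested)          # enough room: really allocate
    raise VideoMemoryOOM(                   # otherwise report OOM to the task
        f"requested {requested} B, but only {remaining} B remain")
```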
By applying the method provided by the embodiment of the invention, a task to which video memory is to be allocated is acquired; the container process identifier of each process in the container is acquired, and the host process identifier of each process in the container is queried according to the process identifier relation table; the video memory consumption of each process in the container is obtained based on its host process identifier; the video memory consumption of all processes in the container is summed to obtain the container's total video memory consumption; and the remaining video memory usable by the container is determined from the total video memory consumption and the container's video memory quota, with video memory then being allocated to the task based on the remaining video memory.
After acquiring the task to be allocated with the video memory, firstly acquiring the container process identifiers of all the processes in the container, and then finding out the host process identifiers of all the processes based on the identifier relation table. The host process identifier can be used to obtain the memory usage of each process. The actual total memory consumption in the container can be obtained by accumulating the memory consumption of all processes in the container. And determining the residual video memory of the container based on the total video memory consumption and the video memory quota. Based on the remaining memory, the task may be assigned memory.
The technical effects are as follows: the video memory usage of each process is obtained through the host process identifiers of all processes, and summing the usage of all processes yields the accurate container video memory consumption; when obtaining video memory consumption, users do not need to log in to the host to check, so host permissions need not be granted to users and potential security risks are avoided; and multi-card multi-task scenarios can be managed effectively through the process identifier relation table. That is, the present invention achieves effective and orderly video memory allocation.
It should be noted that, based on the above embodiments, the embodiments of the present invention further provide corresponding improvements. Since the steps in the preferred/improved embodiments that are the same as, or correspond to, steps in the above embodiments may be cross-referenced, as may the corresponding advantages, a detailed description of these embodiments is omitted herein.
In a specific embodiment of the present invention, the user may also be provided with a function for viewing the video memory usage of the container, and a specific implementation process includes:
reading a process identification relation table by using a GPU process display tool;
acquiring a video memory quota;
acquiring the quantity of the GPUs and the unique identifiers of the GPUs in the container;
based on the number of GPUs and the unique identifier of the GPU, circularly acquiring each process information corresponding to each GPU in the container;
matching the process identifier relation table against the GPU process information returned by calling the management library function, and recording the video memory consumption corresponding to each host process identifier;
based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function;
after parameter replacement is completed, obtaining the video memory use information of the container; the video memory usage information comprises the video memory consumption of each process, the video memory quota of the container and the residual capacity of the container;
based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function, and the method comprises the following steps:
based on the process identification relation table, replacing the host process identification in the returned result with a corresponding container process identification;
replacing the video memory consumption in the container in the returned result with the sum of the video memory used by all processes in the container;
and replacing the total amount of video memory in the returned result with the video memory quota.
For convenience of description, the above steps are described in combination.
In this embodiment, by analyzing the implementation logic and output format of nvidia-smi, a GPU process display tool (inais-smi) inside the container can be customized. By reading the process identifier relation table of the GPU processes in the storage module, the process pid2 returned by the nvml function is replaced with the corresponding pid1 in the container, the used video memory is replaced with the sum of the video memory used by all GPU processes in the container, and the total video memory is replaced with the video memory quota of the container. Statistics of GPU process information inside the container are thereby realized, so that a user can see his or her own GPU process information in the container without being granted host permissions. The method does not depend on changes to the nvidia-smi command, is more stable than directly hijacking the nvidia-smi command, and allows the output information to be customized.
Here, nvml refers to the NVIDIA Management Library, which is used to manage NVIDIA GPU devices; it is also referred to herein as the management library, and its functions as management library functions.
In order to facilitate the understanding and implementation of the video memory allocation method provided by the embodiments of the present invention by those skilled in the art, the following describes the video memory allocation method in detail with reference to a specific application scenario as an example.
Referring to fig. 2 and 3, fig. 2 is a schematic diagram of a software system of a related host+GPU, and fig. 3 is a schematic diagram of a software system of a host+GPU to which the video memory allocation method provided by the embodiment of the present invention is applied. Compared with the software system shown in fig. 2, the software system shown in fig. 3 additionally includes the following modules:
The custom character device module: since a namespace isolation mechanism is adopted in the container or pod, the host pid (pid2) of a process cannot be seen inside the container. The main function of the custom character device module is to return, according to an input instruction and the in-container pid (pid1) of a GPU process, the pid2 of that process on the host.
Regarding the isolation mechanism: a process identifier (pid) namespace isolation mechanism is used in Linux systems. Thus, when an application/process is launched in the container, the pid1 of the application is visible in the container (for example, 1), but the pid2 of the application on the host is not visible in the container.
The GPU process pid storage module in the container can acquire the pid relation of a GPU process by calling the interface of the custom character device module and store the relation table in the storage module. All GPU processes in the container can read and write this module, while different containers cannot access each other's modules, so the table may be stored in container-shared memory or as a file. The storage module avoids parsing the pid of a container GPU process repeatedly.
The pid relation records, for each application process pid1 in the container, the corresponding process pid2 on the host, and is stored in the form of a relation table. For example, if the in-container process pid1 is 1 and the corresponding on-host process pid2 is 14456, the relation table entry is 1 -> 14456; the pid2 corresponding to a pid1 can then be conveniently obtained through a table lookup, without re-parsing it each time.
The main function of the GPU process video memory control module in the container is to ensure that the video memory of all GPU processes in the container does not exceed the video memory quota of the container. Specifically, when the video memory allocation function is called, the GPU card on which video memory is to be allocated is first obtained by using the cuda context, cuda uuid and GPU uuid; the GPU information obtained in this way is accurate, and the approach is applicable when there are several GPU cards in the container. The corresponding nvml function is then used to obtain the video memory information of all tasks running on that GPU card, and finally the process identifier relation table in the storage module is combined with the task information of the GPU card to obtain the video memory information of the GPU processes in the container. The video memory used by all GPU processes in the container is summed, which accurately yields the container's video memory consumption, including the video memory implicitly allocated by the cuda context and the like of each GPU process. Whether the remaining video memory is sufficient is then judged by combining the video memory quota of the container with the size of the video memory to be allocated: when it is sufficient, the real cuda driver function is called to complete the allocation; when it is not, OOM is returned.
The GPU process statistics module in the container: by analyzing the implementation logic and output format of nvidia-smi, a GPU process display tool inside the container, namely inais-smi, is customized. By reading the pid relation table of the GPU processes in the storage module (the process identifier relation table, also referred to herein simply as the relation table), the process pid2 returned by the nvml function is replaced with the corresponding pid1 in the container, the used video memory is replaced with the sum of the video memory used by all GPU processes in the container, and the total video memory is replaced with the video memory quota of the container. Statistics of GPU process information inside the container are thereby realized, so that a user can see his or her own GPU process information in the container without being granted host permissions. The method does not depend on changes to the nvidia-smi command, is more stable than directly hijacking the nvidia-smi command, and allows the output information to be customized.
In a specific implementation, deployment may follow the steps below.
First, the custom character device module: the character device module processing logic is written, which mainly defines an unlocked_ioctl interface (the custom character device call interface function) and implements, through kernel functions, the logic for obtaining the host pid from within the container. When the resource management platform is deployed, a corresponding ko file (a Linux kernel module file, such as vcuda_dev.ko) can be compiled according to the specific system kernel version, and the device module is installed with the insmod command (the Linux kernel module installation command).
The vcuda_dev character device is mounted. Specifically, when a user creates a container or POD through the resource management platform, the character device is mounted into the corresponding container; the corresponding docker parameter is, for example: docker run -it --device /dev/vcuda_dev <container image>.
The GPU process statistics tool is mounted in the container. The implementation logic and output format of nvidia-smi can be analyzed, a GPU process display tool inside the container can be customized, and an executable file such as inais-smi can be compiled. When a user creates a container or POD through the resource management platform, the inais-smi executable file is mounted under /usr/bin (a directory in the container), after which the user can execute the inais-smi command from any directory in the container.
The video memory allocation control function library is mounted and its preferential loading logic is set. The video memory allocation related functions in the cuda driver are redefined, video memory control logic is added at the beginning of the video memory allocation function, and the whole video memory allocation control logic is compiled into a .so dynamic library, such as libvcuda.so (the name of the video memory allocation control dynamic library). When the user creates a container or POD through the resource management platform, the library is mounted into the container, and the preferential loading of the video memory allocation control function library is set by means of /etc/ld.so.preload or the LD_PRELOAD environment variable (dynamic linker preloading mechanisms of Linux).
Referring to fig. 4, fig. 4 is a timing diagram of an embodiment of the video memory allocation method according to the present invention. Task starting, and acquisition and storage of the GPU process pid table: when a user starts a GPU task in a container or POD, the custom GPU process pid acquisition logic is triggered because the preferential loading logic of the video memory allocation control function library has been set. First, the open function is called to open the mounted vcuda_dev character device; second, the ioctl function is called with a predefined instruction and the in-container pid of the GPU process as parameters, and the returned result is the host pid of that process; finally, the pid relation table of the GPU process is stored in the storage module.
GPU process video memory control in the container: when a GPU task runs and calls the GPU video memory allocation function, the video memory allocation function in libvcuda.so is called preferentially because its preferential loading logic has been set, thereby triggering the video memory control logic.
That is, in the invention, the host pid of a GPU process can be obtained from inside the container by calling the custom ioctl instruction through the custom character device module, which is simple and efficient. By hijacking the video memory allocation function of the cuda driver and utilizing the conversion relation among the cuda context, cuda uuid and GPU uuid, the video memory control problem of several GPU cards and several GPU tasks in the container is solved. Based on the GPU process pid relation table in the container storage module, by hijacking the output of the nvml function and applying the conversion strategy of the process pid between the host and the container, the problem that the user cannot acquire the statistical information of each GPU process in the container is solved.
In addition, the container GPU process pid relation table storage and screening module is provided. The storage module can be read and written by each GPU process in the container, may specifically adopt shared memory, files and the like, and stores the pid relation table of the GPU processes in the container in the storage unit, avoiding repeated parsing of container GPU processes. The screening module compares the acquired pid information and deletes the pid information corresponding to processes that are no longer running in the container; that is, it removes invalid pid information and ensures that the information in the storage unit is consistent with the actually running programs. For example, it may be judged whether a host pid in the container is present in the nvml_table: if so, the GPU information of the corresponding process in the nvml_table is retained; if not, the task/process has ended, and it is deleted from the storage module.
When several GPU cards exist in the container, tasks may run on each of them; furthermore, if a user-initiated task sets the CUDA_VISIBLE_DEVICES parameter, identifying the card by a plain GPU index may fail. By applying the method and strategy provided by the embodiment of the invention, the cuda context of the current video memory allocation is obtained, the cuda context is then used to obtain the cuda uuid, and finally the conversion relation between the cuda uuid and the GPU uuid is used to obtain the GPU uuid targeted by the current allocation; the nvml function can be called with this GPU uuid to obtain further GPU card information.
Video memory usage statistics: when allocating video memory, the video memory usage information on the corresponding GPU card in the container needs to be obtained. The size of implicitly allocated video memory, such as that of the cuda context, cannot be obtained directly, yet ignoring this part of the video memory easily causes the container quota to be exceeded. The method and strategy provided by the embodiment of the invention therefore proceed as follows when controlling video memory allocation: first, the GPU uuid is acquired through the GPU card identification strategy; second, the nvmlDeviceGetHandleByUUID function is called to acquire the nvidia device; finally, the nvmlDeviceGetComputeRunningProcesses function (a function for viewing process information) is called to acquire the GPU process information of all users on the corresponding GPU card. This information is maintained by the nvidia driver, so the reported video memory sizes are very accurate. The GPU process pid relation table in the container storage module is then combined with this information, and by matching the two sets of host pids, the total amount of video memory used by all GPU processes in the container is accurately obtained.
The container GPU process statistics refreshing strategy: the GPU information returned by the nvmlDeviceGetComputeRunningProcesses function is recorded as nvml_table. The GPU process pid relation table is acquired from the storage module in the container, and whether each host pid in the container is present in the nvml_table is judged: if so, the GPU information of the corresponding process in the nvml_table is retained; if not, the task has ended and it is deleted from the storage module; finally, the table is written back to the storage module for refreshing.
That is, the method and strategy provided by the embodiment of the invention can utilize the custom character device module to obtain the host pid of a GPU process in the container, and utilize the relation among the cuda context, cuda uuid and GPU uuid to obtain the information of the real GPU card when video memory is allocated, thereby solving the problems of multi-GPU-card tasks and the special CUDA_VISIBLE_DEVICES variable in the container. Meanwhile, the pid relation table in the storage module and the nvml function are utilized to acquire the video memory occupation information of the corresponding GPU card in the container, realizing accurate control of the video memory in the container. Finally, by hijacking the output of the nvml function and applying the conversion strategy of the process pid between the host and the container, the problem that GPU process information cannot be counted and displayed in the container is solved.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a video memory allocation device, where the video memory allocation device described below and the video memory allocation method described above may be referred to correspondingly.
Referring to fig. 5, the apparatus includes the following modules:
a task obtaining unit 101, configured to obtain a task to which video memory is to be allocated;
the identifier conversion unit 102 is configured to obtain a container process identifier of each process in the container, and query a host process identifier of each process in the container according to the process identifier relationship table;
a video memory amount obtaining unit 103, configured to obtain the video memory consumption of each process in the container based on the host process identifiers; sum the video memory consumption of all processes in the container to obtain the total video memory consumption of the container; and determine the remaining video memory of the container by using the total video memory consumption and the video memory quota of the container;
and a video memory allocation unit 104, configured to allocate video memory to the task based on the remaining video memory.
The device provided by the embodiment of the invention is applied to acquire a task to which video memory is to be allocated; acquire the container process identifier of each process in the container, and query the host process identifier of each process according to the process identifier relation table; obtain, based on the host process identifiers, the video memory consumption of each process in the container; sum the video memory consumption of all processes in the container to obtain the total video memory consumption of the container; determine the remaining video memory usable by the container from the total video memory consumption and the video memory quota of the container; and allocate video memory to the task based on the remaining video memory.
After the task to which video memory is to be allocated is acquired, the container process identifiers of all processes in the container are first acquired, and the host process identifiers of those processes are then found through the process identifier relation table. The host process identifier can be used to obtain the video memory usage of each process, and accumulating the video memory consumption of all processes in the container yields the actual total video memory consumption of the container. The remaining video memory of the container is determined from the total video memory consumption and the video memory quota, and video memory can then be allocated to the task based on the remaining video memory.
The technical effects are as follows: the video memory usage of each process is obtained through the host process identifiers of all processes, and the accurate video memory usage of the container can be obtained by summing the usage of all processes; in the process of acquiring the video memory consumption, the user does not need to log in to the host to check it, so host permissions need not be granted to the user and potential security risks are avoided; and the multi-card, multi-task scenario can be managed effectively through the pid relation table. That is, the present invention can realize effective and orderly video memory allocation.
In a specific embodiment of the present invention, a table building unit is configured to obtain the process identifier relation table, including: acquiring the container process identifier of each process in the container; calling the open function to open the mounted custom character device; calling the interface function of the custom character device with input parameters to obtain a returned result, wherein the parameters are a predefined instruction and the container process identifiers of the processes in the container, and the returned result is the host process identifier of each process; and writing the container process identifier and the host process identifier of the same process into the process identifier relation table in pairs.
In one embodiment of the present invention, the table building unit is specifically configured to obtain GPU card information in the container;
and acquiring the container process identifiers of all processes running on the GPU card by utilizing the GPU card information.
In one specific embodiment of the present invention, the video memory amount obtaining unit is configured to obtain the GPU card information in the container;
acquiring the video memory information of all processes running on the GPU card by utilizing the GPU card information;
and finding out the video memory amount corresponding to the host process identifier from the video memory information.
In one embodiment of the present invention, a display and viewing unit is configured to read a process identifier relationship table by using a GPU process display tool;
acquiring a video memory quota;
acquiring the quantity of the GPUs and the unique identifiers of the GPUs in the container;
based on the number of GPUs and the unique identifier of the GPU, circularly acquiring each process information corresponding to each GPU in the container;
matching the process identifier relation table against the GPU process information returned by calling the management library function, and recording the video memory consumption corresponding to each host process identifier;
based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function;
after parameter replacement is completed, obtaining the video memory use information of the container; the video memory usage information comprises the video memory consumption of each process, the video memory quota of the container and the residual capacity of the container;
Based on the process identification relation table, parameter replacement processing is carried out on a return result of the computer running information checking function, and the method comprises the following steps:
based on the process identification relation table, replacing the host process identification in the returned result with a corresponding container process identification;
replacing the video memory consumption in the container in the returned result with the sum of the video memory used by all processes in the container;
and replacing the total amount of the video memory in the returned result with the video memory quota.
In one embodiment of the present invention, the task obtaining unit is specifically configured to hijack the video memory allocation function to obtain the task.
In one specific embodiment of the present invention, the video memory allocation unit is specifically configured to parse a task to obtain a video memory size requested to be allocated;
judging whether the residual video memory is larger than or equal to the video memory size;
if yes, calling a video memory allocation function, and allocating video memory to the task;
if not, determining that the video memory overflows (OOM), and returning an error for the task.
Corresponding to the above method embodiment, the embodiment of the present invention further provides an electronic device, where an electronic device described below and a video memory allocation method described above may be referred to correspondingly.
Referring to fig. 6, the electronic device includes:
A memory 332 for storing a computer program;
a processor 322, configured to implement the steps of the video memory allocation method of the above method embodiment when executing the computer program.
In one embodiment, the electronic device may include a processor and a memory, with a GPU (e.g., a plug-in GPU) coupled to the electronic device.
In another embodiment, the electronic device may include a processor, a memory, and a GPU (not depicted in fig. 6). Specifically, referring to fig. 7, fig. 7 is a schematic diagram of a specific structure of an electronic device according to this embodiment. The electronic device may differ considerably according to configuration or performance, and may include one or more central processing units (CPU) 322 (e.g., one or more processors), a memory 332, and a GPU (not shown in fig. 7), where the memory 332 stores one or more computer programs 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a series of instruction operations on the data processing apparatus. Further, the processor 322 may be configured to communicate with the memory 332 and execute the series of instruction operations in the memory 332 on the electronic device 301.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the video memory allocation method described above may be implemented by the structure of the electronic device.
Corresponding to the above method embodiments, the embodiments of the present invention further provide a readable storage medium, where a readable storage medium described below and a video memory allocation method described above may be referred to correspondingly.
A readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the video memory allocation method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. Meanwhile, since those of ordinary skill in the art may vary the specific embodiments and application scope according to the idea of the present invention, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A video memory allocation method is characterized by comprising the following steps:
acquiring a task to which video memory is to be allocated;
acquiring a container process identifier of each process in a container, and inquiring a host process identifier of each process in the container according to a process identifier relation table;
based on the host process identifier, obtaining the video memory consumption of each process in the container;
superposing the video memory consumption of all processes in the container to obtain the total video memory consumption of the container;
and determining the residual video memory of the container by using the total video memory consumption and the video memory quota of the container, and performing video memory allocation on the task based on the residual video memory.
2. The method of claim 1, wherein obtaining the process identification relationship table comprises:
acquiring a container process identifier of each process in the container;
calling an opening function to open the mounted custom character equipment;
calling the interface function of the custom character equipment, and inputting parameters to obtain a return result; the parameter is a predefined instruction and a container process identifier of each process in the container, and the returned result is a host process identifier of each process;
and writing the container process identifier and the host process identifier of the same process into the process identification relation table in pairs.
3. The method of claim 2, wherein the obtaining a container process identifier for each process within the container comprises:
acquiring GPU card information in the container;
and acquiring the container process identifiers of all processes running on the GPU card by utilizing the GPU card information.
4. The method of claim 1, wherein obtaining, based on the host process identifier, the video memory consumption of each process in the container comprises:
acquiring GPU card information in the container;
acquiring the video memory information of all processes running on the GPU card by utilizing the GPU card information;
and finding the video memory consumption corresponding to the host process identifier from the video memory information.
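The lookup in claim 4 amounts to filtering the per-GPU process list down to the container's host PIDs. The record layout (`pid` / `used_mib` fields) is a hypothetical simplification of what a GPU management-library query would return:

```python
def usage_for_host_pids(gpu_process_info, host_pids):
    """From the per-GPU process list, pick out the video memory consumption
    of the given host process identifiers (claim 4)."""
    by_pid = {rec["pid"]: rec["used_mib"] for rec in gpu_process_info}
    # Processes not found on this GPU are counted as consuming nothing.
    return {hp: by_pid.get(hp, 0) for hp in host_pids}

info = [{"pid": 41234, "used_mib": 3072},
        {"pid": 99999, "used_mib": 512},   # another container's process
        {"pid": 41250, "used_mib": 1024}]
print(usage_for_host_pids(info, [41234, 41250]))  # {41234: 3072, 41250: 1024}
```

Note that the other container's process (PID 99999 here) is naturally excluded, which is what makes the per-container accounting accurate.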
5. The method of claim 1, further comprising:
reading the process identifier relation table by using a GPU process display tool;
acquiring the video memory quota;
acquiring the number of GPUs and the unique identifier of each GPU in the container;
based on the number of GPUs and the unique identifier of each GPU, cyclically acquiring the process information corresponding to each GPU in the container;
matching the process identifier relation table against the GPU process information returned by calling a management library function, and recording the video memory consumption corresponding to each host process identifier;
performing, based on the process identifier relation table, parameter replacement processing on the return result of the running-information viewing function;
after parameter replacement is completed, obtaining the video memory usage information of the container; the video memory usage information comprises the video memory consumption of each process, the video memory quota of the container and the residual video memory of the container;
wherein performing, based on the process identifier relation table, parameter replacement processing on the return result of the running-information viewing function comprises:
replacing the host process identifier in the return result with the corresponding container process identifier based on the process identifier relation table;
replacing the video memory consumption in the return result with the sum of the video memory consumption of all processes in the container;
and replacing the total amount of video memory in the return result with the video memory quota.
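The three replacements of claim 5 can be sketched as a rewrite of the host-side view into the container-side view. The dictionary shape of the "return result" is a hypothetical simplification of what the running-information viewing function would report:

```python
def rewrite_view(result, pid_table, quota):
    """Perform the parameter replacement of claim 5: host PIDs become
    container PIDs, used memory becomes the container total, and total
    memory becomes the container's quota."""
    host_to_container = {h: c for c, h in pid_table.items()}
    # Keep only this container's processes, translated to container PIDs.
    procs = [{"pid": host_to_container[p["pid"]], "used_mib": p["used_mib"]}
             for p in result["processes"] if p["pid"] in host_to_container]
    used = sum(p["used_mib"] for p in procs)
    return {"processes": procs, "used_mib": used, "total_mib": quota}

host_view = {"processes": [{"pid": 41234, "used_mib": 3072},
                           {"pid": 99999, "used_mib": 512}],  # foreign process
             "used_mib": 3584, "total_mib": 24576}
print(rewrite_view(host_view, {100: 41234}, 8192))
# {'processes': [{'pid': 100, 'used_mib': 3072}], 'used_mib': 3072, 'total_mib': 8192}
```

The effect is that a display tool run inside the container shows only the container's own processes, its quota as the total, and its own consumption, rather than the whole card.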
6. The method of claim 1, wherein acquiring the task for which video memory is to be allocated comprises:
hijacking a video memory allocation function to acquire the task.
7. The method of any of claims 1 to 6, wherein performing video memory allocation on the task based on the residual video memory comprises:
parsing the task to obtain the size of the video memory requested to be allocated;
judging whether the residual video memory is greater than or equal to the requested video memory size;
if yes, calling a video memory allocation function to allocate video memory to the task;
if not, determining that the video memory overflows, reporting an error for the task, and returning.
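The hijacked allocation path of claims 6 and 7 can be sketched as a guard wrapped around the real allocator. The names and the callback-style forwarding are hypothetical; in practice the interception would happen at the driver-API level rather than in Python:

```python
class MemoryOverflow(Exception):
    """Raised when a request exceeds the container's residual video memory."""

def guarded_alloc(request_mib, remaining_mib, real_alloc):
    """Sketch of the hijacked allocation function: compare the requested
    size against the residual quota, and only forward to the real video
    memory allocation function if the request fits (claim 7)."""
    if remaining_mib >= request_mib:
        return real_alloc(request_mib)
    raise MemoryOverflow(
        f"requested {request_mib} MiB, only {remaining_mib} MiB left")

# A 1024 MiB request fits within 4096 MiB of residual memory.
handle = guarded_alloc(1024, 4096, lambda n: f"buffer:{n}MiB")
print(handle)  # buffer:1024MiB
```

A request for 8192 MiB against the same 4096 MiB residual would raise `MemoryOverflow`, corresponding to the error-and-return branch of claim 7.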
8. A video memory allocation apparatus, characterized by comprising:
a task acquisition unit, used for acquiring a task for which video memory is to be allocated;
an identifier conversion unit, used for acquiring a container process identifier of each process in a container and querying a host process identifier of each process in the container according to a process identifier relation table;
a video memory amount obtaining unit, used for obtaining, based on the host process identifier, the video memory consumption of each process in the container; summing the video memory consumption of all processes in the container to obtain the total video memory consumption of the container; and determining the residual video memory of the container by using the total video memory consumption and the video memory quota of the container;
and a video memory allocation unit, used for performing video memory allocation on the task based on the residual video memory.
9. An electronic device, comprising:
a memory for storing a computer program;
A processor for implementing the steps of the video memory allocation method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, wherein a computer program is stored on the readable storage medium, the computer program implementing the steps of the video memory allocation method according to any one of claims 1 to 7 when executed by a processor.
CN202311601081.6A 2023-11-28 2023-11-28 Video memory allocation method, device, equipment and readable storage medium Pending CN117593170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311601081.6A CN117593170A (en) 2023-11-28 2023-11-28 Video memory allocation method, device, equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN117593170A true CN117593170A (en) 2024-02-23

Family

ID=89911272




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination