CN116188240A - GPU virtualization method and device for container and electronic equipment - Google Patents


Info

Publication number
CN116188240A
CN116188240A
Authority
CN
China
Prior art keywords
video memory, GPU, target container, system call, container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211699151.1A
Other languages
Chinese (zh)
Other versions
CN116188240B (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd
Priority to CN202211699151.1A
Publication of CN116188240A
Application granted
Publication of CN116188240B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • G06T1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources to service a request
    • G06F9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/5061 - Partitioning or combining of resources
    • G06F9/5077 - Logical partitioning of resources; Management or configuration of virtualized resources
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The disclosure relates to a GPU virtualization method and device for a container, and an electronic device. The method includes: intercepting a first IOCTL system call request initiated by a target container, where a virtual GPU device obtained by virtualizing one physical GPU device is mounted on the target container, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container; obtaining corresponding video memory state information based on the first IOCTL system call request, the video memory state information being the video memory information in the physical GPU device that corresponds to the virtual GPU device; and returning the video memory state information to the target container, so that GPU virtualization is realized based on the video memory state information while the target container runs. By intercepting IOCTL system call requests, the embodiments of the disclosure realize container-oriented GPU virtualization in kernel mode based on IOCTL, and thereby enable containers to share GPU resources.

Description

GPU virtualization method and device for container and electronic equipment
Technical Field
The disclosure relates to the field of computer technology, and in particular to a GPU (Graphics Processing Unit) virtualization method and device for a container, and an electronic device.
Background
GPU virtualization slices the GPU resources (such as video memory resources and computing resources) of one physical GPU device, logically dividing them into multiple virtual GPU devices, so that the GPU resources of the physical GPU device are allocated in units of virtual GPU devices. A single physical GPU device can thus be distributed to multiple clients in units of virtual GPU devices, which greatly improves the utilization of the physical GPU device.
Disclosure of Invention
The disclosure provides a technical scheme for a GPU virtualization method and device for a container, and an electronic device.
According to an aspect of the present disclosure, there is provided a GPU virtualization method for a container, comprising: intercepting a first IOCTL system call request initiated by a target container, where a virtual GPU device is mounted on the target container, the virtual GPU device is obtained by virtualizing one physical GPU device, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container; obtaining corresponding video memory state information based on the first IOCTL system call request, where the video memory state information is the video memory information in the physical GPU device that corresponds to the virtual GPU device; and returning the video memory state information to the target container, so that GPU virtualization is realized based on the video memory state information when the target container runs.
In one possible implementation, the method further includes: intercepting a second IOCTL system call request initiated by the target container, where the second IOCTL system call request is used to apply for GPU video memory; judging whether the video memory application amount corresponding to the second IOCTL system call request exceeds the video memory mount amount corresponding to the virtual GPU device mounted on the target container; and when the video memory application amount exceeds the video memory mount amount, returning prompt information to the target container, where the prompt information is used to indicate that the current video memory application would cause a video memory overflow.
In one possible implementation, the method further includes: when the video memory application amount does not exceed the video memory mount amount, allocating video memory resources of the physical GPU device to the target container based on the video memory application amount.
In one possible implementation, the target container initiates the first IOCTL system call request/the second IOCTL system call request based on the IOCTL interface corresponding to a target component; the target component includes at least one of: OpenCL, CUDA.
In one possible implementation, there are a plurality of target containers; for any one target container, the virtual GPU device mounted on the target container is provided with a corresponding GPU time slice, where the GPU time slice is used to indicate the computing resources of the physical GPU device.
In one possible implementation, the computing resource scheduling mode corresponding to the plurality of target containers is a weak isolation mode; in the weak isolation mode, when no load is running on the ith target container, a load corresponding to the jth target container can run on the GPU time slice corresponding to the ith target container, where i and j are different positive integers.
In one possible implementation, the computing resource scheduling mode corresponding to the plurality of target containers is a strong isolation mode; in the strong isolation mode, when no load is running on the ith target container, a load corresponding to the jth target container cannot run on the GPU time slice corresponding to the ith target container, where i and j are different positive integers.
According to an aspect of the present disclosure, there is provided a GPU virtualization apparatus for a container, comprising: an interception module, configured to intercept a first IOCTL system call request initiated by a target container, where a virtual GPU device is mounted on the target container, the virtual GPU device is obtained by virtualizing one physical GPU device, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container; an acquisition module, configured to obtain corresponding video memory state information based on the first IOCTL system call request, where the video memory state information is the video memory information in the physical GPU device that corresponds to the virtual GPU device; and a sending module, configured to return the video memory state information to the target container, so that GPU virtualization is realized based on the video memory state information when the target container runs.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiments of the disclosure, a first IOCTL system call request initiated by a target container is intercepted, where a virtual GPU device obtained by virtualizing one physical GPU device is mounted on the target container, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container; corresponding video memory state information is obtained based on the first IOCTL system call request, where the video memory state information is the video memory information in the physical GPU device that corresponds to the virtual GPU device; and the video memory state information is returned to the target container, so that GPU virtualization is realized based on the video memory state information when the target container runs. By intercepting IOCTL system call requests, container-oriented GPU virtualization is realized in kernel mode based on IOCTL, thereby enabling containers to share GPU resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flowchart of a method for GPU virtualization for a container, according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of GPU container virtualization in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a target container performing a video memory application in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a block diagram of a GPU virtualization apparatus for a container, according to an embodiment of the present disclosure;
fig. 5 shows a block diagram of an electronic device, according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
In the related art, GPU virtualization methods include Multi-Process Service (MPS), vCUDA (virtual Compute Unified Device Architecture), and the like.
In some related art, MPS is a binary-compatible alternative implementation of the CUDA application programming interface (Application Programming Interface, API). The MPS runtime architecture is designed to transparently enable cooperative multi-process CUDA applications (typically MPI jobs) to take advantage of the Hyper-Q functionality on Kepler-based GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same physical GPU device, which can improve performance when the computing power of a single physical GPU device is underutilized by a single application process. MPS is a binary-compatible client-server runtime implementation of the CUDA API that consists of several components: a control daemon, responsible for starting and stopping the server and coordinating connections between clients and the server; a client runtime, which is built into the CUDA driver library and can be used transparently by any CUDA application; and a server process, which is the clients' shared connection to the GPU and provides concurrency between clients.
vCUDA is a GPU resource restriction component based on CUDA hijacking. vCUDA manages the video memory usage of each container by hijacking CUDA's video memory allocation and release requests, thereby realizing video memory isolation. It should be noted that vCUDA's context allocation does not go through the malloc function, so a process cannot know how much video memory a context uses. Thus, vCUDA queries the GPU for the current video memory usage on each allocation. In terms of computing power isolation, a user may specify the GPU utilization for a container; vCUDA monitors utilization and intervenes when the utilization limit is exceeded. vCUDA supports both hard and soft isolation; the difference is that when resources are free, soft isolation allows tasks to exceed the configured limit, while hard isolation does not. Because it relies on a monitor-and-adjust scheme, vCUDA cannot limit computing power over short intervals and can only guarantee fairness over longer periods, so it is not suitable for scenarios with extremely short tasks, such as inference.
Although MPS and vCUDA enable GPU virtualization, they suffer from interference among multiple users and are difficult to maintain.
The application scenarios of GPU virtualization include AI inference, Android cloud gaming, and the like. The characteristics of AI inference include: computing power utilization is generally not high, with typical peaks and troughs; resource consumption is mainly video memory and computing power; workloads are typically based on TensorRT or Triton Inference Server; and microservices are typically built on containers. Thus, the goals of GPU virtualization in AI inference scenarios include: realizing GPU resource sharing through virtualization, including virtualization of video memory resources and computing resources; limiting the shared GPU resources (video memory resources and computing resources); guaranteeing a high service level objective (Service Level Objectives, SLO), mainly in terms of latency and throughput; and supporting strong isolation security for containers. The characteristics of Android cloud gaming include: a single GPU device should support as many concurrent instances as possible, while keeping the quality of service (Quality of Service, QoS) optimal. Thus, the goals of GPU virtualization in Android cloud gaming scenarios include: realizing GPU multiplexing through virtualization; and providing virtualization of video memory and codec computation to meet the QoS requirements of Android cloud games.
The embodiments of the disclosure provide a GPU virtualization method for a container, which can be applied to application scenarios such as AI inference and Android cloud gaming. The method realizes GPU virtualization while the container runs, supports resource isolation, and has high security. It is described in detail below.
FIG. 1 illustrates a flowchart of a GPU virtualization method for a container according to an embodiment of the present disclosure. The method may be performed by an electronic device such as a terminal device or a server; the terminal device may be User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in fig. 1, the method includes:
in step S11, a first IOCTL system call request initiated by a target container is intercepted, where a virtual GPU device is mounted on the target container, the virtual GPU device is obtained based on virtualization of a physical GPU device, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container.
The target container is a container that applies for GPU resource sharing. When the target container is created, a virtual GPU device is dynamically created for it based on its GPU resource sharing application. The GPU resource sharing application of the target container may include a video memory resource/computing resource sharing application, so that the video memory resource/computing resource size required by the target container is set on the dynamically created virtual GPU device, after which the virtual GPU device is mounted to the target container.
Fig. 2 shows a schematic diagram of GPU container virtualization according to an embodiment of the present disclosure. As shown in fig. 2, when a target container that applies for GPU resource sharing is created, the GPU resources of the physical GPU device (v/gpu_ipc, /dev/dri/card0, /dev/dri/render128 shown in fig. 2) are virtualized through a procfs configuration interface based on the GPU resource sharing application of the target container, and virtual GPU devices (v/ipc0, /dev/card0, /dev/render128 shown in fig. 2) are dynamically created for the target container. The virtual GPU device is provided with a corresponding video memory resource/computing resource size. For example, if the video memory resource sharing application of the target container is 2GiB, the video memory resource size of the virtual GPU device is set to 2GiB.
As shown in fig. 2, the virtual GPU devices obtained by virtualization are renamed to the same names as the GPU resources of the physical GPU device, so that the user-mode target container does not need to be aware of the virtualization of the physical GPU device, and the virtual GPU devices are mounted transparently.
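As a concrete illustration of the device-creation step above, the following user-space C sketch models how a video memory share such as "2GiB" might be parsed and recorded on a dynamically created virtual GPU device. All struct and function names here are hypothetical; the actual implementation resides in the kernel module and is not disclosed in this form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one dynamically created virtual GPU device. */
struct vgpu_dev {
    char     name[32];    /* device name as seen inside the container, e.g. "card0" */
    uint64_t vram_quota;  /* video memory share granted at creation, in bytes       */
};

/* Parse a share like "2GiB" or "512MiB" into bytes; returns 0 on failure. */
static uint64_t parse_vram_share(const char *s)
{
    char unit[8] = {0};
    unsigned long long n = 0;
    if (sscanf(s, "%llu%7s", &n, unit) != 2)
        return 0;
    if (strcmp(unit, "GiB") == 0) return (uint64_t)n << 30;
    if (strcmp(unit, "MiB") == 0) return (uint64_t)n << 20;
    return 0;
}

/* Size a vgpu record from the container's resource sharing application. */
static void vgpu_create(struct vgpu_dev *v, const char *name, const char *share)
{
    snprintf(v->name, sizeof v->name, "%s", name);
    v->vram_quota = parse_vram_share(share);
}
```

For a 2GiB sharing application, `vgpu_create(&v, "card0", "2GiB")` would record a quota of 2147483648 bytes on the virtual device.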
Kernel Mode and User Mode are two running states of an operating system. In kernel mode, operating system programs run and hardware is operated; in user mode, user programs run. Kernel mode and user mode share the same process space, but function interfaces cannot be called across them directly; the main communication mechanisms between them include file system interfaces, IOCTL system calls, and the like.
Based on the IOCTL system call interface in user mode, the target container initiates a first IOCTL system call request for obtaining, in kernel mode, the GPU video memory state corresponding to the mounted virtual GPU device. The target container may be a container running AI inference tasks, a container running Android cloud games, or a container running other tasks, which is not specifically limited in the disclosure.
By intercepting the first IOCTL system call request, the GPU video memory state corresponding to the virtual GPU device is obtained in kernel mode.
In step S12, based on the first IOCTL system call request, corresponding video memory state information is obtained, where the video memory state information is video memory information corresponding to the virtual GPU device in the physical GPU device.
Taking fig. 2 as an example, based on the first IOCTL system call request, a scalable GPU kernel module (shown in fig. 2) is called to obtain the corresponding video memory state information.
In step S13, the video memory status information is returned to the target container to implement GPU virtualization based on the video memory status information when the target container is running.
According to the embodiments of the disclosure, container-oriented GPU virtualization is realized in kernel mode based on IOCTL by intercepting the IOCTL system call request, thereby realizing GPU resource sharing among containers.
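The interception idea above can be sketched as a hooked ioctl dispatch: requests the virtualization layer understands are answered with the virtual device's view, and everything else falls through to the original driver handler. This is a minimal user-space simulation with invented command numbers and handler names, not the patented kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical command numbers; a real driver encodes these with _IO* macros. */
#define VGPU_QUERY_VRAM_STATE 0x01

typedef long (*ioctl_handler_t)(unsigned int cmd, void *arg);

/* Stand-in for the driver's unmodified ioctl handler. */
static long original_handler(unsigned int cmd, void *arg)
{
    (void)cmd; (void)arg;
    return -ENOTTY;  /* "unknown ioctl" in this simulation */
}

/* Interposed handler: requests of interest are answered here with the
 * virtual GPU device's view; everything else passes through untouched. */
static long vgpu_ioctl_hook(unsigned int cmd, void *arg)
{
    switch (cmd) {
    case VGPU_QUERY_VRAM_STATE:
        /* report the virtual device's quota, not the whole physical GPU */
        *(uint64_t *)arg = 2ULL << 30;  /* e.g. a 2 GiB share */
        return 0;
    default:
        return original_handler(cmd, arg);
    }
}
```

The key property is that the container cannot tell it is talking to a hook: unrecognized requests behave exactly as with the original driver.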
In order to realize video memory isolation during GPU container virtualization, a video memory status function needs to be provided; the first IOCTL system call request corresponds to this video memory status function.
In one possible implementation, the method further includes: intercepting a second IOCTL system call request initiated by the target container, where the second IOCTL system call request is used to apply for GPU video memory; judging whether the video memory application amount corresponding to the second IOCTL system call request exceeds the video memory mount amount corresponding to the virtual GPU device mounted on the target container; and when the video memory application amount exceeds the video memory mount amount, returning prompt information to the target container, where the prompt information is used to indicate that the current video memory application would cause a video memory overflow.
When the target container runs, it initiates, based on the IOCTL system call interface in user mode, a second IOCTL system call request for applying for GPU video memory. The second IOCTL system call request is intercepted, and it is then judged whether the corresponding video memory application amount exceeds the video memory mount amount of the virtual GPU device mounted on the target container. When the video memory application amount exceeds the video memory mount amount, prompt information indicating that the current video memory application would overflow is returned to the target container, so that container failures caused by video memory overflow can be avoided. Because this judgment is performed in kernel mode, the video memory limit cannot be bypassed from user mode, which improves the security of the GPU container virtualization process.
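The kernel-mode quota check described above amounts to simple per-container accounting. The sketch below models it in plain C; the struct and function names are assumptions, and a real kernel module would additionally need locking for concurrent requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Per-container video memory accounting, checked in kernel mode so a
 * user-mode process cannot bypass the limit. Names are illustrative. */
struct vram_account {
    uint64_t quota;  /* video memory mount amount of the virtual GPU device */
    uint64_t used;   /* video memory already granted to this container      */
};

/* Intercepted allocation request: admit it only if it fits in the quota.
 * Returns 0 on success, -ENOMEM to signal the OOM hint to the container. */
static int vram_try_alloc(struct vram_account *a, uint64_t request)
{
    if (request > a->quota - a->used)
        return -ENOMEM;  /* would exceed the mount amount: report overflow */
    a->used += request;
    return 0;
}

/* Intercepted release request: return video memory to the container's pool. */
static void vram_release(struct vram_account *a, uint64_t amount)
{
    a->used -= (amount <= a->used) ? amount : a->used;
}
```

With a 2 GiB mount amount, a first 1 GiB application succeeds, a further application of 1 GiB plus one byte is rejected with the OOM hint, and an exact 1 GiB application then fills the quota.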
In order to realize video memory isolation during GPU container virtualization, a video memory application function also needs to be provided; the second IOCTL system call request corresponds to this video memory application function.
In one possible implementation, the target container initiates the first IOCTL system call request/the second IOCTL system call request based on the IOCTL interface corresponding to the target component; the target assembly includes at least one of: openCL, CUDA.
Taking fig. 2 as an example, the components that can run in the target container include: Android OS/App, TensorFlow/PyTorch, smi, OpenGL ES, CUDA/OpenCL, and MTML.
A first IOCTL system call request is initiated based on the IOCTL interface corresponding to the target component OpenCL or CUDA, and the video memory status function is realized by intercepting the first IOCTL system call request.
A second IOCTL system call request is initiated based on the IOCTL interface corresponding to the target component OpenCL, CUDA, or MTML, and the video memory application function is realized by intercepting the second IOCTL system call request.
The IOCTL interfaces corresponding to different target components are different.
In addition to OpenCL and CUDA, the first IOCTL system call request/second IOCTL system call request may also be initiated based on other components, which is not specifically limited in the disclosure.
FIG. 3 illustrates a schematic diagram of a target container applying for video memory according to an embodiment of the present disclosure. As shown in FIG. 3, a second IOCTL system call request is initiated based on the IOCTL interface corresponding to a component running in the target container (TensorFlow/PyTorch, smi, CUDA/OpenCL, or MTML) to apply for video memory. The scalable GPU kernel module judges whether the video memory application amount corresponding to the second IOCTL system call request exceeds the video memory mount amount corresponding to the virtual GPU device mounted on the target container. When the video memory application amount exceeds the video memory mount amount, prompt information (Out Of Memory, OOM, as shown in FIG. 3) indicating that the current video memory application would overflow is returned to the target container.
In one possible implementation, the method further includes: when the video memory application amount does not exceed the video memory mount amount, allocating video memory resources of the physical GPU device to the target container based on the video memory application amount.
Taking fig. 2 as an example, a GPU kernel module (KMD) is used to schedule the physical GPU device. When the video memory application amount does not exceed the video memory mount amount, the GPU kernel module KMD is called based on the video memory application amount corresponding to the second IOCTL system call request, and video memory resources of the physical GPU device are allocated to the target container.
In one possible implementation, there are a plurality of target containers; for any one target container, each virtual GPU device mounted on it is provided with a corresponding GPU time slice, where the GPU time slice is used to indicate the computing resources of the physical GPU device.
To achieve computing power isolation during GPU container virtualization, different target containers are allocated specified shares of GPU time slices, which indicate computing resources of the physical GPU device. For example, GPU time slices may be allocated to target containers based on the preset proportions of the different target containers and on whether each is running a load. The minimum GPU time slice may be set to 0.1 ms, with different shares of time slices indicating different GPU computation durations.
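A proportional allocation of GPU time, as suggested above, can be modeled as follows; the weights and scheduling period are illustrative, and the 0.1 ms unit mirrors the minimum slice mentioned in the text.

```c
#include <assert.h>

#define SLICE_UNIT_US 100  /* minimum GPU time slice: 0.1 ms, in microseconds */

/* Given per-container weights (preset proportions) and a scheduling period
 * expressed in slice units, write each container's share of slice units
 * into out[]. Remainder units from integer division are left unassigned. */
static void assign_slices(const int *weight, int n, int period_units, int *out)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += weight[i];
    for (int i = 0; i < n; i++)
        out[i] = total ? period_units * weight[i] / total : 0;
}
```

For three containers with weights 1:1:2 over a 10 ms period (100 slice units), the containers receive 25, 25, and 50 units respectively, i.e. 2.5 ms, 2.5 ms, and 5 ms of GPU time per period.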
After corresponding GPU time slices are allocated to the different target containers, the scheduling of GPU computing resources is completed through time-slice-level scheduling, thereby achieving computing power isolation among the different target containers.
In one possible implementation, the computing resource scheduling mode corresponding to the plurality of target containers is a weak isolation mode; in the weak isolation mode, when no load is running on the ith target container, a load corresponding to the jth target container can run on the GPU time slice corresponding to the ith target container, where i and j are different positive integers.
In the weak isolation mode, when no load is running on a target container, loads corresponding to other target containers can run on that container's GPU time slices, which effectively improves GPU utilization.
In one possible implementation, the computing resource scheduling modes corresponding to the plurality of target containers are strong isolation modes; in the strong isolation mode, when no load is operated on the ith target container, the load corresponding to the jth target container cannot be operated on the GPU time slice corresponding to the ith target container, wherein i and j are different positive integers.
In the strong isolation mode, when no load is running on a target container, loads corresponding to other target containers cannot run on the GPU time slices of that idle target container, thereby achieving complete computing power isolation among the different target containers.
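The difference between the two modes reduces to one scheduling decision, sketched below. The mode names mirror the text; the function signature and its boolean contract are assumptions for illustration only:

```python
def can_borrow_slice(mode, owner_has_load):
    """Decide whether another container's load may run on the owner
    container's GPU time slice, per the weak/strong isolation modes."""
    if owner_has_load:
        return False          # the owner always keeps its own slice
    if mode == "weak":
        return True           # idle slices are lent out, raising utilization
    if mode == "strong":
        return False          # idle slices stay reserved: full isolation
    raise ValueError(f"unknown mode: {mode}")

print(can_borrow_slice("weak", owner_has_load=False))    # True
print(can_borrow_slice("strong", owner_has_load=False))  # False
```

Weak isolation trades strict reservation for throughput; strong isolation guarantees each container exactly its configured share even when neighbors are idle.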
According to the GPU virtualization method for a container described above, GPU virtualization of a physical GPU device for containers can be achieved in kernel mode, realizing both video memory isolation and computing power isolation. When applied in practical application scenarios, the method delivers high processing performance; for example, the performance loss is within 5% compared with ideal performance data.
It will be appreciated that the above method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, such combinations are not described in detail in the present disclosure. It will also be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides a GPU virtualization apparatus for a container, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any GPU virtualization method for a container provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the method sections, which are not repeated here.
Fig. 4 illustrates a block diagram of a GPU virtualization device for a container, according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 40 includes:
the interception module 41 is configured to intercept a first IOCTL system call request initiated by a target container, where a virtual GPU device is mounted on the target container, the virtual GPU device is obtained by virtualizing a physical GPU device, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container;
the obtaining module 42 is configured to obtain corresponding video memory status information based on the first IOCTL system call request, where the video memory status information is video memory information corresponding to the virtual GPU device in the physical GPU device;
and the sending module 43 is configured to return the video memory status information to the target container, so as to implement GPU virtualization based on the video memory status information when the target container is running.
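The intercept-and-answer flow of modules 41 through 43 can be simulated in user space as follows. The request code, the per-container quota table, and all names are assumptions for illustration, not the patent's actual kernel-mode implementation:

```python
# Hypothetical per-container video memory view: each container only sees
# the quota of its virtual GPU device, never the whole physical card.
QUOTAS_MB = {"container-a": 4096, "container-b": 8192}
USED_MB = {"container-a": 1024, "container-b": 0}

GET_MEM_STATE = 0xC001  # stand-in for the first IOCTL request code

def handle_ioctl(container, request):
    """Intercept an IOCTL from a container and answer with the video
    memory state of its virtual GPU device rather than the physical GPU."""
    if request == GET_MEM_STATE:
        total = QUOTAS_MB[container]
        used = USED_MB[container]
        return {"total_mb": total, "used_mb": used, "free_mb": total - used}
    raise NotImplementedError(f"unhandled request {request:#x}")

print(handle_ioctl("container-a", GET_MEM_STATE))
# {'total_mb': 4096, 'used_mb': 1024, 'free_mb': 3072}
```

The key point the sketch illustrates is that the container's query never reaches the physical device state directly; the interception layer answers with the virtual device's view.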
In one possible implementation, the obtaining module 42 is configured to intercept a second IOCTL system call request initiated by the target container, where the second IOCTL system call request is used to apply for GPU video memory;
the apparatus 40 further comprises:
the judging module is used for judging whether the video memory application amount corresponding to the second IOCTL system call request exceeds the video memory hanging amount corresponding to the virtual GPU equipment mounted on the target container;
and the sending module 43 is configured to return prompt information to the target container when the video memory application amount exceeds the video memory mount amount, where the prompt information is used to indicate that the current video memory application would overflow the video memory.
In one possible implementation, the apparatus 40 further includes:
and the resource allocation module is configured to allocate video memory resources of the physical GPU device to the target container based on the video memory application amount when the video memory application amount does not exceed the video memory mount amount.
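The quota check performed by the judging, sending, and resource allocation modules can be sketched as follows; the function name, the simple megabyte accounting, and the result dictionaries are assumptions for illustration:

```python
def apply_for_memory(quota_mb, used_mb, request_mb):
    """Check a video memory application against the amount mounted on the
    container's virtual GPU device; grant it or report an overflow."""
    if used_mb + request_mb > quota_mb:
        # corresponds to returning the overflow prompt to the container
        return {"granted": False, "reason": "video memory overflow"}
    # corresponds to allocating physical GPU video memory for the container
    return {"granted": True, "used_mb": used_mb + request_mb}

print(apply_for_memory(quota_mb=4096, used_mb=3000, request_mb=2048))
# {'granted': False, 'reason': 'video memory overflow'}
print(apply_for_memory(quota_mb=4096, used_mb=3000, request_mb=512))
# {'granted': True, 'used_mb': 3512}
```

Because the check runs at interception time, an over-quota application is rejected before any physical video memory is touched, which is what enforces video memory isolation between containers.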
In one possible implementation, the target container initiates the first IOCTL system call request/the second IOCTL system call request based on the IOCTL interface corresponding to the target component;
the target component includes at least one of: OpenCL, CUDA.
In one possible implementation, the number of target containers is a plurality;
for any one target container, the virtual GPU equipment mounted on the target container is provided with a corresponding GPU time slice, wherein the GPU time slice is used for indicating the computing resources of the physical GPU equipment.
In one possible implementation, the computing resource scheduling modes corresponding to the plurality of target containers are weakly isolated modes;
the apparatus 40 further comprises:
and the operation module is configured to run, in the weak isolation mode, the load corresponding to the j-th target container on the GPU time slice corresponding to the i-th target container when no load is running on the i-th target container, where i and j are different positive integers.
In one possible implementation, the computing resource scheduling modes corresponding to the plurality of target containers are strong isolation modes;
and the operation module is configured so that, in the strong isolation mode, the load corresponding to the j-th target container cannot run on the GPU time slice corresponding to the i-th target container when no load is running on the i-th target container, where i and j are different positive integers.
The above method has a specific technical association with the internal structure of a computer system and can solve technical problems of improving hardware operation efficiency or execution effect (including reducing the amount of data stored, reducing the amount of data transmitted, and increasing hardware processing speed), thereby obtaining a technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 5 shows a block diagram of an electronic device according to an embodiment of the disclosure. Referring to fig. 5, an electronic device 1900 may be provided as a server or a terminal device. The electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions, such as application programs, executable by the processing component 1922. The application programs stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 may operate an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical user interface based operating system developed by Apple Inc. (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised-in-groove structure having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the various embodiments tends to emphasize the differences between them; for their identical or similar parts, the embodiments may refer to each other, and for brevity these are not repeated here.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
If the technical solution of this application involves personal information, a product applying this technical solution clearly informs of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solution of this application involves sensitive personal information, a product applying this technical solution obtains the individual's separate consent before processing the sensitive personal information and additionally satisfies the requirement of "explicit consent". For example, a clear and conspicuous sign is placed at a personal information collection device such as a camera to inform that the personal information collection range has been entered and that personal information will be collected; if an individual voluntarily enters the collection range, this is regarded as consent to the collection of his or her personal information. Alternatively, on a device that processes personal information, where conspicuous signs or information are used to inform of the personal information processing rules, personal authorization is obtained by means such as pop-up messages or requesting the individual to upload his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for GPU virtualization of a container, comprising:
intercepting a first IOCTL system call request initiated by a target container, wherein virtual GPU equipment is mounted on the target container, the virtual GPU equipment is obtained based on virtualization of one physical GPU equipment, and the first IOCTL system call request is used for obtaining a GPU video memory state corresponding to the target container;
based on the first IOCTL system call request, obtaining corresponding video memory state information, wherein the video memory state information is the video memory information corresponding to the virtual GPU equipment in the physical GPU equipment;
and returning the video memory state information to the target container so as to realize GPU virtualization based on the video memory state information when the target container is operated.
2. The method according to claim 1, wherein the method further comprises:
intercepting a second IOCTL system call request initiated by the target container, wherein the second IOCTL system call request is used for performing GPU video memory application;
judging whether the video memory application amount corresponding to the second IOCTL system call request exceeds the video memory mount amount corresponding to the virtual GPU equipment mounted on the target container;
and when the video memory application amount exceeds the video memory mount amount, returning prompt information to the target container, wherein the prompt information is used for indicating that the current video memory application would overflow the video memory.
3. The method according to claim 2, wherein the method further comprises:
and when the video memory application amount does not exceed the video memory mount amount, allocating video memory resources of the physical GPU equipment to the target container based on the video memory application amount.
4. A method according to any one of claims 1 to 3, wherein the target container initiates the first IOCTL system call request/the second IOCTL system call request based on an IOCTL interface corresponding to the target component;
the target assembly includes at least one of: openCL, CUDA.
5. The method of claim 1, wherein the number of target containers is a plurality;
and aiming at any one target container, the virtual GPU equipment mounted on the target container is provided with a corresponding GPU time slice, wherein the GPU time slice is used for indicating the computing resources of the physical GPU equipment.
6. The method of claim 5, wherein the computing resource scheduling patterns corresponding to the plurality of target containers are weakly isolated patterns;
and in the weak isolation mode, when no load is carried on the ith target container, carrying out operation on a load corresponding to the jth target container on a GPU time slice corresponding to the ith target container, wherein i and j are different positive integers.
7. The method of claim 5, wherein the computing resource scheduling patterns corresponding to the plurality of target containers are strongly isolated patterns;
in the strong isolation mode, when no load is carried on the ith target container, the load corresponding to the jth target container cannot be carried on the GPU time slice corresponding to the ith target container, wherein i and j are different positive integers.
8. A GPU virtualization apparatus for a container, comprising:
an interception module, configured to intercept a first IOCTL system call request initiated by a target container, wherein virtual GPU equipment is mounted on the target container, the virtual GPU equipment is obtained by virtualizing one physical GPU equipment, and the first IOCTL system call request is used to obtain a GPU video memory state corresponding to the target container;
the acquisition module is used for acquiring corresponding video memory state information based on the first IOCTL system call request, wherein the video memory state information is the video memory information corresponding to the virtual GPU equipment in the physical GPU equipment;
and the sending module is used for returning the video memory state information to the target container so as to realize GPU virtualization based on the video memory state information when the target container is operated.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN202211699151.1A 2022-12-28 2022-12-28 GPU virtualization method and device for container and electronic equipment Active CN116188240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211699151.1A CN116188240B (en) 2022-12-28 2022-12-28 GPU virtualization method and device for container and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211699151.1A CN116188240B (en) 2022-12-28 2022-12-28 GPU virtualization method and device for container and electronic equipment

Publications (2)

Publication Number Publication Date
CN116188240A true CN116188240A (en) 2023-05-30
CN116188240B CN116188240B (en) 2024-04-05

Family

ID=86448087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211699151.1A Active CN116188240B (en) 2022-12-28 2022-12-28 GPU virtualization method and device for container and electronic equipment

Country Status (1)

Country Link
CN (1) CN116188240B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572509A (en) * 2014-12-26 2015-04-29 中国电子科技集团公司第十五研究所 Method for realizing discrete display card video memory distribution on Godson computing platform
US20170132002A1 (en) * 2015-11-10 2017-05-11 International Business Machines Corporation Instruction stream modification for memory transaction protection
CN110196753A (en) * 2019-01-21 2019-09-03 腾讯科技(北京)有限公司 Graphics processor GPU vitualization method, apparatus and readable medium based on container
CN111913794A (en) * 2020-08-04 2020-11-10 北京百度网讯科技有限公司 Method and device for sharing GPU, electronic equipment and readable storage medium
CN112231048A (en) * 2020-09-25 2021-01-15 苏州浪潮智能科技有限公司 GPU resource using method and system based on X86 server platform
WO2021098182A1 (en) * 2019-11-20 2021-05-27 上海商汤智能科技有限公司 Resource management method and apparatus, electronic device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572509A (en) * 2014-12-26 2015-04-29 中国电子科技集团公司第十五研究所 Method for realizing discrete display card video memory distribution on Godson computing platform
US20170132002A1 (en) * 2015-11-10 2017-05-11 International Business Machines Corporation Instruction stream modification for memory transaction protection
CN110196753A (en) * 2019-01-21 2019-09-03 腾讯科技(北京)有限公司 Graphics processor GPU vitualization method, apparatus and readable medium based on container
WO2021098182A1 (en) * 2019-11-20 2021-05-27 上海商汤智能科技有限公司 Resource management method and apparatus, electronic device and storage medium
CN111913794A (en) * 2020-08-04 2020-11-10 北京百度网讯科技有限公司 Method and device for sharing GPU, electronic equipment and readable storage medium
US20210208951A1 (en) * 2020-08-04 2021-07-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for sharing gpu, electronic device and readable storage medium
CN112231048A (en) * 2020-09-25 2021-01-15 苏州浪潮智能科技有限公司 GPU resource using method and system based on X86 server platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Netizen, "In-depth Analysis: GPU Sharing for Deep Learning", Retrieved from the Internet <URL:https://mp.weixin.qq.com/s/o-pfieZ_j1Gr_Igrsqud0w> *

Also Published As

Publication number Publication date
CN116188240B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
US10467725B2 (en) Managing access to a resource pool of graphics processing units under fine grain control
US10109030B1 (en) Queue-based GPU virtualization and management system
US10897428B2 (en) Method, server system and computer program product for managing resources
CN113849312B (en) Data processing task allocation method and device, electronic equipment and storage medium
US9910704B1 (en) Run time task scheduling based on metrics calculated by micro code engine in a socket
CN112988346B (en) Task processing method, device, equipment and storage medium
US10733022B2 (en) Method of managing dedicated processing resources, server system and computer program product
US10037225B2 (en) Method and system for scheduling computing
CN116185554A (en) Configuration device, scheduling device, configuration method and scheduling method
US20130219386A1 (en) Dynamic allocation of compute resources
CN115237589A (en) SR-IOV-based virtualization method, device and equipment
CN116188240B (en) GPU virtualization method and device for container and electronic equipment
US20200278890A1 (en) Task management using a virtual node
EP3430510B1 (en) Operating system support for game mode
Gupta et al. Load balancing using genetic algorithm in mobile cloud computing
CN114385351A (en) Cloud management platform load balancing performance optimization method, device, equipment and medium
US10223153B1 (en) Accounting and enforcing non-process execution by container-based software transmitting data over a network
CN117176963B (en) Virtualized video encoding and decoding system and method, electronic equipment and storage medium
CN117176964B (en) Virtualized video encoding and decoding system and method, electronic equipment and storage medium
US20240160492A1 (en) System and method for radio access network baseband workload pool resizing
CN115422530A (en) Method and device for isolating tenant and electronic equipment
CN116719605A (en) GPU load deployment method, cloud computing platform and electronic equipment
CN117742957A (en) Memory allocation method, memory allocation device, electronic equipment and storage medium
CN115373752A (en) Service processing method, device and storage medium
CN117056041A (en) Fine granularity resource pool scheduling method and system for GPU remote call

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant