CN117667377A - Container computing power scheduling method, apparatus, electronic device and storage medium - Google Patents

Container computing power scheduling method, apparatus, electronic device and storage medium

Info

Publication number
CN117667377A
Authority
CN
China
Prior art keywords
container
context
current
gpu card
current container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211056435.9A
Other languages
Chinese (zh)
Inventor
徐进
冯敦超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huantai Technology Co Ltd
Original Assignee
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huantai Technology Co Ltd
Priority to CN202211056435.9A
Publication of CN117667377A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/20: Processor architectures; processor configuration, e.g. pipelining
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Sources (AREA)

Abstract

The application discloses a container computing power scheduling method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: while the GPU virtualization computing power scheduling thread loops, determining a current container and the next container of the current container from the recorded plurality of containers on a physical GPU card; acquiring the context identifier of the current container as a first context identifier, and acquiring the context identifier of the next container of the current container as a second context identifier; and if it is determined based on the first context identifier and the second context identifier that the current container and the next container satisfy a preset scheduling condition, stopping the context of the current container's process and starting the context of the process of the next container. By means of virtualization, the application creates multiple virtual GPU cards to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card and GPU utilization is improved.

Description

Container computing power scheduling method, apparatus, electronic device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a container computing power scheduling method and apparatus, an electronic device, and a storage medium.
Background
As artificial intelligence applications become more widespread, the scale at which graphics processing unit (GPU) computing power is used keeps growing, and the computing efficiency and usage cost of the GPU draw ever more attention. To improve GPU utilization, allowing multiple containers to effectively reuse the same GPU card has been a long-standing challenge.
Disclosure of Invention
In view of the above problems, the present application provides a container computing power scheduling method and apparatus, an electronic device, and a storage medium, which create multiple virtual GPU cards through virtualization to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card and GPU utilization is improved.
In a first aspect, embodiments of the present application provide a container computing power scheduling method, the method comprising: while the graphics processing unit (GPU) virtualization computing power scheduling thread loops, determining a current container and the next container of the current container from the recorded plurality of containers on a physical GPU card; acquiring the context identifier of the current container as a first context identifier, and acquiring the context identifier of the next container of the current container as a second context identifier; and if it is determined based on the first context identifier and the second context identifier that the current container and the next container of the current container satisfy a preset scheduling condition, stopping the context of the current container's process and starting the context of the process of the next container of the current container.
In a second aspect, embodiments of the present application provide a container computing power scheduling apparatus, the apparatus comprising a current container determining module, a context identifier obtaining module, and a context switching module. The current container determining module is configured to determine a current container and the next container of the current container from the recorded plurality of containers on a physical GPU card while the graphics processing unit (GPU) virtualization computing power scheduling thread loops. The context identifier obtaining module is configured to obtain the context identifier of the current container as a first context identifier, and to obtain the context identifier of the next container of the current container as a second context identifier. The context switching module is configured to stop the context of the current container's process and start the context of the process of the next container of the current container if it is determined based on the first context identifier and the second context identifier that the current container and the next container satisfy a preset scheduling condition.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, perform the above method.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having program code stored therein, the program code being callable by a processor to perform the above method.
According to the container computing power scheduling method and apparatus, electronic device, and storage medium provided by the embodiments of the present application, while the graphics processing unit (GPU) virtualization computing power scheduling thread loops, a current container and the next container of the current container are determined from the recorded plurality of containers on a physical GPU card; the context identifier of the current container is acquired as a first context identifier, and the context identifier of the next container as a second context identifier; and if it is determined based on the first context identifier and the second context identifier that the current container and the next container satisfy the preset scheduling condition, the context of the current container's process is stopped and the context of the next container's process is started. Through virtualization, the application creates multiple virtual GPU cards to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card and GPU utilization is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a graphics processor according to an embodiment of the present application;
FIG. 2 is a flow chart of a container computing power scheduling method provided in an embodiment of the present application;
FIG. 3 is a flow chart of a container computing power scheduling method provided in an embodiment of the present application;
FIG. 4 is a flow chart of kernel module loading according to an embodiment of the present application;
FIG. 5 is a flow chart of container creation provided by an embodiment of the present application;
FIG. 6 is a flow chart of recording a context identifier provided by an embodiment of the present application;
FIG. 7 is a flow chart of a GPU virtualization computing power scheduling thread according to an embodiment of the present application;
FIG. 8 is a block diagram of a container computing power scheduling apparatus provided in an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device for performing a container computing power scheduling method according to an embodiment of the present application;
FIG. 10 illustrates a storage unit for storing or carrying program code implementing a container computing power scheduling method according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings of the embodiments.
The following describes terms related to the present application:
GPU: Graphics Processing Unit, the graphics processor.
mGPU: the GPU virtualization scheme described in this application; its kernel module creates virtual GPU cards (mGPU0, mGPU1, and so on, in FIG. 1) on a physical GPU card, which are mounted into containers.
Container: a lightweight, operating-system-level virtualization technology whose mainstream implementation is Docker. Containers are a technique for bundling an application and all of its necessary files into one runtime environment; with containers, software can be isolated so that it runs independently across different operating systems, hardware, networks, storage systems, and security policies. Using containers, users can avoid crashes due to environmental incompatibilities and achieve consistent behavior across different machines. A container is only tens of MB in size, can run in any environment, and has extremely low overhead. Docker is a well-known software platform supporting containerization: on Docker, a developer may design and build applications within a container, test the applications, and deliver them to other machines and environments.
runc: a lightweight, portable container runtime; a component for creating and managing containers.
Linux: an open-source operating system.
CUDA: Compute Unified Device Architecture, a general-purpose parallel computing architecture proposed by NVIDIA. Users can perform computation on NVIDIA GPUs from the GeForce 8 series onward and on newer Quadro GPUs, and CUDA was the first environment in which the GPU could be targeted using a C compiler. The CUDA function library includes many functions, such as the cuCtxCreate function, the cudaMalloc function, and the like.
ioctl: in computing, a system call dedicated to device input/output operations. The call takes a request code associated with the device, and the function of the system call depends entirely on that request code.
procfs: in many Unix-like computer systems, the abbreviation of "process file system", a pseudo file system (generated dynamically at startup) used to access process information through the kernel.
mknod: a system call used to create a file system node with a specified file name; Unix-like systems treat all devices as files.
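As a brief illustration of the ioctl and mknod calls just described, the following is a minimal user-space sketch; the device path, request code, and device numbers are hypothetical examples, not values from this application:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    #define MGPU_SET_WEIGHT 0x4d01   /* hypothetical ioctl request code */

    int main(void)
    {
        /* mknod: create a device node, treating the device as a file */
        mknod("/dev/mgpu0", S_IFCHR | 0666, makedev(235, 0)); /* hypothetical numbers */

        /* ioctl: the effect of the call depends entirely on the request code */
        int fd = open("/dev/mgpu0", O_RDWR);
        int weight = 50;
        return fd < 0 ? 1 : ioctl(fd, MGPU_SET_WEIGHT, &weight);
    }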
As artificial intelligence application scenarios broaden, GPU computing power is used at an ever larger scale, and computing efficiency and usage cost draw ever more attention. When AI is deployed in practice, the computing power required by an AI model for training or inference does not always need to occupy a whole GPU card; in some scenarios, for example, 0.5 of a GPU card is enough to meet the requirement. In such cases, to improve GPU utilization, several GPU virtualization technologies have been developed in the related art. Common NVIDIA GPU virtualization solutions include NVIDIA GRID (vGPU), NVIDIA MPS, cGPU, vCUDA, and the like.
vCUDA is implemented by hijacking and forwarding the user-space CUDA interfaces. It is intrusive to the business program, since a customized function library must be preloaded before the business program starts; the corresponding CUDA interfaces must be adapted, so there are compatibility problems; and computing power cannot be accurately scheduled or limited, so isolation is poor. Hence, in the related art, GPU utilization is not high and GPU computing power scheduling can be unreasonable.
In view of the above problems, the inventors, after long-term research, propose the container computing power scheduling method and apparatus, electronic device, and storage medium provided by the embodiments of the present application: while the graphics processing unit (GPU) virtualization computing power scheduling thread loops, a current container and the next container of the current container are determined from the recorded plurality of containers on a physical GPU card; the context identifier of the current container is acquired as a first context identifier, and the context identifier of the next container as a second context identifier; and if it is determined based on the first context identifier and the second context identifier that the current container and the next container satisfy a preset scheduling condition, the context of the current container's process is stopped and the context of the next container's process is started. Through virtualization, multiple virtual GPU cards are created to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card, container computing power scheduling and limiting are realized, and GPU utilization is improved. The specific container computing power scheduling method is described in detail in the following embodiments.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a graphics processor to which the container computing power scheduling method of an embodiment of the present application is applied. The architecture of the graphics processor (mGPU) comprises a kernel module and a user mode component. The kernel module is used to allocate the major and minor device numbers of the virtual GPU cards, thereby realizing container computing power scheduling and limiting; the user mode component is used to configure the computing power weight of a virtual GPU card when a container is created, create the virtual GPU card according to the major and minor device numbers allocated by the kernel module, and mount the virtual GPU card into the container. The dashed box of the kernel module denotes the created virtual GPU cards: multiple virtual GPU cards can be created on each physical GPU card, a virtual GPU card can be mounted on at least one container, and multiple containers can multiplex the same physical GPU card.
The user mode component creates container A, container B, and container C; mGPU0 in the kernel module is mounted on container A, mGPU1 on container B, and mGPU2 on container C. The computing power of mGPU0 accounts for 1/2 of the whole physical GPU card, that of mGPU1 for 1/2 of the whole physical GPU card, and that of mGPU2 for 1/4 of the whole physical GPU card. Each container may include applications such as TensorFlow and PyTorch, the CUDA Runtime & Driver, and function interfaces such as /dev/nvidia0. NVIDIA GPU Kernel Driver is the driver of the physical GPU card, and the mGPUs may span a plurality of physical GPU cards.
Referring to FIG. 2, FIG. 2 is a flow chart of a container computing power scheduling method according to an embodiment of the present application. The method creates multiple virtual GPU cards through virtualization to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card, container computing power scheduling and limiting are realized, and GPU utilization is improved. In a specific embodiment, the method may be applied to the container computing power scheduling apparatus 200 shown in FIG. 8 and to the electronic device 100 (FIG. 9) configured with the apparatus 200. The following describes the specific flow of this embodiment taking an electronic device as an example; it is understood that the electronic device applied in this embodiment may include a smart phone, a tablet computer, a vehicle, a wearable electronic device, and the like, which is not limited herein. Described with reference to the flow shown in FIG. 2, the container computing power scheduling method may specifically include the following steps:
step S110: in the case of a graphics processor GPU virtualized power dispatch thread loop, a current container and a next container to the current container are determined from among a plurality of containers on a recorded physical GPU card.
In some embodiments, the electronic device may be preset with the main function of the GPU virtualization computing power scheduling thread. Through GPU virtualization, virtual machine instances running on a data center server in the electronic device can share one or more GPU processors for graphics operations, a safe and efficient way of accessing the hardware; two AI training or AI inference application services can be executed on one physical GPU card at the same time, so that computing resources are exploited to the fullest and cost is reduced.
The GPU virtualization computing power scheduling thread realizes GPU virtualization in kernel mode and works with a kernel module and a user mode component; by creating multiple virtual GPU card devices, the computing power of the whole physical GPU card is flexibly partitioned for multiple containers to use. In a multi-task system such as Linux, the design distinguishing kernel space from user space isolates operating system code from application code, so that even if a single application fails, other programs continue to run normally, improving the stability of the operating system.
In some embodiments, when the kernel module is initialized by the operating system of the electronic device, the kernel module creates a kernel thread for each corresponding physical GPU card, and the kernel thread schedules the computing power of all containers on that physical GPU card. For example, Linux in the electronic device may be used to develop the GPU virtualization kernel module and user mode component and, once developed, load them for GPU virtualization. The kernel module may include an open function, a close function, an ioctl function, and the like; the user mode component may include a hook program, a process file system interface, a runc component, and the like, which is not limited herein.
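To make the per-card kernel thread concrete, the following is a minimal Linux kernel-module sketch; the mgpu_* names and the one-millisecond slice are illustrative assumptions, not taken from the application:

    #include <linux/kthread.h>
    #include <linux/delay.h>

    struct mgpu_card {
        int index;                      /* physical GPU card index */
        struct task_struct *sched_task; /* per-card scheduling kernel thread */
    };

    /* hypothetical helper: one computing power scheduling pass over the
     * containers recorded on this physical GPU card */
    static void mgpu_schedule_containers(struct mgpu_card *card);

    static int mgpu_sched_fn(void *data)
    {
        struct mgpu_card *card = data;

        while (!kthread_should_stop()) {
            mgpu_schedule_containers(card);
            msleep(1);                  /* wait for the next time slice */
        }
        return 0;
    }

    static void mgpu_start_sched(struct mgpu_card *card)
    {
        /* one scheduling kernel thread per physical GPU card */
        card->sched_task = kthread_run(mgpu_sched_fn, card,
                                       "mgpu-sched%d", card->index);
    }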
Computing power scheduling may consist of switching the process contexts of the containers on the physical GPU; the kernel module may record all containers on the physical GPU card corresponding to the GPU virtualization computing power scheduling thread. The containers on the physical GPU card are created according to the needs of the user, and one or more containers may be created, which is not limited in this embodiment of the present application. There is a running precedence relationship among the multiple containers on each physical GPU card.
In some embodiments, while the GPU virtualization computing power scheduling thread loops continuously, the operating system in the electronic device may determine, from all containers on the physical GPU card recorded by the kernel module, the container that is currently running as the current container, and determine the container to be run next on the physical GPU card as the next container of the current container. The current container is the container whose process context is running on the physical GPU card while the GPU virtualization computing power scheduling thread runs; the next container of the current container is the container whose process context takes over when the scheduling thread stops the current container's process context and switches.
In this embodiment of the application, a virtual GPU card created according to a preset virtual GPU specification is mounted in the container in advance. The preset virtual GPU specification may be the computing power weight of the virtual GPU, the video memory size of the virtual GPU, or both, which is not limited herein. When multiple containers multiplex the same physical GPU card, the computing power share of each container cannot otherwise be limited; by making the preset virtual GPU specification the computing power weight of the virtual GPU, container computing power scheduling and limiting are realized on the basis of the kernel-mode GPU virtualization technology.
Step S120: Obtain the context identifier of the current container as a first context identifier, and obtain the context identifier of the next container of the current container as a second context identifier.
In some embodiments, during a GPU virtualization computing power scheduling thread cycle, the operating system of the electronic device may control the initialization of the computing application (e.g., a CUDA application) within a container, and the CUDA application may open the virtual GPU card mounted on the container through the open system call. The operating system may also have the CUDA application driver send a create-context (ctx) request to the kernel module through the ioctl system call, forward the create-context request through the kernel module to the driver of the physical GPU card, and acquire the feedback data returned after the driver of the physical GPU card successfully executes the create-context request. The operating system may further have the kernel module extract the Context ID of the container's process from the data structure of the feedback data and record it for later use by the GPU virtualization computing power scheduling thread.
Further, during the GPU virtualization computing power scheduling thread loop, the operating system of the electronic device may obtain, through the kernel module, the context identifier of the current container as the first context identifier, and the context identifier of the next container of the current container as the second context identifier.
Step S130: If it is determined based on the first context identifier and the second context identifier that the current container and the next container of the current container satisfy the preset scheduling condition, stop the context of the current container's process and start the context of the process of the next container of the current container.
In some embodiments, the operating system of the electronic device may determine, according to a scheduling policy and based on the first context identifier and the second context identifier, whether the current container and the next container of the current container satisfy the preset scheduling condition. The scheduling policy may be preset in the operating system of the electronic device; it may be set according to the weights of the containers, or according to even sharing of the card with preemption among the containers, which is not limited herein.
Weight means the proportion of the whole physical GPU card's computing power that is set aside for a container; even sharing means dividing the whole card's computing power equally among all containers on it; preemption means that when a container on the physical GPU card is not using its computing power, the other containers on the card may use the computing power pre-assigned to that idle container. For example, if container A is allocated 50% of the whole physical GPU card's computing power and container B is allocated the other 50%, container A can use 100% of the card's computing power while container B is not using its share.
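For illustration only, the weight-plus-preemption rule in this example can be written as a tiny C helper; the function and its percentage convention are assumptions made for the sketch, not part of the application:

    /* Share of the whole card's computing power a container may use in a period.
     * weight_pct: the container's own preset weight, as a percentage.
     * peer_idle / peer_weight_pct: whether the other container is idle, and its weight. */
    static int usable_share_pct(int weight_pct, int peer_idle, int peer_weight_pct)
    {
        int share = weight_pct;        /* e.g. container A: 50% */
        if (peer_idle)
            share += peer_weight_pct;  /* container B idle: A may use 50% + 50% = 100% */
        return share;
    }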
In some embodiments, considering that multiple processes within one container on the physical GPU card could otherwise occupy most of the card's computing power and leave a low share for the other containers, the operating system of the electronic device may pre-allocate the computing power of each container on the physical GPU card according to the scheduling policy. After this pre-allocation, if it is determined based on the first context identifier and the second context identifier that the current container and the next container satisfy the preset scheduling condition, the operating system stops the context of the current container's process and starts the context of the next container's process. This enforces the computing power weight limit of each container, avoids the situation where the computing power share of a container cannot be limited when multiple containers share the same GPU card, improves GPU utilization, and makes GPU computing power scheduling more reasonable.
Specifically, the operating system of the electronic device may determine, according to the first context identifier, that the time slice count of the current container in the current period exceeds the preset weight, determine, according to the second context identifier, that the next container of the current container has a schedulable context, and thereby determine that the current container and the next container satisfy the preset scheduling condition.
With the above technical solution, while the GPU virtualization computing power scheduling thread loops, a current container and the next container of the current container are determined from the recorded plurality of containers on the current physical GPU card; the context identifier of the current container is acquired as a first context identifier, and the context identifier of the next container as a second context identifier; and if it is determined based on the two identifiers that the preset scheduling condition is satisfied, the context of the current container's process is stopped and the context of the next container's process is started. Through virtualization, multiple virtual GPU cards are created to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card and GPU utilization is improved.
Referring to FIG. 3, FIG. 3 is a flow chart of a container computing power scheduling method according to an embodiment of the present application. The method is applied to the electronic device described above and, described with reference to the flow shown in FIG. 3, specifically includes the following steps:
step S210: initializing an operation application program in a container on a current physical GPU card, and opening a virtual GPU card mounted to the container.
In some embodiments, the operating system of the electronic device may initialize the computing application (e.g., a CUDA application) within the container through the user mode component, and the application's driver makes a system call to the kernel module's open function to open the virtual GPU card mounted to the container.
It will be appreciated that after a virtual GPU card is created and mounted on a container, a computing application in the container can have its driver make the open system call to open this virtual GPU card. A system call is the way the user mode component interacts with the kernel module: an application in the user mode component accesses resources managed by the kernel module through system calls. The open function is a system call that opens a device or a file; here, it opens the virtual GPU card mounted to the container. Further, once the virtual GPU card mounted to the container is opened, more operations such as reads and writes can be performed on it.
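A minimal user-space sketch of this open step follows; the device node name is a hypothetical example of a virtual GPU card mounted into the container:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* open the virtual GPU card mounted into the container */
        int fd = open("/dev/mgpu0", O_RDWR);   /* hypothetical node name */
        if (fd < 0)
            return 1;
        /* read, write, and ioctl operations can now target this descriptor */
        close(fd);
        return 0;
    }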
In some embodiments, the container computing power scheduling method may further include steps S211-S215 before step S210.
Step S211: Load the GPU virtualization kernel module.
In some embodiments, the operating system of the electronic device loads the GPU virtualization kernel module, and when the kernel module is initialized, the GPU virtualization data structures are created. These data structures are used to manage the resources of the virtual GPU cards; such resources may include the process information of the container corresponding to each virtual GPU card, the context information (computing power scheduling information) of the CUDA applications in the container, and the like.
Step S212: Apply for and record the major device number for virtualizing the physical GPU card.
In some embodiments, after the operating system of the electronic device initializes the kernel module, the kernel module may register multiple functions of the virtual GPU card with the Linux operating system, such as the character device interface functions, the open function, and the ioctl function.
In some embodiments, the kernel module may register the character device interface of the virtual GPU card with the Linux operating system and record the major device number of the virtualized physical GPU card. Registering the character device interface and recording the major device number when the kernel module is loaded prepares for the later creation of virtual GPU cards.
In some embodiments, the operating system in the electronic device may open all the physical GPU cards on the electronic device and record the file descriptor fd of each physical GPU card. A file descriptor identifies a physical GPU card and can later be used by the kernel module to forward requests made on a virtual GPU card to the corresponding physical GPU card.
In some embodiments, the operating system in the electronic device may load the GPU virtualization kernel module and create the GPU virtualization data structures and the GPU virtualization computing power scheduling threads, where the data structures manage the resources of the virtual GPU cards and the scheduling threads schedule the computing power of the containers on the physical GPU cards. Each physical GPU card corresponds to one GPU virtualization computing power scheduling thread, which schedules the computing power of the multiple containers on that physical GPU card.
Referring to FIG. 4, a flow chart of kernel module initialization according to an embodiment of the present application is shown. The operating system of the electronic device loads the mGPU kernel module. During initialization, the kernel module creates the GPU virtualization data structures used to manage virtual card resources, including process information, computing power scheduling information, and the like, and creates a GPU virtualization computing power scheduling thread for each physical GPU card to schedule the computing power of the containers on that card. Also during initialization, the kernel module may open all physical GPU cards on the electronic device and record the file descriptor fd of each card, used later to forward virtual GPU card requests to the physical cards; it then creates a kernel thread for each physical GPU card to schedule the computing power of the containers on it. Finally, the kernel module registers the file system interface functions of the mGPU virtual GPU card with the Linux operating system, and obtains and records the major number of the physical GPU card.
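A condensed kernel-module sketch of this initialization path, assuming the standard register_chrdev interface and hypothetical mgpu_* names:

    #include <linux/module.h>
    #include <linux/fs.h>

    static int mgpu_major;   /* recorded major device number of the virtual GPU cards */

    static int mgpu_open(struct inode *inode, struct file *filp) { return 0; }
    static long mgpu_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { return 0; }

    static const struct file_operations mgpu_fops = {
        .owner          = THIS_MODULE,
        .open           = mgpu_open,
        .unlocked_ioctl = mgpu_ioctl,
    };

    static int __init mgpu_init(void)
    {
        /* register the character device interface of the virtual GPU card with
         * Linux and record the major number used later to create virtual cards */
        mgpu_major = register_chrdev(0, "mgpu", &mgpu_fops);
        if (mgpu_major < 0)
            return mgpu_major;

        /* also performed here in the described flow: create the GPU
         * virtualization data structures, open every physical GPU card and
         * record its fd, and start one scheduling kernel thread per card */
        return 0;
    }
    module_init(mgpu_init);

    MODULE_LICENSE("GPL");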
Step S213: Set, based on an environment variable, the computing power weight of the virtual GPU card corresponding to a container on the physical GPU card.
In some embodiments, the operating system in the electronic device may, through the user mode component, set the computing power weight of the virtual GPU card corresponding to a container on the physical GPU card based on an environment variable. When the user mode component creates the container, it configures the container's computing power weight limit through the environment variable and requests the kernel module to allocate the minor device number of the virtual GPU card according to each container's computing power weight limit.
The operating system in the electronic device may use environment variables to convey the computing power weight setting of the virtual GPU card mounted to a container. When the operating system creates the container through the user mode component, the computing power weight of the virtual GPU card can be obtained from an environment variable defined outside the user mode component and then applied. Because the environment variable is defined outside the user mode component, the weight can be set more flexibly, and changes to the user mode component's program do not require changing how the weight is obtained.
Step S214: Call the hook program and send the computing power weight of the virtual GPU card to the kernel module through the process file system, so that the kernel module allocates the minor device number of the virtual GPU card according to the computing power weight of the virtual GPU card.
Creation of a container may be accomplished jointly by several components. In some implementations, the user mode component that creates the container may include a runc component, which interacts with the operating system of the electronic device to complete the creation of the container. The hook program is a user mode component; runc supports hook plug-ins, that is, runc can call the hook program at runtime.
Further, the operating system in the electronic device may call the hook program through the user mode component when runc creates a container, and the hook program may obtain the computing power weight of the virtual GPU card by parsing the environment variable.
Further, the hook program can pass the computing power weight of the virtual GPU card to the kernel module through the process file system procfs; the kernel module can allocate the minor device number of the virtual card according to the computing power weight of the virtual GPU card; the hook program can then obtain the major and minor numbers of the virtual GPU card from the kernel module through procfs and complete the creation of the virtual GPU card through the mknod system call; finally, the hook program can mount the virtual GPU card into the created container and complete the creation of the container.
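A user-space sketch of this hook flow; the environment variable name, the procfs node, and the device path are illustrative assumptions, not names defined by the application:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    int main(void)
    {
        /* 1. parse the computing power weight from the environment variable */
        const char *w = getenv("MGPU_WEIGHT");        /* hypothetical name */
        int weight = w ? atoi(w) : 0;

        /* 2. pass the weight to the kernel module through procfs */
        FILE *f = fopen("/proc/mgpu/alloc", "w");     /* hypothetical node */
        if (!f)
            return 1;
        fprintf(f, "%d\n", weight);
        fclose(f);

        /* 3. read back the major and minor numbers allocated by the kernel module */
        int major = 0, minor = 0;
        f = fopen("/proc/mgpu/alloc", "r");
        if (!f || fscanf(f, "%d %d", &major, &minor) != 2)
            return 1;
        fclose(f);

        /* 4. create the virtual GPU card node; it is then mounted into the container */
        return mknod("/dev/mgpu0", S_IFCHR | 0666, makedev(major, minor));
    }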
In some embodiments, the operating system of the electronic device may call the hook program when the container runs and send the video memory size of the virtual GPU card to the kernel module through the process file system procfs, so that the kernel module allocates the minor device number of the virtual GPU card according to the video memory size of the virtual GPU card.
It will be appreciated that, when creating a container, the runc component together with the kernel module creates the virtual GPU card and mounts it into the container.
Step S215: Create the virtual GPU card based on the major and minor device numbers of the virtual GPU card, and mount the virtual GPU card to the corresponding container on the physical GPU card.
It will be appreciated that the virtual GPU cards on one physical GPU card share the same major device number but are given different minor device numbers when they are created. When the operating system of the electronic device interacts with a virtual GPU card, the kernel loads the corresponding driver through the virtual GPU card's major device number; the minor device number is passed in as a parameter when the kernel loads the driver, and how the minor device number is interpreted depends on the driver itself. The driver's documentation will typically describe how it reacts to different minor device numbers.
Referring to FIG. 5, a flow chart of container creation according to an embodiment of the present application is shown. The user mode component creates a container, with the computing power weight limit of the container's virtual GPU set through an environment variable. When runc creates the container, it calls the mGPU hook program, which parses the environment variable to obtain the computing power weight of the virtual GPU card. The hook program passes the weight to the mGPU kernel module through procfs; the kernel module allocates the minor device number of the virtual GPU card according to the weight; the hook program obtains the major and minor numbers of the virtual GPU card from the kernel module through procfs and completes the creation of the virtual GPU card through the mknod system call; finally, the hook program mounts the virtual GPU card into the created container, completing the creation of the container.
In short, the kernel module allocates the major and minor numbers of the virtual GPU card according to the computing power weight set by the user mode component; when the user mode component creates the container, it creates the virtual GPU card through the mknod system call using the major and minor device numbers allocated by the kernel module, and mounts the virtual GPU card device into the container.
Step S220: Send a request to create the container's context to the virtual GPU card, so that the request is forwarded through the virtual GPU card to the physical GPU card.
In some embodiments, the operating system in the electronic device may control the initialization of the computing application (a CUDA application) in the container through the user mode component, and the CUDA driver opens the virtual GPU card through the open system call. The CUDA program performs its own initialization and creates a context, a software concept similar to a process, which contains the information needed to run the application, such as its memory.
An application in the electronic device can have its driver make the open system call to open the virtual GPU card, and the open function can call the ioctl interface. The ioctl interface refers to the file descriptor fd of a physical GPU card: based on fd and the ioctl function, requests or instructions issued on the virtual GPU card are sent to the driver of the physical GPU card corresponding to that fd.
It can be understood that the kernel module manages the resources of the virtual GPU card; therefore, any request or instruction that a computing application in the container sends to the physical GPU card through the ioctl function passes through the kernel module.
Further, the CUDA application in the container creates a context (e.g., by calling the cuCtxCreate function), and the CUDA driver sends the create-context (ctx) request to the kernel module through the ioctl system call. The kernel module forwards the create-ctx request to the driver of the physical GPU card, which executes it.
Through the ioctl function of the virtual GPU card, the kernel module can obtain and record the process of the application running in the container and its CUDA context information, and forward the create-ctx request to the driver of the physical GPU card, thereby intercepting the context information of the process created through the CUDA program.
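Seen from the application side, this sequence is ordinary CUDA driver API usage; a minimal sketch follows (error handling omitted), with the interception happening below the driver, inside the kernel module:

    #include <cuda.h>

    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;

        cuInit(0);                  /* the CUDA driver opens the (virtual) GPU card via open() */
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);  /* issues the create-context ioctl that the kernel module forwards */
        cuCtxDestroy(ctx);
        return 0;
    }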
Step S230: Acquire the driver return data fed back by the physical GPU card for the request, and extract the context identifier of the container from the driver return data, where the driver return data is sent by the physical GPU card when it has successfully created the container's context based on the request.
In some embodiments, after the driver of the physical GPU card has successfully executed the create-context request forwarded by the kernel module, the operating system in the electronic device may obtain the driver return data fed back for the request. The kernel module may then extract the context identifier (Context ID) of the container from the driver return data and record it for use by the kernel module when the GPU virtualization computing power scheduling thread runs. The driver return data is sent by the physical GPU card upon successfully creating the container's context based on the request.
Referring to FIG. 6, a flow chart of recording the context identifier according to an embodiment of the present application is shown. The CUDA application in the container initializes, and the CUDA driver opens the virtual GPU card through the open system call; the CUDA application creates a context (e.g., calling cuCtxCreate), and the CUDA driver sends the create-ctx request to the mGPU kernel module through the ioctl system call; the kernel module forwards the request to the NVIDIA kernel driver (the driver of the physical GPU card). After the physical GPU card successfully creates the container's context and the driver returns, the mGPU kernel module extracts the Context ID from the returned data structure and records it for scheduling use by the kernel module in subsequent runs of the GPU virtualization computing power scheduling thread.
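A kernel-side sketch of this forwarding and Context ID recording; the structures, the request code, the parser, and the use of vfs_ioctl for forwarding are all assumptions of the sketch, not details disclosed by the application:

    #include <linux/fs.h>

    struct mgpu_vcard {
        struct file *phys_filp;   /* open file of the physical GPU card (recorded fd) */
        unsigned int ctx_id;      /* recorded Context ID for the scheduling thread */
    };

    /* hypothetical parser of the Context ID in the driver return data */
    static unsigned int extract_ctx_id(unsigned long arg);

    #define NV_CREATE_CTX 0x4e01  /* hypothetical create-context request code */

    static long mgpu_vcard_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
    {
        struct mgpu_vcard *vc = filp->private_data;
        long ret;

        /* forward the virtual card request to the physical GPU card's driver */
        ret = vfs_ioctl(vc->phys_filp, cmd, arg);

        /* on success, extract and record the Context ID for later scheduling use */
        if (ret == 0 && cmd == NV_CREATE_CTX)
            vc->ctx_id = extract_ctx_id(arg);
        return ret;
    }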
Step S240: While the GPU virtualization computing power scheduling thread loops, determine a current container and the next container of the current container from the recorded plurality of containers on the current physical GPU card.
Step S250: Obtain the context identifier of the current container as a first context identifier, and obtain the context identifier of the next container of the current container as a second context identifier.
For a specific description of steps S240-S250, please refer to the previous description of steps S110-S120; the details are not repeated here.
Step S260: If it is determined based on the first context identifier and the second context identifier that the current container and the next container of the current container satisfy the preset scheduling condition, stop the context of the current container's process and start the context of the process of the next container of the current container.
In some embodiments, the GPU virtualization computing power scheduling thread loops continuously. The operating system of the electronic device obtains the current container and the next container from all containers on the current physical card recorded by the kernel module, takes the context identifier of the current container as the first context identifier and the context identifier of the next container as the second context identifier, and, if it determines based on these identifiers that the current container and the next container satisfy the preset scheduling condition, stops the context of the current container's process and starts the context of the next container's process.
Determining that the current container and the next container satisfy the preset scheduling condition may consist of determining that the share of the whole physical GPU card's computing power occupied by the current container's process is greater than its preset computing power weight while the share occupied by the next container's process is less than or equal to its preset computing power weight; in that case the preset scheduling condition is satisfied, the context of the current container's process is stopped, and the context of the next container's process is started.
In some implementations, the kernel threads of the physical GPU card are used to schedule the computing power of the containers on the card, performing process context switches according to the containers' computing power weights and the scheduling policy. Container computing power scheduling amounts to controlling the running and stopping of the process contexts in the containers: the logic of a process context running in a container executes on the physical GPU card and thus consumes the card's computing resources. For different physical GPU cards, dedicated registers of the card can be operated to start or stop the process contexts of the containers on that card, thereby realizing container computing power scheduling.
In some embodiments, step S260 may include steps S261-S263.
Step S261: Determine the time slice count of the current container in the current period according to the first context identifier.
A time slice, also called a "quantum" or "processor slice", is the microscopic span of CPU time that a time-sharing operating system allocates to each running process (in a preemptive kernel: the time from when a process starts running until it is preempted). Time slices are assigned to each process by the scheduler of the operating system kernel: the kernel first assigns an equal initial time slice to each process, the processes then execute in turn for their allotted time, and once all processes have exhausted their time slices the kernel recomputes and assigns new ones, and so on.
In some implementations, the operating system of the electronic device can determine the time slice count of the current container in the current period based on the first context identifier (Context ID) of the current container. The time slice count is a variable maintained while the GPU virtualization computing power scheduling thread runs and records the number of time slices each container has used so far. For example, if the physical GPU card has 100 time slices in one period and carries two containers, each container is allocated 50 time slices per period; the scheduling thread counts the time slices used by each container in each period, and the computing power of the physical GPU card occupied by the containers is then scheduled according to the current container's count in the current period.
Step S262: Determine the schedulable status of the context of the next container of the current container according to the second context identifier, where the status is either that a schedulable context exists or that no schedulable context exists.
In some implementations, the operating system of the electronic device can determine the schedulable status of the context of the next container based on its second context identifier (Context ID). The status is either that a schedulable context exists or that none exists: a schedulable context exists while the container's computing power usage is still below its computing power weight limit, and no schedulable context exists once its usage reaches or exceeds that limit.
Step S263: If the time slice count reaches the preset weight share and the next container of the current container has a schedulable context, stop the context of the current container's process and start the context of the process of the next container.
In some embodiments, to prevent multiple processes in one container from occupying most of the physical GPU card's computing power and leaving a low share for the other containers, the operating system of the electronic device may, when the time slice count of the current container in the current period exceeds the preset weight share and the next container has a schedulable context, stop the context of the current container's process and start the context of the next container's process, letting multiple containers multiplex the computing power of the same physical GPU card and improving the card's utilization.
The preset weight may be configured on the electronic device in advance; it may be obtained from third-party experimental data or set by the user, which is not limited herein.
In some embodiments, step S263 may include steps S2631-S2632.
Step S2631: Reset the time slice count of the current container and take the next container of the current container as the current container.
In some embodiments, when the time slice count of the current container in the current period reaches the preset weight share and the next container has a schedulable context, the operating system of the electronic device stops the context of the current container's process, starts the context of the next container's process, resets the current container's time slice count, and takes the next container as the current container. This ensures that the time slices available to every container are at least its preset weight, limits each container's computing power weight, avoids the situation where several processes in one container occupy most of the computing power and leave a low share for the other containers, and improves the utilization of the physical GPU card.
Step S2632: waiting for the next time slice and returning to execute the steps from determining the current container and the next container of the current container to stopping the context of the process of the current container and starting the context of the process of the next container of the current container, until the GPU virtualization computing power scheduling thread ends.
In some embodiments, after the electronic device stops the context of the process of the current container, starts the context of the process of the next container of the current container, resets the time slice count of the current container, and takes the next container of the current container as the current container, it may wait for the next time slice so as to count the time slices of the current period for the new current container, and execute the GPU virtualization computing power scheduling thread in a loop; when the GPU virtualization computing power scheduling thread ends, execution of the container computing power scheduling method of this embodiment stops.
The GPU virtualization computing power scheduling thread may end because the kernel module is unloaded, a user-state component is deleted, or the physical GPU card fails, which is not limited herein.
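One simple, purely illustrative way to signal such an end condition is a shared stop flag set from the unload, deletion, or failure paths; the mechanism below is an assumption, not the disclosed implementation:

```c
#include <stdatomic.h>

static atomic_bool sched_stop;   /* shared between the thread and unload paths */

static bool sched_should_stop(void)
{
    return atomic_load(&sched_stop);
}

static void sched_request_stop(void)   /* called on unload, deletion, or failure */
{
    atomic_store(&sched_stop, true);
}
```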
In some embodiments, the method of container computing power scheduling may further include step S270.
Step S270: and if the current container and the next container of the current container are determined not to meet the preset scheduling condition based on the first context identifier and the second context identifier, continuing to run the context of the process of the current container.
In some embodiments, the current container and the next container of the current container are determined, based on the first context identifier and the second context identifier, not to satisfy the preset scheduling condition when the time slice count of the current period of the current container has not reached the preset weight ratio, or when no schedulable context exists in the next container of the current container. Further, when it is determined based on the first context identifier and the second context identifier that the current container and the next container of the current container do not satisfy the preset scheduling condition, the context of the process of the current container continues to run.
It can be understood that determining, based on the first context identifier and the second context identifier, whether the current container and the next container of the current container meet the preset scheduling condition serves as the basis for judging whether multiple processes in one container occupy most of the computing power and leave other containers with a low share. That is, if it is determined that the preset scheduling condition is not met, it is determined that multiple processes in one container on the physical GPU card are not occupying most of the computing power at the expense of other containers; if it is determined that the preset scheduling condition is met, it is determined that multiple processes in the current container on the physical GPU card are occupying most of the computing power and the computing power share of the other containers is low.
Referring to fig. 7, a flowchart of a GPU virtualization computing power scheduling thread according to an embodiment of the present application is shown. The main logic of the GPU virtualization computing power scheduling thread is as follows: the thread loops until it is stopped, for example by unloading the kernel module. In each iteration, the thread obtains the current container and the next container from all containers on the current physical GPU card recorded by the kernel module; according to the scheduling policy, it judges whether the time slice count of the current period of the current container reaches the preset weight ratio and whether the next container has a schedulable context, thereby determining whether to schedule the next container to run. If the time slice count of the current period of the current container reaches the preset weight ratio and the next container has a schedulable context, the thread schedules the next container to run, stops all contexts of all processes of the current container, starts all contexts of all processes of the next container, resets the time slice count of the current period, and waits for the next time slice. If the next container does not need to be scheduled to run, the thread skips directly to waiting for the next time slice.
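Combining the earlier sketches, the main loop of fig. 7 might be approximated as follows; wait_next_slice(), next_on_card(), and ctx_list_of() are assumed helpers standing in for undisclosed internals:

```c
/* Hypothetical helpers assumed in addition to the sketches above. */
void wait_next_slice(void);                       /* block until the next time slice */
struct vgpu_container *next_on_card(struct vgpu_container *c);
const struct vgpu_ctx_list *ctx_list_of(const struct vgpu_container *c);

/* Main loop of the scheduling thread, following the flow of fig. 7. */
static void vgpu_sched_thread(struct vgpu_container *cur)
{
    while (!sched_should_stop()) {          /* loop until e.g. module unload */
        struct vgpu_container *next = next_on_card(cur);

        /* Switch only when the current container has used its weighted
         * share of the period and the next container can actually run. */
        if (quota_reached(cur) && has_schedulable_ctx(next, ctx_list_of(next))) {
            switch_container(cur, ctx_list_of(cur), ctx_list_of(next));
            cur = next;                     /* the next container becomes current */
        }
        /* Otherwise the current container keeps running (step S270). */

        wait_next_slice();                  /* step S2632 */
        account_slice(cur);                 /* count the slice just consumed */
    }
}
```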
Compared with the method of container computing power scheduling shown in fig. 2, the method provided by this embodiment of the present application further, before determining the current container and the next container of the current container from the plurality of containers on the current physical GPU card while the graphics processor GPU virtualization computing power scheduling thread loops: initializes the computing application program in the container on the physical GPU card and opens the virtual GPU card mounted to the container; sends a request for creating the context of the container to the virtual GPU card so that the request is forwarded to the physical GPU card through the virtual GPU card; and acquires the driver return data fed back by the physical GPU card based on the request and extracts the context identifier of the container from the driver return data, where the driver return data is sent by the physical GPU card when it successfully creates the context of the container based on the request. By virtualizing the GPU in kernel mode, the computing power weight of each container can be limited, so that when multiple containers share the same GPU card the computing power of each container is bounded, the utilization rate of the physical GPU card is improved, and the computing power of the physical GPU card is allocated more rationally.
Referring to fig. 8, fig. 8 is a block diagram illustrating an apparatus for container computing power scheduling according to an embodiment of the present application. The apparatus 200 for container computing power scheduling is applied to the above electronic device and is described in detail below with respect to the structure shown in fig. 8. The apparatus 200 for container computing power scheduling includes: a current container determination module 210, a context identification acquisition module 220, and a context switching module 230, wherein:
The current container determining module 210 is configured to determine, while the graphics processor GPU virtualization computing power scheduling thread loops, a current container and a next container of the current container from the plurality of containers on the recorded physical GPU card.
A context identifier obtaining module 220, configured to obtain a context identifier of the current container as a first context identifier, and obtain a context identifier of a next container of the current container as a second context identifier.
And a context switching module 230, configured to stop a context of a process of the current container and start a context of a process of a next container of the current container if it is determined that the current container and the next container of the current container satisfy a preset scheduling condition based on the first context identifier and the second context identifier.
Further, the context switching module 230 includes: a time slice counting module, a schedulable determining module, and a context switching sub-module, wherein:
and the time slice counting module is used for determining the time slice count of the current container in the current period according to the first context identifier.
And the schedulable determining module is used for determining the schedulable condition of the context of the next container of the current container according to the second context identifier, wherein the schedulable condition of the context comprises the existence of the schedulable context or the absence of the schedulable context.
And the context switching sub-module is used for stopping the context of the process of the current container and starting the context of the process of the next container of the current container if the time slice count reaches the preset weight ratio and the next container of the current container has the schedulable context.
Further, the context switch submodule includes: a time slice resetting unit and a thread ending unit, wherein:
and the time slice resetting unit is used for resetting the time slice count of the current container and taking the next container of the current container as the current container.
And the thread ending unit is used for waiting for the next time slice and returning to execute the steps from determining the current container and the next container of the current container to stopping the context of the process of the current container and starting the context of the process of the next container of the current container, until the GPU virtualization computing power scheduling thread ends.
Further, the apparatus 200 for container computing power scheduling further comprises: a computing power maintaining unit, wherein:
And the computing power maintaining unit is used for continuing to run the context of the process of the current container if the current container and the next container of the current container are determined not to meet the preset scheduling condition based on the first context identifier and the second context identifier.
Further, the apparatus 200 for container computing power scheduling further comprises: a program initialization module, a request sending module, and a context identification acquisition module, wherein:
And the program initialization module is used for initializing a computing application program in a container on the physical GPU card and opening a virtual GPU card mounted to the container.
And the request sending module is used for sending a request for creating the context of the container to the virtual GPU card so as to forward the request to the physical GPU card through the virtual GPU card.
The context identification acquisition module is used for acquiring the driver return data fed back by the physical GPU card based on the request and extracting the context identifier of the container from the driver return data, wherein the driver return data is sent by the physical GPU card when the physical GPU card successfully creates the context of the container based on the request.
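From the container side, this exchange can be sketched as a user-space program that opens the virtual card and issues a context-creation call; the device path, ioctl command number, and reply layout below are placeholders, since the real driver ABI is not disclosed:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Placeholder ioctl command and reply layout. */
#define VGPU_IOCTL_CREATE_CTX 0xC0DEu

struct vgpu_create_ctx_reply {
    uint32_t ctx_id;   /* context ID assigned by the physical GPU card */
};

int main(void)
{
    struct vgpu_create_ctx_reply reply = {0};

    /* Open the virtual GPU card mounted into the container; the kernel
     * module forwards requests to the physical card underneath. */
    int fd = open("/dev/vgpu0", O_RDWR);   /* device path is illustrative */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* On success, the driver return data carries the context identifier
     * created on the physical GPU card. */
    if (ioctl(fd, VGPU_IOCTL_CREATE_CTX, &reply) < 0) {
        perror("ioctl");
        close(fd);
        return 1;
    }
    printf("container context id: %u\n", reply.ctx_id);
    close(fd);
    return 0;
}
```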
Further, the program initialization module includes: a kernel module loading module, a major device number recording module, a computing power weight determining module, a minor device number acquisition module, and a virtual card mounting module, wherein:
The kernel module loading module is used for loading the GPU virtualization kernel module.
And the major device number recording module is used for applying for and recording a major device number for virtualizing the current physical GPU card.
And the computing power weight determining module is used for setting the computing power weight of the virtual GPU card corresponding to the container on the physical GPU card based on the environment variable.
The minor device number acquisition module is used for calling a hook program and sending the computing power weight of the virtual GPU card to the kernel module through the process file system, so that the kernel module allocates a minor device number for the virtual GPU card according to the computing power weight of the virtual GPU card.
And the virtual card mounting module is used for creating the virtual GPU card based on the major device number and the minor device number of the virtual GPU card, and mounting the virtual GPU card to the corresponding container on the physical GPU card. A sketch of this end-to-end flow follows.
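The flow described by these modules (weight taken from an environment variable, handed to the kernel module through the process file system, a minor device number allocated in return, and a device node created from the major and minor numbers) can be sketched in user-space C as follows; the procfs path, major number, and device name are illustrative assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Illustrative paths and numbers: the real procfs entry, major device
 * number, and device naming scheme are internal to the kernel module. */
#define VGPU_PROC_WEIGHT "/proc/vgpu/weight"
#define VGPU_MAJOR       240

int main(void)
{
    /* 1. Read the computing power weight from an environment variable. */
    const char *w = getenv("VGPU_WEIGHT");     /* e.g. "50" for half the card */
    int weight = w ? atoi(w) : 100;

    /* 2. Hand the weight to the kernel module through the process file
     *    system; the module allocates a minor device number for it. */
    FILE *p = fopen(VGPU_PROC_WEIGHT, "w");
    if (!p) {
        perror("fopen");
        return 1;
    }
    fprintf(p, "%d\n", weight);
    fclose(p);

    /* 3. Create the virtual GPU card node from the major and minor device
     *    numbers, then mount it into the container; in practice the minor
     *    number would be read back from the kernel module. */
    int minor = 0;
    if (mknod("/dev/vgpu0", S_IFCHR | 0666, makedev(VGPU_MAJOR, minor)) < 0) {
        perror("mknod");
        return 1;
    }
    return 0;
}
```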
Further, the kernel module loading module includes: a data structure creation module, wherein:
The data structure creation module is used for, when loading the GPU virtualization kernel module, creating a GPU virtualization data structure and a GPU virtualization computing power scheduling thread, wherein the data structure is used for managing the resources of the virtual GPU card, and the GPU virtualization computing power scheduling thread is used for scheduling the computing power of the containers on the physical GPU card.
Further, before the major device number of the virtualized physical GPU card is recorded, the major device number recording module includes: a file descriptor recording module, wherein:
And the file descriptor recording module is used for opening the physical GPU card and recording the file descriptor of the physical GPU card, wherein the file descriptor is used for identifying the physical GPU card.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided herein, the coupling of the modules to each other may be electrical, mechanical, or other.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
Referring to fig. 9, a block diagram of an electronic device 100 according to an embodiment of the present application is shown. The electronic device 100 may be a smart phone, a tablet computer, an electronic book reader, or another device capable of running application programs. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, wherein the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more application programs being configured to perform the methods described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA) form. The processor 110 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 100 in use (such as a phonebook, audio and video data, and chat log data), and the like.
Referring to fig. 10, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 300 stores program code which can be invoked by a processor to perform the methods described in the foregoing method embodiments.
The computer readable storage medium 300 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 300 comprises a non-volatile computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 300 has storage space for program code 310 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 310 may be compressed, for example, in a suitable form.
In summary, the method, apparatus, electronic device, and storage medium for container computing power scheduling provided in the embodiments of the present application determine a current container and a next container of the current container from a plurality of containers on a recorded physical GPU card while the graphics processor GPU virtualization computing power scheduling thread loops; acquire a context identifier of the current container as a first context identifier and a context identifier of the next container of the current container as a second context identifier; and, if it is determined based on the first context identifier and the second context identifier that the current container and the next container of the current container meet the preset scheduling condition, stop the context of the process of the current container and start the context of the process of the next container of the current container. Through virtualization, multiple virtual GPU cards are created to flexibly partition the computing power of a whole physical GPU card, so that multiple containers multiplex the computing power of the same GPU card and the utilization rate of the GPU is improved.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. A method of container computing power scheduling, the method comprising:
determining a current container and a next container of the current container from a plurality of containers on a recorded physical GPU card while a graphics processor GPU virtualization computing power scheduling thread loops;
acquiring a context identifier of the current container as a first context identifier, and acquiring a context identifier of a next container of the current container as a second context identifier;
if it is determined, based on the first context identifier and the second context identifier, that the current container and the next container of the current container meet a preset scheduling condition, stopping the context of the process of the current container and starting the context of the process of the next container of the current container.
2. The method according to claim 1, wherein if it is determined that the current container and the next container of the current container satisfy a preset scheduling condition based on the first context identifier and the second context identifier, stopping the context of the process of the current container and starting the context of the process of the next container of the current container, comprises:
Determining a time slice count of the current container in the current period according to the first context identifier;
determining a schedulable condition of a context of a next container of the current container according to the second context identifier, wherein the schedulable condition of the context comprises the presence of the schedulable context or the absence of the schedulable context;
if the time slice count reaches a preset weight ratio and the next container of the current container has a schedulable context, stopping the context of the process of the current container and starting the context of the process of the next container of the current container.
3. The method of claim 2, wherein after the stopping the context of the process of the current container and starting the context of the process of the next container of the current container, the method further comprises:
resetting the time slice count of the current container and taking the next container of the current container as the current container;
waiting for a next time slice and returning to execute the steps from determining the current container and the next container of the current container to stopping the context of the process of the current container and starting the context of the process of the next container of the current container, until the GPU virtualization computing power scheduling thread ends.
4. The method according to claim 1, wherein the method further comprises:
and if the current container and the next container of the current container are determined not to meet the preset scheduling condition based on the first context identifier and the second context identifier, continuing to run the context of the process of the current container.
5. The method of any of claims 1-4, wherein, while the graphics processor GPU virtualization computing power scheduling thread loops, before determining a current container and a next container of the current container from the plurality of recorded containers on the current physical GPU card, the method further comprises:
initializing a computing application program in a container on a physical GPU card, and opening a virtual GPU card mounted to the container;
sending a request to create a context of the container to the virtual GPU card, to forward the request to the physical GPU card through the virtual GPU card;
and acquiring driver return data fed back by the physical GPU card based on the request, and extracting the context identifier of the container from the driver return data, wherein the driver return data is sent by the physical GPU card when the physical GPU card successfully creates the context of the container based on the request.
6. The method of claim 5, wherein before initializing a computing application program in a container on a physical GPU card and opening a virtual GPU card mounted to the container, the method further comprises:
loading a GPU virtualization kernel module;
applying for and recording a major device number for virtualizing the physical GPU card;
setting the computing power weight of a virtual GPU card corresponding to a container on the physical GPU card based on an environment variable;
calling a hook program, and sending the computing power weight of the virtual GPU card to the kernel module through a process file system, so that the kernel module allocates a minor device number for the virtual GPU card according to the computing power weight of the virtual GPU card;
creating the virtual GPU card based on the major device number and the minor device number of the virtual GPU card, and mounting the virtual GPU card to the corresponding container on the physical GPU card.
7. The method of claim 6, wherein loading the GPU virtualization kernel module comprises:
loading the GPU virtualization kernel module, and creating a GPU virtualization data structure and a GPU virtualization computing power scheduling thread, wherein the data structure is used for managing resources of the virtual GPU card, and the GPU virtualization computing power scheduling thread is used for scheduling the computing power of containers on the physical GPU card.
8. The method of claim 6, wherein before the recording of the major device number for virtualizing the physical GPU card, the method further comprises:
and opening the physical GPU card, and recording a file descriptor of the physical GPU card, wherein the file descriptor is used for identifying the physical GPU card.
9. An apparatus for container computing power scheduling, the apparatus comprising:
a current container determining module, configured to determine a current container and a next container of the current container from a plurality of containers on a recorded physical GPU card while the graphics processor GPU virtualization computing power scheduling thread loops;
a context identifier obtaining module, configured to obtain a context identifier of the current container as a first context identifier, and obtain a context identifier of a next container of the current container as a second context identifier;
and the context switching module is used for stopping the context of the process of the current container and starting the context of the process of the next container of the current container if the current container and the next container of the current container are determined to meet the preset scheduling condition based on the first context identifier and the second context identifier.
10. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-8.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-8.