CN111223036A - GPU virtualization sharing method and device, electronic equipment and storage medium


Info

Publication number
CN111223036A
CN111223036A
Authority
CN
China
Prior art keywords
gpu
target
virtual
library
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911386438.7A
Other languages
Chinese (zh)
Other versions
CN111223036B (en)
Inventor
Liu Yunfei (刘云飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority to CN201911386438.7A
Publication of CN111223036A
Application granted
Publication of CN111223036B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/4555 Para-virtualisation, i.e. guest operating system has to be modified
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The application discloses a GPU virtualization sharing method and device, an electronic device, and a computer-readable storage medium, wherein the method includes the following steps: determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs; when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU; and in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijack library. The hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU. According to the GPU virtualization sharing method, one physical GPU resource is divided into a plurality of virtual GPU resources through a hijack technique and an isolation technique, thereby achieving multi-user, multi-task sharing of the GPU.

Description

GPU virtualization sharing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for GPU virtualization sharing, an electronic device, and a computer-readable storage medium.
Background
At present, advances in deep learning have greatly promoted the rapid development of artificial intelligence, and both deep learning training and inference depend heavily on the GPU (Graphics Processing Unit). Some deep learning models are small and cannot drive a GPU at full load, yet the current mainstream frameworks execute deep learning tasks in a GPU-exclusive mode, which inevitably wastes GPU resources.
Therefore, how to implement the sharing of the GPU is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a GPU virtualization sharing method and device, an electronic device, and a computer-readable storage medium that achieve sharing of a GPU.
In order to achieve the above object, the present application provides a GPU virtualization sharing method, including:
determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs;
when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijack library; the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
The method further includes:
determining a file path of the hijack library, and setting the LD_PRELOAD environment variable to the file path.
Redirecting the call request to the hijack library comprises the following steps:
loading the hijack library through a loader based on the LD_PRELOAD environment variable, and searching the hijack library for a target function corresponding to the call request;
and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
Wherein the dividing the target physical GPU into a plurality of virtual GPUs comprises:
determining a ratio of the target physical GPU to the virtual GPUs, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
To achieve the above object, the present application provides a GPU virtualization sharing device, including:
a dividing module, used for determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
the selection module is used for selecting a target virtual GPU from the target physical GPU when a target task is received so as to execute the target task by using the target virtual GPU;
the redirection module is used for redirecting a call request for a CUDA function to a hijack library when the call request is received in the process of executing the target task; the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
The device further includes:
and the setting module is used for determining a file path of the hijack library and setting the LD_PRELOAD environment variable to the file path.
Wherein the redirection module comprises:
a searching unit, configured to, when a call request for a CUDA function is received in the process of executing the target task, load the hijack library through a loader based on the LD_PRELOAD environment variable, and search the hijack library for a target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
The dividing module is specifically a module for determining a target physical GPU, determining the ratio of the target physical GPU to the virtual GPUs, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
and the processor is used for realizing the steps of the GPU virtualization sharing method when the computer program is executed.
To achieve the above object, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the GPU virtualization sharing method as described above.
According to the above scheme, the GPU virtualization sharing method provided by the application includes: determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs; when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU; and in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijack library, wherein the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
According to the GPU virtualization sharing method provided by the application, calls by a user to CUDA (Compute Unified Device Architecture) functions are redirected, through a hijack technique, into calls to hijack library functions. Through an isolation technique, one physical GPU resource is divided into a plurality of virtual GPU resources that are isolated from one another, and a user can use a virtual GPU as a complete physical GPU, thereby achieving multi-user, multi-task sharing of the GPU. The application also discloses a GPU virtualization sharing device, an electronic device, and a computer-readable storage medium, which can achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort. The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting it. In the drawings:
FIG. 1 is a flow diagram illustrating a method for GPU virtualization sharing in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating another method for GPU virtualization sharing in accordance with an illustrative embodiment;
FIG. 3 is a flowchart of an embodiment of an application provided in the present application;
FIG. 4 is a block diagram illustrating a GPU virtualization sharing device in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application discloses a GPU virtualization sharing method, which realizes GPU sharing.
Referring to fig. 1, a flowchart of a GPU virtualization sharing method according to an exemplary embodiment is shown; as shown in fig. 1, the method includes:
S101: determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs;
In this step, the target physical GPU is first divided into a plurality of virtual GPUs; the virtual GPUs are isolated from one another, and a user can use each virtual GPU as a complete physical GPU. It can be understood that the most important GPU resource is memory, so GPU virtualization must partition memory among the different virtual GPUs such that no virtual GPU can use the memory of another. All GPU memory is managed through CUDA functions; in this embodiment, the CUDA functions are redefined in the hijack library so as to divide the memory of the target physical GPU into several portions, each virtual GPU obtaining one portion to serve one user.
It should be noted that this embodiment does not limit the specific dividing manner. Preferably, dividing the target physical GPU into a plurality of virtual GPUs includes: determining the ratio of the target physical GPU to the virtual GPUs, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio. In a specific implementation, the ratio may be determined based on the memory size of the target physical GPU; for example, with a ratio of 1:5, the target physical GPU is divided into 5 virtual GPUs. The memory size of each virtual GPU is not limited here; the sizes may be the same or different.
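As a minimal illustration of this ratio-based division (a sketch that assumes an even split, which the embodiment does not require), the memory quota of one virtual GPU can be computed as follows, in C:

    #include <stddef.h>

    /* Hypothetical helper: given the physical GPU memory size and a 1:ratio
     * configuration, return the memory quota of one virtual GPU. An even
     * split is assumed; unequal splits are equally valid per the embodiment. */
    size_t vgpu_mem_bytes(size_t phys_mem_bytes, unsigned int ratio)
    {
        return ratio ? phys_mem_bytes / ratio : phys_mem_bytes;
    }

For a 16 GiB card configured at a 1:5 ratio, each virtual GPU would report roughly 3.2 GiB as its total memory.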
S102: when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
In this step, when a target task is received, a target virtual GPU is selected from the target physical GPU, and the target task is executed using the resources of the target virtual GPU.
S103: in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijack library; the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
It can be understood that in all deep learning software frameworks, parallel computation on the GPU is implemented through the CUDA toolkit. Taking TensorFlow as an example, TensorFlow calls a large number of CUDA toolkit functions in its source code. If the CUDA toolkit is hijacked, the requests of TensorFlow to the GPU and the feedback of the GPU to TensorFlow can be completely taken over at the bottom layer, so that the data can be disguised and modified, thereby realizing virtualization of the GPU.
During the startup of any task process, dynamic link libraries are loaded by the loader. The loader looks up the required dynamic link libraries in the file system based on environment variables and system settings. Function calls to CUDA can therefore be redirected to the hijack library through the LD_PRELOAD environment variable. That is, the present embodiment further includes: determining a file path of the hijack library, and setting the LD_PRELOAD environment variable to the file path.
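For example, assuming the hijack library has been compiled to /usr/local/lib/libvgpu.so (a hypothetical path used here purely for illustration), setting LD_PRELOAD=/usr/local/lib/libvgpu.so before launching the task process causes the loader to map the hijack library ahead of the real CUDA driver library; an invocation might look like: LD_PRELOAD=/usr/local/lib/libvgpu.so python train.py.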
The loader reads the LD_PRELOAD environment variable and loads the dynamic link library it specifies, namely the hijack library, before all other dynamic link libraries. When a function is called and several loaded libraries contain a function of that name, the task process uses the library that was loaded first, i.e., the hijack library in this step. The hijack library implements a large number of functions with the same names as CUDA functions, for example cuMemAlloc (allocates device memory), cuMemAllocManaged (allocates memory that is automatically managed by the Unified Memory system), cuMemAllocPitch (allocates pitched device memory), cuDeviceTotalMem (returns the total amount of memory on the device), cuMemGetInfo (gets the free and total device memory), and the like. The CUDA toolkit can thereby be hijacked, and user calls to CUDA functions are redirected to calls to hijack library functions. Redirecting the call request to the hijack library includes: loading the hijack library through the loader based on the LD_PRELOAD environment variable, and searching the hijack library for the target function corresponding to the call request; and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
For example, when the target task process calls the cuMemGetInfo function to query the memory size, the memory size of the virtual GPU is returned instead of that of the physical GPU. For another example, when the target task process calls the cuMemAlloc function, an error is returned if the requested allocation would exceed the memory size of the virtual GPU. A minimal sketch of such a library follows.
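The following is a minimal sketch, in C, of such a hijack library. It is an illustration rather than the patent's implementation: the VGPU_MEM_LIMIT quota variable, the 1 GiB default, and the simple allocation counter are assumptions of this sketch, and a production library would additionally intercept the versioned driver symbols (e.g. cuMemAlloc_v2), the free and query paths, and many more functions, in a thread-safe manner:

    /* Build: gcc -shared -fPIC -o libvgpu.so vgpu.c -ldl
     * Minimal CUDA driver API types are declared locally so the sketch is
     * self-contained; real code would include <cuda.h> instead. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdlib.h>

    typedef int CUresult;                     /* 0 == CUDA_SUCCESS */
    typedef unsigned long long CUdeviceptr;
    #define CUDA_SUCCESS             0
    #define CUDA_ERROR_OUT_OF_MEMORY 2

    static size_t vgpu_used;                  /* bytes this task has allocated */

    static size_t vgpu_limit(void)
    {
        /* Hypothetical quota of this virtual GPU in bytes; 1 GiB default. */
        const char *s = getenv("VGPU_MEM_LIMIT");
        return s ? (size_t)strtoull(s, NULL, 10) : ((size_t)1 << 30);
    }

    /* Same-name function: because the hijack library is loaded first via
     * LD_PRELOAD, the task process resolves cuMemAlloc to this definition. */
    CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize)
    {
        static CUresult (*real_alloc)(CUdeviceptr *, size_t);
        if (!real_alloc)                      /* locate the real driver symbol */
            real_alloc = (CUresult (*)(CUdeviceptr *, size_t))
                             dlsym(RTLD_NEXT, "cuMemAlloc");
        if (vgpu_used + bytesize > vgpu_limit())
            return CUDA_ERROR_OUT_OF_MEMORY;  /* request exceeds the virtual GPU */
        CUresult r = real_alloc(dptr, bytesize);
        if (r == CUDA_SUCCESS)
            vgpu_used += bytesize;            /* charge the virtual GPU quota */
        return r;
    }

    /* Report the virtual GPU's memory rather than the physical GPU's. */
    CUresult cuMemGetInfo(size_t *free_bytes, size_t *total_bytes)
    {
        *total_bytes = vgpu_limit();
        *free_bytes  = vgpu_limit() - vgpu_used;
        return CUDA_SUCCESS;
    }

When the target task process later calls cuMemGetInfo it sees the virtual GPU's sizes, and an allocation beyond the quota fails with CUDA_ERROR_OUT_OF_MEMORY, exactly as described above.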
With the GPU virtualization sharing method provided by this embodiment, user calls to CUDA functions are redirected, through the hijack technique, into calls to hijack library functions. Through an isolation technique, one physical GPU resource is divided into a plurality of virtual GPU resources that are isolated from one another, and a user can use a virtual GPU as a complete physical GPU, thereby achieving multi-user, multi-task sharing of the GPU.
The embodiment of the application discloses a GPU virtualization sharing method, and compared with the previous embodiment, the embodiment further describes and optimizes the technical scheme. Specifically, the method comprises the following steps:
referring to fig. 2, a flowchart of another GPU virtualization sharing method according to an exemplary embodiment is shown, as shown in fig. 2, including:
s201: determining a file path of the hijack library, and setting an LD-PRELOAD environment variable as the file path;
s202: determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs;
s203: when a target task is received, determining the proportion of the target physical GPU to the virtual GPU, and dividing the target physical GPU into a plurality of virtual GPUs based on the proportion.
S204: in the process of executing the target task, when a call request for a CUDA function is received, loading the hijack library through a loader based on the LD _ PRELOAD environment variable, and searching a target function corresponding to the call request in the hijack library;
s205: and executing the target function based on the information of the target virtual GPU to obtain a response result of the calling request.
An application embodiment provided by the present application is described below; as shown in fig. 3, it may include the following steps:
Step one: configure the LD_PRELOAD environment variable in the system so that it points to the file path of the hijack library;
Step two: configure the ratio of physical GPUs to virtual GPUs, which determines how many users share one physical GPU;
Step three: start a deep learning task through a software framework such as TensorFlow, and assign a virtual GPU to the corresponding task;
Step four: after TensorFlow starts, its calls to CUDA functions are hijacked, and the task is executed using the provided virtual GPU. A sketch of such a session follows.
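Purely as an illustrative session (the library path and the VGPU_MEM_LIMIT quota variable are assumptions carried over from the sketch above, not names given by the embodiment), steps one through four might look as follows:

    export LD_PRELOAD=/usr/local/lib/libvgpu.so    # step one: point the loader at the hijack library
    export VGPU_MEM_LIMIT=3221225472               # step two: give this task a 3 GiB virtual GPU
    python train.py                                # step three: start the TensorFlow task
                                                   # step four: the task's CUDA calls are now hijacked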
In the following, a GPU virtualization sharing device provided by an embodiment of the present application is introduced; the GPU virtualization sharing device described below and the GPU virtualization sharing method described above may be referred to in correspondence with each other.
Referring to fig. 4, a block diagram of a GPU virtualization sharing device according to an exemplary embodiment is shown, as shown in fig. 4, including:
a dividing module 401, configured to determine a target physical GPU and divide the target physical GPU into a plurality of virtual GPUs;
a selecting module 402, configured to select a target virtual GPU from the target physical GPU when a target task is received, so as to execute the target task by using the target virtual GPU;
a redirection module 403, configured to, in the process of executing the target task, redirect a call request for a CUDA function to a hijack library when the call request is received; the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
The GPU virtualization sharing device provided by this embodiment redirects user calls to CUDA functions into calls to hijack library functions through the hijack technique. Through an isolation technique, one physical GPU resource is divided into a plurality of virtual GPU resources that are isolated from one another, and a user can use a virtual GPU as a complete physical GPU, thereby achieving multi-user, multi-task sharing of the GPU.
On the basis of the above embodiment, as a preferred implementation, the method further includes:
and the setting module is used for determining a file path of the hijack library and setting the LD _ PRELOAD environment variable as the file path.
On the basis of the foregoing embodiment, as a preferred implementation, the redirection module 403 includes:
a searching unit, configured to, when a call request for a CUDA function is received in the process of executing the target task, load the hijack library through a loader based on the LD_PRELOAD environment variable, and search the hijack library for a target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
On the basis of the foregoing embodiments, as a preferred implementation manner, the dividing module 401 is specifically a module that determines a target physical GPU, determines a ratio between the target physical GPU and the virtual GPU, and divides the target physical GPU into a plurality of virtual GPUs based on the ratio.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present application further provides an electronic device. Referring to fig. 5, a structural diagram of an electronic device 500 provided in an embodiment of the present application is shown; as shown in fig. 5, the electronic device 500 may include a processor 11 and a memory 12, and may further include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 is configured to control the overall operation of the electronic device 500 so as to complete all or part of the steps of the GPU virtualization sharing method. The memory 12 is used to store various types of data to support operation of the electronic device 500, such as instructions for any application or method operating on the electronic device 500 and application-related data, e.g., contact data, messages, pictures, audio, and video. The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 12 or transmitted through the communication component 15. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, such as a keyboard, a mouse, or buttons, which may be virtual or physical. The communication component 15 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 15 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the GPU virtualization sharing method described above.
In another exemplary embodiment, a computer readable storage medium is also provided, which includes program instructions, which when executed by a processor, implement the steps of the above-described GPU virtualization sharing method. For example, the computer readable storage medium may be the memory 12 described above including program instructions that are executable by the processor 11 of the electronic device 500 to perform the GPU virtualization sharing method described above.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple, and relevant points can be found in the description of the method part. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the protection scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A GPU virtualization sharing method, characterized by comprising the following steps:
determining a target physical GPU, and dividing the target physical GPU into a plurality of virtual GPUs;
when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU; and
in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijack library; wherein the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
2. The GPU virtualization sharing method of claim 1, further comprising:
and determining a file path of the hijack library, and setting the LD_PRELOAD environment variable to the file path.
3. The GPU virtualization sharing method of claim 2, wherein redirecting the call request to the hijack library comprises:
loading the hijack library through a loader based on the LD_PRELOAD environment variable, and searching the hijack library for a target function corresponding to the call request;
and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
4. A GPU virtualization sharing method as claimed in claim 1, wherein the dividing of the target physical GPU into a plurality of virtual GPUs comprises:
determining a ratio of the target physical GPU to the virtual GPUs, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
5. A GPU virtualization sharing device, comprising:
a dividing module, used for determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
the selection module is used for selecting a target virtual GPU from the target physical GPU when a target task is received so as to execute the target task by using the target virtual GPU;
the redirection module is used for redirecting a call request for a CUDA function to a hijack library when the call request is received in the process of executing the target task; the hijack library comprises a plurality of same-name functions corresponding to the CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
6. A GPU virtualization sharing device according to claim 5, further comprising:
and the setting module is used for determining a file path of the hijack library and setting the LD_PRELOAD environment variable to the file path.
7. The GPU virtualization sharing device of claim 6, wherein the redirection module comprises:
a searching unit, configured to, when a call request for a CUDA function is received in the process of executing the target task, load the hijack library through a loader based on the LD_PRELOAD environment variable, and search the hijack library for a target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
8. The GPU virtualization sharing device of claim 5, wherein the partitioning module is specifically a module that determines a target physical GPU, determines a ratio of the target physical GPU to the virtual GPU, and partitions the target physical GPU into a plurality of virtual GPUs based on the ratio.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the GPU virtualization sharing method of any of claims 1 to 4 when executing said computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the GPU virtualization sharing method of any of claims 1 to 4.
CN201911386438.7A 2019-12-29 2019-12-29 GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium Active CN111223036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911386438.7A CN111223036B (en) 2019-12-29 2019-12-29 GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911386438.7A CN111223036B (en) 2019-12-29 2019-12-29 GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111223036A (en) 2020-06-02
CN111223036B CN111223036B (en) 2023-11-03

Family

ID=70829170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911386438.7A Active CN111223036B (en) 2019-12-29 2019-12-29 GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111223036B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120081373A1 (en) * 2010-09-30 2012-04-05 Nec Laboratories America, Inc. Energy-aware task consolidation on graphics processing unit (gpu)
CN104380256A (en) * 2012-04-19 2015-02-25 加泰罗尼亚理工大学 Method, system and executable piece of code for virtualisation of hardware resource associated with computer system
CN103761139A (en) * 2014-01-25 2014-04-30 湖南大学 General purpose computation virtualization implementation method based on dynamic library interception
CN108984264A (en) * 2017-06-02 2018-12-11 阿里巴巴集团控股有限公司 The implementation method of virtual GPU, apparatus and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG Yunzhou et al., "Research on GPU General-Purpose Computing Virtualization Technology for Multi-tasking", Computer Engineering & Science *
YANG Zhigang et al., "A Virtualization-based Multi-GPU Deep Neural Network Training Framework", Computer Engineering *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913794A (en) * 2020-08-04 2020-11-10 北京百度网讯科技有限公司 Method and device for sharing GPU, electronic equipment and readable storage medium
CN113360185A (en) * 2021-05-10 2021-09-07 Tcl空调器(中山)有限公司 Processing method and device for micro control unit of air conditioner outdoor unit and micro control unit
CN113360185B (en) * 2021-05-10 2023-06-23 Tcl空调器(中山)有限公司 Processing method and device of micro control unit of air conditioner external unit and micro control unit
WO2023173954A1 (en) * 2022-03-15 2023-09-21 北京有竹居网络技术有限公司 Data acquisition method and apparatus, storage medium, and electronic device
WO2024032587A1 (en) * 2022-08-09 2024-02-15 第四范式(北京)技术有限公司 Gpu resource usage method, gpu virtualization method, and job scheduling apparatus and cluster
CN115601221A * 2022-11-28 2023-01-13 Suzhou Inspur Intelligent Technology Co Ltd (CN) Resource allocation method and device and artificial intelligence training system
CN116578416A (en) * 2023-04-26 2023-08-11 中国人民解放军92942部队 Signal-level simulation acceleration method based on GPU virtualization

Also Published As

Publication number Publication date
CN111223036B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN111223036B (en) GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium
JP7018463B2 (en) Managing the delivery of code and dependent data using the application container
US9645804B2 (en) Extracting source code
US10089119B2 (en) API namespace virtualization
US8555280B2 (en) Terminal device of non-android platform for executing android applications, and computer readable recording medium for storing program of executing android applications on non-android platform
CN105573734B (en) method and equipment for providing SDK file
US20140020043A1 (en) Automating and/or recommending data sharing coordination among applications in mobile devices
US11610155B2 (en) Data processing system and data processing method
US20170337560A1 (en) System for providing and employing recommended resolution paths
US9632853B2 (en) Virtualizing integrated calls to provide access to resources in a virtual namespace
CN113986402A (en) Function calling method and device, electronic equipment and storage medium
CN110362356B (en) Function data processing method and device, computer equipment and storage medium
CN112235132B (en) Method, device, medium and server for dynamically configuring service
US10997269B1 (en) Using web application components with different web application frameworks in a web application
CN110689114B (en) Network node processing method and device, storage medium and electronic equipment
WO2020135129A1 (en) Method and device for loading plug-in of application, and terminal
CN106528219A (en) Upgrading method and apparatus for parasitic tool package in application
US20230267005A1 (en) Thread management
CN112583732B (en) Flow control method and related equipment based on control granularity pool interface call
US20180095626A1 (en) User defined application interface
CN110737533B (en) Task scheduling method and device, electronic equipment and storage medium
CN112068814A (en) Method, device, system and medium for generating executable file
US11256607B1 (en) Adaptive resource management for instantly provisioning test environments via a sandbox service
CN116382795A (en) System resource calling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant