CN111223036B - GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium


Info

Publication number
CN111223036B
Authority
CN
China
Prior art keywords
gpu
target
virtual
library
hijacking
Prior art date
Legal status
Active
Application number
CN201911386438.7A
Other languages
Chinese (zh)
Other versions
CN111223036A (en)
Inventor
Liu Yunfei (刘云飞)
Current Assignee
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd filed Critical Guangdong Inspur Big Data Research Co Ltd
Priority to CN201911386438.7A priority Critical patent/CN111223036B/en
Publication of CN111223036A publication Critical patent/CN111223036A/en
Application granted granted Critical
Publication of CN111223036B publication Critical patent/CN111223036B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 - Hypervisors; Virtual machine monitors
    • G06F 9/4555 - Para-virtualisation, i.e. guest operating system has to be modified
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals


Abstract

The application discloses a GPU virtualization sharing method and device, an electronic device, and a computer-readable storage medium. The method comprises: determining a target physical GPU and dividing it into a plurality of virtual GPUs; when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU; and, in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijacking library. The hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU. According to the GPU virtualization sharing method provided by the application, one physical GPU resource is partitioned into a plurality of virtual GPU resources through the hijacking technique and the isolation technique, thereby realizing multi-user, multi-task sharing of the GPU.

Description

GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and more particularly to a GPU virtualization sharing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Advances in deep learning have greatly accelerated the development of artificial intelligence, and deep learning training and inference depend heavily on the GPU (Graphics Processing Unit). Some deep learning models are small and cannot drive a GPU at full load, yet current mainstream frameworks execute deep learning tasks in a GPU-exclusive mode, which easily wastes GPU resources.
Therefore, how to share a GPU among users and tasks is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The object of the present application is to provide a GPU virtualization sharing method and apparatus, an electronic device, and a computer-readable storage medium that achieve sharing of a GPU.
In order to achieve the above object, the present application provides a GPU virtualization sharing method, including:
determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijacking library; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
The method further comprises:
determining a file path of the hijacking library, and setting the LD_PRELOAD environment variable to the file path.
Wherein redirecting the call request to a hijacking library comprises:
loading the hijacking library through the loader based on the LD_PRELOAD environment variable, and searching the hijacking library for the target function corresponding to the call request;
and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
Wherein dividing the target physical GPU into a plurality of virtual GPUs comprises:
determining a ratio of the target physical GPU to the virtual GPU and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
To achieve the above object, the present application provides a GPU virtualization sharing device, including:
the dividing module is used for determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
the selecting module is used for selecting a target virtual GPU from the target physical GPU when a target task is received, so that the target task is executed by the target virtual GPU;
the redirection module is used for redirecting the call request to the hijacking library when a call request for a CUDA function is received in the process of executing the target task; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
The device further comprises:
a setting module, used for determining the file path of the hijacking library and setting the LD_PRELOAD environment variable to the file path.
Wherein the redirection module comprises:
the searching unit is used for loading the hijacking library through the loader based on the LD_PRELOAD environment variable when a call request for a CUDA function is received in the process of executing the target task, and searching the hijacking library for the target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
The dividing module is specifically configured to determine a target physical GPU, determine a ratio between the target physical GPU and the virtual GPUs, and divide the target physical GPU into a plurality of virtual GPUs based on the ratio.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
and a processor for implementing the steps of the GPU virtualization sharing method when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the GPU virtualization sharing method as described above.
According to the above scheme, the GPU virtualization sharing method provided by the application comprises: determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs; when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU; and, in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijacking library; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
According to the GPU virtualization sharing method provided by the application, calls by a user to CUDA (Compute Unified Device Architecture) functions are redirected, through the hijacking technique, into calls to hijacking-library functions. Through the isolation technique, one physical GPU resource is partitioned into a plurality of mutually isolated virtual GPU resources, each of which a user can use as a complete physical GPU, thereby realizing multi-user, multi-task sharing of the GPU. The application also discloses a GPU virtualization sharing device, an electronic device, and a computer-readable storage medium that achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained from them by a person skilled in the art without inventive effort. The accompanying drawings are incorporated in and constitute a part of this specification; they illustrate the disclosure and, together with the description, serve to explain but not limit the disclosure. In the drawings:
FIG. 1 is a flowchart illustrating a method for GPU virtualization sharing, according to an example embodiment;
FIG. 2 is a flowchart illustrating another method of GPU virtualization sharing, according to an example embodiment;
FIG. 3 is a flow chart of an embodiment of an application provided by the present application;
FIG. 4 is a block diagram of a GPU virtualization sharing device, according to an example embodiment;
FIG. 5 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
The embodiments of the present application will now be described clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The embodiment of the application discloses a GPU virtualization sharing method, which realizes the sharing of GPUs.
Referring to FIG. 1, a flowchart of a GPU virtualization sharing method is shown according to an exemplary embodiment; as shown in FIG. 1, the method includes:
s101: determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
the objective of the present embodiment is to achieve sharing of a target physical GPU, in this step, the target physical GPU is first divided into multiple virtual GPUs, each virtual GPU is isolated from each other, and a user can use each virtual GPU as a complete physical GPU. It will be appreciated that, most important for a GPU is memory resources, and GPU virtualization necessarily isolates memory resources separately for use by different virtual GPUs, and that different virtual GPUs cannot use memory resources of other virtual GPUs. All GPU memories are managed through CUDA functions, and in the embodiment, the CUDA functions are redefined in a hijacking library, so that the memory of the target physical GPU is divided into a plurality of parts, and each virtual GPU obtains a part to provide service for one user.
It should be noted that this embodiment does not limit the specific partitioning manner. Preferably, dividing the target physical GPU into a plurality of virtual GPUs comprises: determining a ratio between the target physical GPU and the virtual GPUs, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio. In a specific implementation, the ratio may be determined based on the memory size of the target physical GPU; for example, with a ratio of 1:5, the target physical GPU is divided into 5 virtual GPUs. The memory size of each virtual GPU is not limited, and the sizes may be the same or different.
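As a concrete illustration (the numbers, names, and C representation below are hypothetical additions for this description, not taken from the patent), a 1:4 split of a 16 GiB card can be recorded as one quota entry per virtual GPU:

    /* Hypothetical sketch: describe the virtual GPUs carved out of one card. */
    #include <stdio.h>

    struct vgpu {
        int    id;         /* index of this virtual GPU on the physical card */
        size_t mem_limit;  /* memory quota advertised to this vGPU's user */
        size_t mem_used;   /* memory its tasks have allocated so far */
    };

    int main(void) {
        const size_t physical_mem = 16ULL << 30; /* 16 GiB card (example value) */
        const int    ratio        = 4;           /* 1 physical GPU : 4 virtual GPUs */
        struct vgpu v[4];
        for (int i = 0; i < ratio; i++) {
            v[i] = (struct vgpu){ .id = i, .mem_limit = physical_mem / ratio };
            printf("vGPU %d: %zu bytes\n", v[i].id, v[i].mem_limit);
        }
        return 0;
    }

Since the description allows unequal quotas, a real divider could just as well take a list of per-vGPU sizes instead of a single ratio.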
S102: when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
in this step, when a target task is received, a target virtual GPU is selected from the target physical GPU, and the target task is executed using the resources of the target virtual GPU.
S103: in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijacking library; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
It will be appreciated that in all deep learning software frameworks, parallel computation on the GPU is implemented via the CUDA toolkit. Taking TensorFlow as an example, TensorFlow's source code calls a large number of CUDA toolkit functions. If the CUDA toolkit is hijacked, TensorFlow's requests to the GPU and the GPU's feedback to TensorFlow can be completely taken over at the bottom layer, so that the data can be disguised and modified, thereby realizing virtualization of the GPU.
During startup of any task process, dynamic link libraries are loaded by the loader, which searches the file system for the required libraries according to environment variables and system settings. Function calls to CUDA can therefore be redirected into the hijacking library by means of the LD_PRELOAD environment variable. That is, this embodiment further includes: determining the file path of the hijacking library, and setting the LD_PRELOAD environment variable to the file path.
The loader reads the LD_PRELOAD environment variable and loads the dynamic link library it specifies, i.e., the hijacking library, before loading all other dynamic link libraries. When a function is called and several loaded libraries contain a function of that name, the task process uses the library loaded first, namely the hijacking library. The hijacking library implements a large number of functions with the same names as CUDA functions, such as cuMemAlloc (allocates device memory), cuMemAllocManaged (allocates memory automatically managed by the unified memory system), cuMemAllocPitch (allocates pitched device memory), cuDeviceTotalMem (returns the total amount of memory on the device), cuMemGetInfo (gets the free and total memory), and so on. The CUDA toolkit is thereby hijacked, and the user's calls to CUDA functions are redirected to calls of hijacking-library functions. The step of redirecting the call request to the hijacking library comprises: loading the hijacking library through the loader based on the LD_PRELOAD environment variable, and searching the hijacking library for the target function corresponding to the call request; and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
For example, when the target task process invokes the cuMemGetInfo function to query the memory size, the memory size of the virtual GPU is returned rather than that of the physical GPU. For another example, when the target task process invokes the cuMemAlloc function, an error is returned if the request would exceed the memory size of the virtual GPU.
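A minimal sketch of such a hijacking library is given below. The cuMemGetInfo and cuMemAlloc signatures follow the CUDA driver API; everything else (the VGPU_MEM_LIMIT environment variable, the quota bookkeeping, the file names) is a hypothetical illustration rather than the patent's implementation, and a production library would additionally need to cover many more entry points, versioned symbol variants (e.g. cuMemGetInfo_v2), and thread safety:

    /* hijack.c: minimal LD_PRELOAD sketch of a CUDA hijacking library.
     * Build (assumed): gcc -shared -fPIC hijack.c -o libhijack.so -ldl */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdlib.h>

    /* Local stand-ins for CUDA driver types so the sketch compiles without
     * cuda.h; real code would include <cuda.h> instead. */
    typedef int CUresult;
    typedef unsigned long long CUdeviceptr;
    #define CUDA_SUCCESS             0
    #define CUDA_ERROR_OUT_OF_MEMORY 2

    static size_t vgpu_used; /* bytes handed to this task so far (not thread safe) */

    static size_t vgpu_limit(void) {
        const char *s = getenv("VGPU_MEM_LIMIT"); /* hypothetical quota variable */
        return s ? (size_t)strtoull(s, NULL, 10) : (size_t)-1;
    }

    /* Same-name function: report the virtual GPU's memory, not the card's. */
    CUresult cuMemGetInfo(size_t *free_mem, size_t *total_mem) {
        size_t limit = vgpu_limit();
        *total_mem = limit;
        *free_mem  = limit > vgpu_used ? limit - vgpu_used : 0;
        return CUDA_SUCCESS;
    }

    /* Same-name function: reject requests beyond the quota, otherwise forward
     * to the real driver entry found further down the link order. */
    CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize) {
        if (vgpu_used + bytesize > vgpu_limit())
            return CUDA_ERROR_OUT_OF_MEMORY;
        CUresult (*real_alloc)(CUdeviceptr *, size_t) =
            (CUresult (*)(CUdeviceptr *, size_t))dlsym(RTLD_NEXT, "cuMemAlloc");
        CUresult rc = real_alloc ? real_alloc(dptr, bytesize)
                                 : CUDA_ERROR_OUT_OF_MEMORY;
        if (rc == CUDA_SUCCESS)
            vgpu_used += bytesize;
        return rc;
    }

With the library built, launching the task with LD_PRELOAD set to its file path (for example, LD_PRELOAD=/path/to/libhijack.so python train.py, paths hypothetical) makes the loader resolve these same-name functions ahead of the real driver's.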
According to the GPU virtualization sharing method provided by this embodiment of the application, the user's calls to CUDA functions are redirected, through the hijacking technique, into calls to hijacking-library functions. Through the isolation technique, one physical GPU resource is partitioned into a plurality of mutually isolated virtual GPU resources, each of which a user can use as a complete physical GPU, thereby realizing multi-user, multi-task sharing of the GPU.
The embodiment of the application discloses a GPU virtualization sharing method which, relative to the previous embodiment, further describes and optimizes the technical solution. Specifically:
referring to fig. 2, a flowchart of another GPU virtualization sharing method is shown according to an exemplary embodiment, as shown in fig. 2, including:
s201: determining a file path of the hijacked library, and setting an LD_PRELOAD environment variable as the file path;
s202: determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
s203: when a target task is received, determining a ratio of the target physical GPU to the virtual GPU, and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
S204: in the process of executing the target task, when a call request for a CUDA function is received, loading the hijacking library through a loader based on the LD_PRELOAD environment variable, and searching a target function corresponding to the call request in the hijacking library;
s205: and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
A specific application embodiment of the present application, as shown in FIG. 3, may include the following steps:
Step one: configuring the LD_PRELOAD environment variable in the system to point to the file path of the hijacking library;
Step two: configuring the ratio of physical GPUs to virtual GPUs, which determines how many users share one physical GPU;
Step three: starting a deep learning task through a software framework such as TensorFlow, and assigning a virtual GPU to the corresponding task;
Step four: after TensorFlow starts, its calls to CUDA functions are hijacked, and the provided virtual GPU is used to execute the task.
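To make the effect of step four observable, the following caller-side sketch (a hypothetical addition, not from the patent) queries the GPU memory the way a framework would; note that, depending on the CUDA version, cuda.h may map these names to versioned variants such as cuMemGetInfo_v2, which the hijacking library must then also provide:

    /* probe.c: caller-side check of the hijacked memory query.
     * Build (assumed): gcc probe.c -o probe -lcuda
     * Run   (assumed): LD_PRELOAD=/path/to/libhijack.so ./probe */
    #include <stdio.h>
    #include <cuda.h>

    int main(void) {
        CUdevice dev;
        CUcontext ctx;
        size_t free_b, total_b;
        cuInit(0);                        /* initialize the driver API */
        cuDeviceGet(&dev, 0);             /* first (virtualized) device */
        cuCtxCreate(&ctx, 0, dev);
        cuMemGetInfo(&free_b, &total_b);  /* intercepted: reports the vGPU quota */
        printf("free=%zu total=%zu\n", free_b, total_b);
        cuCtxDestroy(ctx);
        return 0;
    }

Run without LD_PRELOAD, the same program reports the physical card's capacity, which is exactly the difference the hijacking library introduces.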
The following describes a GPU virtualization sharing device provided by an embodiment of the present application; the GPU virtualization sharing device described below and the GPU virtualization sharing method described above may be referred to in correspondence with each other.
Referring to FIG. 4, a structure diagram of a GPU virtualization sharing device is shown according to an exemplary embodiment; as shown in FIG. 4, the device includes:
the dividing module 401 is configured to determine a target physical GPU and divide the target physical GPU into a plurality of virtual GPUs;
a selecting module 402, configured to, when a target task is received, select a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
a redirecting module 403, configured to redirect, when a call request for a CUDA function is received during execution of the target task, the call request to a hijacking library; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU.
According to the GPU virtualization sharing device provided by this embodiment of the application, the user's calls to CUDA functions are redirected, through the hijacking technique, into calls to hijacking-library functions. Through the isolation technique, one physical GPU resource is partitioned into a plurality of mutually isolated virtual GPU resources, each of which a user can use as a complete physical GPU, thereby realizing multi-user, multi-task sharing of the GPU.
On the basis of the above embodiment, as a preferred implementation manner, the device further includes:
a setting module, used for determining the file path of the hijacking library and setting the LD_PRELOAD environment variable to the file path.
Based on the above embodiment, as a preferred implementation manner, the redirection module 403 includes:
the searching unit is used for loading the hijacking library through the loader based on the LD_PRELOAD environment variable when a call request for a CUDA function is received in the process of executing the target task, and searching the hijacking library for the target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
On the basis of the foregoing embodiment, as a preferred implementation manner, the dividing module 401 is specifically configured to determine a target physical GPU, determine a ratio between the target physical GPU and the virtual GPUs, and divide the target physical GPU into a plurality of virtual GPUs based on the ratio.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the method embodiments and will not be repeated here.
The present application also provides an electronic device. Referring to FIG. 5, a block diagram of an electronic device 500 provided in an embodiment of the present application is shown; as shown in FIG. 5, the electronic device 500 may include a processor 11 and a memory 12, and may further include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 is configured to control the overall operation of the electronic device 500 so as to complete all or part of the steps of the GPU virtualization sharing method. The memory 12 is used to store various types of data to support operation of the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500, as well as application-related data such as contacts, sent and received messages, pictures, audio, and video. The memory 12 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be further stored in the memory 12 or transmitted through the communication component 15. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 15 is used for wired or wireless communication between the electronic device 500 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 15 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the GPU virtualization sharing method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; the program instructions, when executed by a processor, implement the steps of the GPU virtualization sharing method described above. For example, the computer-readable storage medium may be the memory 12 described above, including program instructions executable by the processor 11 of the electronic device 500 to perform the GPU virtualization sharing method described above.
In this specification, the embodiments are described in a progressive manner, each focusing on its differences from the others; identical or similar parts of the embodiments may be referred to each other. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and relevant details can be found in the description of the method. It should be noted that those skilled in the art may make various improvements and modifications to the application without departing from its principles, and such improvements and modifications also fall within the scope of the appended claims.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (6)

1. A method for GPU virtualization sharing, comprising:
determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
starting a deep learning task through a TensorFlow software framework, and configuring a virtual GPU to the corresponding task;
when a target task is received, selecting a target virtual GPU from the target physical GPU so as to execute the target task by using the target virtual GPU;
in the process of executing the target task, when a call request for a CUDA function is received, redirecting the call request to a hijacking library; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU;
wherein the method further comprises:
determining a file path of the hijacking library, and setting the LD_PRELOAD environment variable to the file path;
correspondingly, redirecting the call request to the hijacking library comprises the following steps:
loading the hijacking library through a loader based on the LD_PRELOAD environment variable, and searching the hijacking library for the target function corresponding to the call request;
and executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
2. The GPU virtualization sharing method of claim 1, wherein the partitioning the target physical GPU into a plurality of virtual GPUs comprises:
determining a ratio of the target physical GPU to the virtual GPU and dividing the target physical GPU into a plurality of virtual GPUs based on the ratio.
3. A GPU virtualization sharing device, comprising:
the dividing module is used for determining a target physical GPU and dividing the target physical GPU into a plurality of virtual GPUs;
a module for starting a deep learning task through a TensorFlow software framework and configuring a virtual GPU to the corresponding task;
the selecting module is used for selecting a target virtual GPU from the target physical GPU when a target task is received, so that the target task is executed by the target virtual GPU;
the redirection module is used for redirecting the call request to the hijacking library when a call request for a CUDA function is received in the process of executing the target task; the hijacking library comprises a plurality of same-name functions corresponding to CUDA functions, and the return value of each same-name function is determined based on the information of the target virtual GPU;
wherein the device further comprises:
the setting module, used for determining a file path of the hijacking library and setting the LD_PRELOAD environment variable to the file path;
correspondingly, the redirection module comprises:
the searching unit is used for loading the hijacking library through the loader based on the LD_PRELOAD environment variable when a call request for a CUDA function is received in the process of executing the target task, and searching the hijacking library for the target function corresponding to the call request;
and the execution unit is used for executing the target function based on the information of the target virtual GPU to obtain a response result of the call request.
4. The GPU virtualization sharing device of claim 3, wherein the dividing module is specifically configured to determine a target physical GPU, determine a ratio between the target physical GPU and the virtual GPUs, and divide the target physical GPU into a plurality of virtual GPUs based on the ratio.
5. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the GPU virtualization sharing method as claimed in claim 1 or 2 when executing the computer program.
6. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the GPU virtualization sharing method as claimed in claim 1 or 2.
CN201911386438.7A 2019-12-29 2019-12-29 GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium Active CN111223036B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911386438.7A (granted as CN111223036B) | 2019-12-29 | 2019-12-29 | GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium


Publications (2)

Publication Number | Publication Date
CN111223036A | 2020-06-02
CN111223036B | 2023-11-03

Family

ID=70829170

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201911386438.7A (granted as CN111223036B, active) | 2019-12-29 | 2019-12-29 | GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111223036B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111913794B * | 2020-08-04 | 2024-08-09 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method, apparatus, electronic device and readable storage medium for sharing GPU
CN113360185B * | 2021-05-10 | 2023-06-23 | TCL Air Conditioner (Zhongshan) Co., Ltd. | Processing method and device of micro control unit of air conditioner external unit and micro control unit
CN114595065A * | 2022-03-15 | 2022-06-07 | Beijing Youzhuju Network Technology Co., Ltd. | Data acquisition method and device, storage medium and electronic equipment
CN117632447A * | 2022-08-09 | 2024-03-01 | 4Paradigm (Beijing) Technology Co., Ltd. | GPU resource using method, GPU virtualization method, job scheduling device and cluster
CN115601221B * | 2022-11-28 | 2023-05-23 | Suzhou Inspur Intelligent Technology Co., Ltd. | Resource allocation method and device and artificial intelligent training system
CN116578416B * | 2023-04-26 | 2024-07-30 | Unit 92942 of the Chinese People's Liberation Army | Signal-level simulation acceleration method based on GPU virtualization


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8643656B2 * | 2010-09-30 | 2014-02-04 | NEC Laboratories America, Inc. | Energy-aware task consolidation on graphics processing unit (GPU)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104380256A * | 2012-04-19 | 2015-02-25 | Polytechnic University of Catalonia | Method, system and executable piece of code for virtualisation of hardware resource associated with computer system
CN103761139A * | 2014-01-25 | 2014-04-30 | Hunan University | General purpose computation virtualization implementation method based on dynamic library interception
CN108984264A * | 2017-06-02 | 2018-12-11 | Alibaba Group Holding Ltd. | The implementation method of virtual GPU, apparatus and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Virtualization-Based Multi-GPU Deep Neural Network Training Framework; Yang Zhigang et al.; Computer Engineering; 2017-04-21 (Issue 02); full text *
Research on Multi-Task-Oriented GPU General-Purpose Computing Virtualization Technology; Zhang Yunzhou et al.; Computer Engineering and Science; 2013-11-15 (Issue 11); full text *

Also Published As

Publication number Publication date
CN111223036A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN111223036B (en) GPU (graphics processing unit) virtualization sharing method and device, electronic equipment and storage medium
US9946525B2 (en) Extracting source code
KR102000266B1 (en) Identifiers across application instances
CA2768752C (en) Terminal device of non-android platform for executing android applications, and computer readable recording medium for storing program of executing android applications on non-android platform
EP3035191B1 (en) Identifying source code used to build executable files
CN105573734B (en) method and equipment for providing SDK file
CN113297566B (en) Sandbox implementation method, device, equipment and storage medium
US9678767B2 (en) Unified extensible firmware interface (UEFI) driver and protocol
US10664278B2 (en) Method and apparatus for hardware acceleration in heterogeneous distributed computing
CN106469071B (en) Application theme changing method and device
CN110362356B (en) Function data processing method and device, computer equipment and storage medium
US11610155B2 (en) Data processing system and data processing method
EP3021216A1 (en) Incremental source code analysis
CN112424765A (en) Container framework for user-defined functions
CN113986402A (en) Function calling method and device, electronic equipment and storage medium
JP2024536659A (en) Task execution method, apparatus, storage medium and electronic device
US20110167405A1 (en) Application building system, method and computer-readable medium
CN112235132A (en) Method, device, medium and server for dynamically configuring service
US20220156363A1 (en) Multi -tenant actor systems with web assembly
CN117555563A (en) Method and device for constructing platform mirror image, computer equipment and storage medium
US20230267005A1 (en) Thread management
CN114860401A (en) Heterogeneous cloud desktop scheduling system, method, service system, device and medium
CN113918290A (en) API calling method and device
CN110427224B (en) EJB module loading method and device, server and readable storage medium
US20230359440A1 (en) Externally-initiated runtime type extension

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant