KR101435772B1 - Gpu virtualizing system - Google Patents
- Publication number
- KR101435772B1 (application number KR1020130071605A)
- Authority
- KR
- South Korea
- Prior art keywords
- gpu
- code
- gpus
- actual
- virtual
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention relates to a variable-number-convertible GPU virtualization system. According to a first aspect of the present invention, a plurality of cluster nodes are divided into a master node, which includes a GPU virtual device driver that receives GPU code and transfers it to a slave node, and a slave node, which includes a GPU virtual server that receives the GPU code and executes it through actual GPUs. The GPU virtual device driver comprises: an emulator that converts GPU code written through OpenCL (Open Computing Language), a universal parallel computing framework, for M (M is a natural number) virtual GPUs into a form for execution on N (N is a natural number equal to or different from M) actual GPUs; and a dispatcher that delivers the GPU code converted by the emulator to the actual GPUs.
According to a second aspect of the present invention, there is provided a GPU virtualization system for performing GPU virtualization for GPU parallel processing, the system comprising: a master node, which is a node through which a user executes GPU code for implementation as M virtual GPUs; and a slave node communicating with the master node to receive the GPU code and having N actual GPUs to perform GPU operations; wherein the master node includes an analyzer for computing a distribution of the data to be used through the GPU code based on a work item size and a work group size.
By combining or redistributing actual GPUs through virtualization, the system provides virtual GPUs with memory and computing power that no single physical GPU possesses, so that larger programs can be run more quickly and easily.
Description
The present invention relates to a variable number of convertible GPU virtualization system, and more particularly, to a GPU virtualization system that suggests a way to combine or redistribute actual GPUs through virtualization.
GPU is an abbreviation of Graphic Processing Unit. It was developed as a dedicated graphics processor that takes over the burden of complicated graphics processing, relieving the CPU (Central Processing Unit) of graphics computation and processing load.
However, in recent years, technology development has been actively pursued to utilize the parallel processing function of the GPU for use in complicated calculations.
As a result, many supercomputers use GPUs to speed up processing. In this trend, methods of adding virtualization to the GPU have been studied in order to use the GPU more efficiently.
As one example, technology development has been actively pursued in recent years to utilize the parallel processing capability of the GPU for complicated calculations, and many recent supercomputers have sharply increased their processing speed by using GPUs. High performance can therefore be achieved when a well-parallelizable algorithm is executed on a GPU.
Accordingly, in the related technical field, there is a need to further divide the GPU through GPU virtualization to form a virtual desktop environment, or to develop a technique for presenting the GPU as if there were one per virtual machine.
[Related Technical Literature]
1. METHOD AND APPARATUS FOR COMPILING AND EXECUTING AN APPLICATION USING VIRTUALIZATION IN A HETEROGENEOUS SYSTEM USING CPU AND GPU (Patent Application No. 10-2010-0093327)
2. VIRTUAL GPU (Patent Application No. 10-2012-0078202)
On the other hand, the background art described above is technical information acquired by the inventor for, or in the process of, deriving the present invention, and cannot necessarily be regarded as known art disclosed to the general public before the filing of the present application.
In order to solve the above problems, the present invention provides a GPU virtualization system capable of virtualizing N actual GPUs into M virtual GPUs so that a user can perform parallel programming intuitively and easily.
In addition, another embodiment of the present invention provides a variable-number-convertible GPU virtualization system that divides a GPU according to a user's request to increase the GPU utilization rate.
However, the objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be clearly understood by those skilled in the art from the following description.
According to an aspect of the present invention, there is provided a variable-number-convertible GPU virtualization system comprising a plurality of cluster nodes divided into a master node, which includes a GPU virtual device driver for receiving GPU code and transferring it to a slave node, and a slave node, which includes a GPU virtual server for receiving the GPU code and executing it through actual GPUs, wherein the GPU virtual device driver comprises: an emulator for converting GPU code written for M (M is a natural number) virtual GPUs through a universal parallel computing framework into a form for execution on N (N is a natural number equal to or different from M) actual GPUs; and a dispatcher for delivering the GPU code converted by the emulator to the N actual GPUs.
The dispatcher may also store the location of the N actual GPUs and deliver the converted GPU code based on the location of the stored actual GPUs.
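As an illustrative sketch only (the class name, the round-robin mapping, and the Python modeling are assumptions for exposition, not the patented implementation), the emulator's virtual-to-actual GPU mapping and the dispatcher's stored location table might look like this:

```python
# Hypothetical sketch: a dispatcher that stores the locations of N actual
# GPUs and delivers converted GPU code based on those stored locations.

class Dispatcher:
    def __init__(self, gpu_locations):
        # gpu_locations: list of (slave_node, device_index) pairs,
        # one entry per actual GPU.
        self.gpu_locations = list(gpu_locations)

    def deliver(self, converted_code_parts):
        """Pair each converted code part with the stored location of the
        actual GPU that will execute it."""
        if len(converted_code_parts) != len(self.gpu_locations):
            raise ValueError("need one code part per actual GPU")
        return list(zip(self.gpu_locations, converted_code_parts))


def emulate(m_virtual, n_actual):
    """Minimal emulator step: map each of M virtual GPU ids onto one of
    N actual GPU ids (round robin, an assumed policy)."""
    return {v: v % n_actual for v in range(m_virtual)}
```

For example, four virtual GPUs mapped onto two actual GPUs yields `{0: 0, 1: 1, 2: 0, 3: 1}`, after which the dispatcher routes each converted code part to the stored `(slave_node, device_index)` location.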
The GPU virtual device driver may further include an analyzer for analyzing, based on the work item size and the work group size, whether the data used through the GPU code can be divided among the actual GPU memories.
In addition, when the data to be used through the GPU code can be divided, the analyzer causes the emulator and the dispatcher to divide the GPU code among the N actual GPUs and transmit it; when the data cannot be divided, the non-divisible GPU code can be executed by a shared virtual memory method or a CPU execution method.
If the data to be used through the GPU code is divided among the N actual GPUs, the analyzer may divide the work group size into N pieces and distribute the N pieces to the N actual GPUs to be executed.
In addition, when the data to be used through the GPU code cannot be divided among the N actual GPUs, the analyzer may have the N actual GPUs process overlapping buffer areas for reading or writing by using the shared virtual memory provided by the MMU of each actual GPU.
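The division the analyzer performs over the work group size can be sketched as follows; the function name and the even-divisibility assumption are illustrative, not taken from the patent:

```python
def partition_work_group(work_group_size, n_actual_gpus):
    """Split a work group into n_actual_gpus contiguous pieces.

    Returns a list of (offset, size) pairs, one per actual GPU, or None
    when the work group does not divide evenly (in which case the
    description falls back to shared virtual memory or CPU execution).
    """
    if work_group_size % n_actual_gpus != 0:
        return None
    piece = work_group_size // n_actual_gpus
    return [(i * piece, piece) for i in range(n_actual_gpus)]
```

With the numbers used later in the description (a work group size of 1024 and 16 actual GPUs), this yields 16 pieces of 64 work items each.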
Also, the master node may be a node where a program is executed by a user, and the slave node may be a node having the N actual GPUs and performing a GPU operation.
The GPU virtual device driver may support CUDA and OpenCL.
Preferably, communication between the GPU virtual server and the GPU virtual device driver uses an interconnection network connected between the master node and the slave node.
According to another aspect of the present invention, there is provided a GPU virtualization system including: a master node, which is a node for executing GPU code for implementing M virtual GPUs; and one or more slave nodes communicating with the master node to receive the GPU code and having N actual GPUs to perform GPU operations; wherein the master node may include an analyzer for computing a distribution of the data to be used through the GPU code based on a work item size and a work group size.
The master node may further include an emulator configured to convert the GPU code for the N actual GPUs according to the operation of the analyzer, and a dispatcher configured to provide the converted GPU code to each of the N actual GPUs.
The variable-number-convertible GPU virtualization system according to an embodiment of the present invention combines or redistributes actual GPUs through virtualization, providing a virtual GPU with memory and computing capability that no single physical GPU possesses, so that a larger program can be run more quickly and easily.
Also, according to another embodiment of the present invention, the variable number-convertible GPU virtualization system provides an effect of increasing the GPU usage rate by dividing the GPU according to the demand of the user.
In addition, according to another embodiment of the present invention, the variable-number-convertible GPU virtualization system can virtualize N actual GPUs into M virtual GPUs, so that a user can intuitively and easily perform parallel programming.
The effects obtainable from the present invention are not limited to the above-mentioned effects, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.
FIG. 1 is a diagram for explaining a basic structure of an actual GPU used in a variable number-convertible GPU virtualization system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a virtual device driver layer in a variable number-convertible GPU virtualization system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a GPU virtual device driver in a variable number-convertible GPU virtualization system according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a GPU virtualization technique in a variable number-convertible GPU virtualization system according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram for explaining an implementation method when data accessed by a virtualized GPU in a GPU virtualization system that can be converted to a variable number according to an embodiment of the present invention can not be divided.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, a detailed description of preferred embodiments of the present invention will be given with reference to the accompanying drawings. In the following description of the present invention, detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear.
In the present specification, when one element 'transmits' data or a signal to another element, it may transmit the data or signal directly to the other element, or may transmit the data or signal to the other element through at least one intermediate element.
FIG. 1 is a diagram for explaining the basic structure of an actual GPU 10 used in a variable number-convertible GPU virtualization system according to an embodiment of the present invention. Referring to FIG. 1, the actual GPU 10 includes processing elements (PE) 11, a GPU memory 12, and an MMU 13.
An MMU (Memory Management Unit) 13 is installed to implement virtual memory, like the MMU of a CPU. The MMU 13 is implemented in recent GPUs and supports virtualization of the GPU memory 12.
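As a rough illustration of the virtual-memory translation an MMU provides (a generic page-table sketch for exposition only; this is not the patent's or any real GPU's MMU design):

```python
PAGE_SIZE = 4096  # assumed page size for this toy model

class MMU:
    """Toy page-table MMU: translates virtual addresses to physical
    addresses, raising a page fault when a page is unmapped."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical page number

    def map_page(self, vpn, ppn):
        self.page_table[vpn] = ppn

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            # In a real MMU this would trigger a page-fault handler;
            # the shared virtual memory method described later relies
            # on exactly this kind of fault to fetch remote pages.
            raise KeyError(f"page fault at virtual page {vpn}")
        return self.page_table[vpn] * PAGE_SIZE + offset
```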
FIG. 2 is a diagram illustrating the virtual device driver layer in a variable number-convertible GPU virtualization system according to an embodiment of the present invention. Referring to FIG. 2, the cluster nodes of the variable-number-convertible GPU virtualization system are divided into a master node 100a and a slave node 100b.
The master node 100a is the node where a program is executed by a user.
The slave node 100b is the node that has the actual GPUs 10 and performs GPU operations.
In a cluster node environment separated into the master node 100a and the slave node 100b, the GPU virtual device driver 100a-3 of the master node and the GPU virtual server 100b-6 of the slave node communicate through an interconnection network.
Accordingly, the
FIG. 3 is a diagram showing the GPU virtual device driver 100a-3 in a variable number-convertible GPU virtualization system according to an embodiment of the present invention.
Referring to FIG. 3, the GPU virtual device driver 100a-3 includes an emulator 110a-3, a dispatcher 120a-3, and an analyzer 130a-3.
First, the emulator 110a-3 converts GPU code written for M virtual GPUs into a form that can be executed on the N actual GPUs 10.
Also, the GPU code is transferred from the OpenCL layer 100a-5 to the GPU virtual device driver 100a-3.
For example, in a case where N (N is a natural number) actual GPUs are virtualized into M virtual GPUs, the emulator 110a-3 performs the corresponding code conversion.
Next, the dispatcher 120a-3 delivers the GPU code converted by the emulator 110a-3 to the N actual GPUs 10.
Finally, the analyzer 130a-3, an additional module constituting the GPU virtual device driver 100a-3, analyzes whether the data used by the GPU code can be divided among the actual GPUs.
In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving the hardware. For example, a module may mean a logical unit of predetermined code together with the hardware resources for executing that code; it does not necessarily mean physically connected code or a single kind of hardware, as can be easily deduced by an average expert in the art.
When the data used by the GPU code can be divided, the analyzer 130a-3 allows the data to be divided and transferred to the N actual GPUs 10 for execution.
Conversely, if the analyzer 130a-3 determines that the data used by the GPU code cannot be divided, the GPU code can be executed through one of the following two methods: the shared virtual memory method or the CPU execution method.
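The three-way choice described above (divide and dispatch, shared virtual memory, or CPU execution) might be summarized as a decision function; the names and the exact precedence are illustrative assumptions, not taken from the patent:

```python
def choose_execution_path(divisible, buffers_overlap):
    """Pick how GPU code is executed, following the fallbacks the
    description gives for non-divisible data.

    - divisible data, no overlap  -> split across the N actual GPUs
    - overlapping buffer areas    -> shared virtual memory method
    - otherwise                   -> CPU execution method
    """
    if divisible and not buffers_overlap:
        return "split-across-actual-gpus"
    if buffers_overlap:
        return "shared-virtual-memory"
    return "cpu-execution"
```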
FIG. 4 is a diagram illustrating an example of a GPU virtualization technique in a variable number-convertible GPU virtualization system according to an embodiment of the present invention. Referring to (1) of FIG. 4, in an environment having three cluster nodes in which GPUs 10-0, 10-1, and 10-2 are installed, a plurality of actual GPUs can be virtualized into a single virtual GPU.
In this specification, the CPUs 2-0, 2-1, and 2-2 on the cluster nodes can be divided into a master node 100a and slave nodes 100b.
2, the
The GPU
The GPU
FIG. 4 (1) is an example of GPU virtualization in which a plurality of GPUs 10 (10-0, 10-1, 10-2) are virtualized into one virtual GPU.
When the programmer executes a program implemented with the OpenCL programming model, the GPU virtual device driver 100a-3 receives the GPU code.
At this time, the analyzer 130a-3 checks whether the data used by the GPU code can be divided among the GPUs 10 (10-0, 10-1, 10-2).
For example, the analyzer 130a-3 analyzes the GPU code using the work item size and the work group size, which are input values given when the GPU code is executed on the master node 100a.
If the work group size is 1024 and there are 16 actual GPUs 10 (for example, 10-0 to 10-15), the analyzer 130a-3 divides the work group into 16 pieces of 64 work items each.
When the work group is divided in this way, the analyzer 130a-3 checks the buffer areas accessed by the divided pieces; if the buffer areas do not overlap, it distributes the pieces to the 16 actual GPUs 10 (10-0 to 10-15).
In this case, the divided buffer areas may be respectively copied to the GPU memory 12 of each actual GPU 10.
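Copying the divided buffer areas to each GPU's memory can be modeled minimally as follows (an illustrative sketch assuming a flat buffer that divides evenly among the GPUs):

```python
def scatter_buffer(buffer, n_gpus):
    """Split a flat buffer into n_gpus contiguous slices and 'copy' each
    slice to the memory of one actual GPU (modeled here simply as a list
    of per-GPU buffers)."""
    piece = len(buffer) // n_gpus
    return [buffer[i * piece:(i + 1) * piece] for i in range(n_gpus)]
```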
FIG. 5 is a conceptual diagram for explaining an implementation method used when the data accessed by the virtualized GPU in a variable number-convertible GPU virtualization system according to an embodiment of the present invention cannot be divided. If the data accessed by the virtualized GPU cannot be partitioned, execution proceeds in one of two ways.
First, the shared virtual memory method using the MMU 13 of the actual GPU 10 is applied.
When GPU0 {10(0)} attempts to access page r, the MMU {13(0)} of GPU0 {10(0)} recognizes that the page is not in its local GPU memory.
In this case, the corresponding page is fetched from the node that currently holds it.
If the page is to be modified, a twin page is generated, as shown for GPU1 {10(1)} in FIG. 5, so that the modified contents can later be found. After the code execution is completed, the modified data (diff) computed against the twin page is reflected back into the original page.
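The twin-page/diff mechanism in FIG. 5 resembles classic software distributed shared memory; a toy sketch (all function names illustrative, pages modeled as lists of words):

```python
def make_twin(page):
    """Before modifying a fetched page, keep an unmodified twin copy."""
    return list(page)

def compute_diff(twin, page):
    """After execution, find which words changed relative to the twin."""
    return {i: v for i, (t, v) in enumerate(zip(twin, page)) if t != v}

def apply_diff(master_page, diff):
    """Reflect the modified data (diff) back into the original copy."""
    for i, v in diff.items():
        master_page[i] = v
    return master_page
```

A GPU that fetches a page makes a twin before writing; once its kernel finishes, only the diff (here `{2: 9}` for a single changed word) travels back, rather than the whole page.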
Since the
Second, the CPU execution method is applied, in which the GPU code whose data cannot be divided is executed on the CPU instead.
As described above, preferred embodiments of the present invention have been disclosed in the present specification and drawings. Although specific terms have been used, they are used only in a general sense to easily describe the technical contents of the present invention and to facilitate understanding of the invention, and are not intended to limit the scope of the present invention. It will be apparent to those skilled in the art that other modifications based on the technical idea of the present invention may be practiced in addition to the embodiments disclosed herein.
10: Actual GPU
11: PE (Processing Element)
12: GPU memory
13: Memory Management Unit (MMU)
100a: master node
100a-1: Hardware
100a-2: Operating system
100a-3: GPU virtual device driver
110a-3: Emulator
120a-3: Dispatcher
130a-3: Analyzer
100a-4: CUDA
100a-5: OpenCL
100a-6: CUDA App.
100a-7: OpenCL App.
100b: Slave node
100b-3: GPU device driver
100b-6: GPU virtual server
Claims (11)
An emulator for converting GPU code written for M (M is a natural number) virtual GPUs through a parallel computing framework into a form for execution on N (N is a natural number equal to or different from M) actual GPUs; and
A dispatcher for delivering the GPU code converted by the emulator to the N actual GPUs.
Wherein the dispatcher stores the locations of the N actual GPUs and delivers the converted GPU code based on the stored locations.
Wherein the GPU virtual device driver further comprises an analyzer for analyzing, based on the work item size and the work group size, whether the data used through the GPU code can be divided among the actual GPU memories.
Wherein the analyzer causes the GPU code to be divided among the N actual GPUs through the emulator and the dispatcher when the data to be used through the GPU code can be divided, and causes the non-divisible GPU code to be executed by a shared virtual memory method or a CPU execution method when the data cannot be divided.
Wherein the analyzer divides the work group size into N pieces for the data to be used through the GPU code and distributes the divided N pieces to the N actual GPUs.
Wherein, if the data to be used through the GPU code cannot be divided among the N actual GPUs, the N actual GPUs read or write overlapping buffer areas using the shared virtual memory of the MMU of each actual GPU.
Wherein the master node is a node where a program is executed by a user, and the slave node is a node that has the N actual GPUs and performs GPU operations.
Wherein the GPU virtual device driver supports a parallel computing framework.
Wherein the communication between the GPU virtual server and the GPU virtual device driver uses an interconnection network connected between the master node and the slave node.
A master node that executes a GPU code for implementing M virtual GPUs; And
One or more slave nodes communicating with the master node to receive the GPU code and having N actual GPUs to perform GPU operations;
Wherein the master node comprises an analyzer for computing a distribution of data to be used over the GPU code based on a work item size and a work group size.
The master node,
An emulator configured to convert the GPU code for the N slave nodes according to the operation of the analyzer; and
A dispatcher configured to provide the converted GPU code to each of the N slave nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130071605A KR101435772B1 (en) | 2013-06-21 | 2013-06-21 | Gpu virtualizing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130071605A KR101435772B1 (en) | 2013-06-21 | 2013-06-21 | Gpu virtualizing system |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101435772B1 true KR101435772B1 (en) | 2014-08-29 |
Family
ID=51751616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130071605A KR101435772B1 (en) | 2013-06-21 | 2013-06-21 | Gpu virtualizing system |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101435772B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3385850A1 (en) * | 2017-04-07 | 2018-10-10 | INTEL Corporation | Apparatus and method for memory management in a graphics processing environment |
KR20220035626A (en) * | 2020-09-14 | 2022-03-22 | 한화시스템 주식회사 | Method for managing gpu resources and computing device for executing the method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060079088A (en) * | 2004-12-30 | 2006-07-05 | 마이크로소프트 코포레이션 | Systems and methods for virtualizing graphics subsystems |
WO2012083012A1 (en) * | 2010-12-15 | 2012-06-21 | Advanced Micro Devices, Inc. | Device discovery and topology reporting in a combined cpu/gpu architecture system |
WO2012082423A1 (en) * | 2010-12-13 | 2012-06-21 | Advanced Micro Devices, Inc. | Graphics compute process scheduling |
- 2013-06-21: KR application KR1020130071605A filed; patent KR101435772B1 active (IP Right Grant)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060079088A (en) * | 2004-12-30 | 2006-07-05 | 마이크로소프트 코포레이션 | Systems and methods for virtualizing graphics subsystems |
WO2012082423A1 (en) * | 2010-12-13 | 2012-06-21 | Advanced Micro Devices, Inc. | Graphics compute process scheduling |
WO2012083012A1 (en) * | 2010-12-15 | 2012-06-21 | Advanced Micro Devices, Inc. | Device discovery and topology reporting in a combined cpu/gpu architecture system |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3385850A1 (en) * | 2017-04-07 | 2018-10-10 | INTEL Corporation | Apparatus and method for memory management in a graphics processing environment |
US10380039B2 (en) | 2017-04-07 | 2019-08-13 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
EP3614270A1 (en) * | 2017-04-07 | 2020-02-26 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
US10769078B2 (en) | 2017-04-07 | 2020-09-08 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
US11360914B2 (en) | 2017-04-07 | 2022-06-14 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
EP4109283A1 (en) * | 2017-04-07 | 2022-12-28 | INTEL Corporation | Apparatus and method for memory management in a graphics processing environment |
US11768781B2 (en) | 2017-04-07 | 2023-09-26 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
EP4325371A3 (en) * | 2017-04-07 | 2024-03-20 | Intel Corporation | Apparatus and method for memory management in a graphics processing environment |
KR20220035626A (en) * | 2020-09-14 | 2022-03-22 | 한화시스템 주식회사 | Method for managing gpu resources and computing device for executing the method |
KR102417882B1 (en) | 2020-09-14 | 2022-07-05 | 한화시스템 주식회사 | Method for managing gpu resources and computing device for executing the method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105830026B (en) | Apparatus and method for scheduling graphics processing unit workload from virtual machines | |
US8943584B2 (en) | Centralized device virtualization layer for heterogeneous processing units | |
Burgio et al. | A software stack for next-generation automotive systems on many-core heterogeneous platforms | |
CN103034524A (en) | Paravirtualized virtual GPU | |
CN113495857A (en) | Memory error isolation techniques | |
CN112817738A (en) | Techniques for modifying executable graphs to implement workloads associated with new task graphs | |
CN116783578A (en) | Execution matrix value indication | |
CN118339538A (en) | Application programming interface for indicating graph node execution | |
KR101435772B1 (en) | Gpu virtualizing system | |
CN112817739A (en) | Techniques for modifying executable graphs to implement different workloads | |
Cabezas et al. | Runtime and architecture support for efficient data exchange in multi-accelerator applications | |
CN113568734A (en) | Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment | |
CN118119924A (en) | Application programming interface for performing operations with reusable threads | |
CN118103817A (en) | Application programming interface for performing selective loading | |
CN116401039A (en) | Asynchronous memory deallocation | |
CN117222984A (en) | Application programming interface for disassociating virtual addresses | |
Miliadis et al. | VenOS: A virtualization framework for multiple tenant accommodation on reconfigurable platforms | |
CN116225676A (en) | Application programming interface for limiting memory | |
CN116724292A (en) | Parallel processing of thread groups | |
CN116243921A (en) | Techniques for modifying graph code | |
CN117178261A (en) | Application programming interface for updating graph code signal quantity | |
CN116802613A (en) | Synchronizing graphics execution | |
CN115509736A (en) | Memory allocation or de-allocation using graphs | |
CN116097224A (en) | Simultaneous boot code | |
Moore et al. | VForce: An environment for portable applications on high performance systems with accelerators |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment |
Payment date: 20170724 Year of fee payment: 4 |