CN114463159A - GPU resource sharing method - Google Patents
- Publication number: CN114463159A (application CN202210010700.3A)
- Authority
- CN
- China
- Prior art keywords
- kernel
- ipc
- kernels
- information table
- resource sharing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T1/20 — Processor architectures; Processor configuration, e.g. pipelining
- G06F9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5061 — Partitioning or combining of resources
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The method builds a first information table and a second information table and determines an optimal kernel group from them by a kernel-group selection method. The first information table stores each kernel's information when it executes alone; the second information table stores kernel information when two kernels execute together; the kernels include memory-intensive and compute-intensive kernels. Compared with the prior art, the method selects the optimal kernel group from among multiple candidate kernels and further improves GPU resource utilization through efficient resource sharing among multiple GPU kernels.
Description
Technical Field
The invention belongs to the field of resource sharing, and particularly relates to a GPU resource sharing method.
Background
A graphics processing unit (GPU), also called a display core, visual processor, or display chip, is a microprocessor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones). It converts and drives the display information the computer system requires, supplies the line-scan signal to the display, and controls the display correctly; it is thus an important component connecting the display to the motherboard and one of the key devices for human-computer interaction.
The GPU is mainly used for floating-point and parallel computation, at speeds that can be hundreds of times those of a CPU. With GPU virtualization, virtual-machine instances running on data-center servers can share one or more GPUs for graphics computation, a secure and efficient desktop-access mode pursued by more and more users. Although this strategy raises the overall resource utilization of the physical server, overall performance may drop dramatically under the large number of requests issued by multiple virtual machines.
To improve GPU resource utilization, multi-kernel technology was proposed: computing resources are divided among multiple kernels, and utilization is improved through efficient GPU resource sharing among them. As the number of virtual machines grows, however, a further challenge is how to select the optimal kernel group from the many candidates. In the prior art, a memory-intensive kernel is usually co-scheduled with a compute-intensive kernel; yet in some cases pairing two memory-intensive kernels, or two compute-intensive kernels, performs better than the memory-intensive/compute-intensive pairing. Kernel type alone therefore cannot determine which kernels to schedule together to maximize resource utilization.
Disclosure of Invention
The invention provides a GPU resource sharing method, and aims to solve the technical problem of low resource utilization rate in the GPU resource sharing process.
To achieve the above object, the present invention provides a GPU resource sharing method comprising the following steps: construct a first information table and a second information table; determine an optimal kernel group from the two tables by a kernel-group selection method; and share GPU resources according to the resulting optimal kernel group. The first information table stores each kernel's information when it executes alone; the second information table stores kernel information when two kernels execute together; the kernels include memory-intensive and compute-intensive kernels.
The first information table includes ID, IPC_S, and K_NO, where ID is a serial number, IPC_S is the number of instructions per cycle when the corresponding kernel executes alone, and K_NO is the kernel identifier.
The second information table includes ID, IPC_G, K_NO1, K_NO2, and TPS, where ID is a serial number, IPC_G is the number of instructions per cycle when the two kernels execute together, K_NO1 and K_NO2 identify the two kernels, and TPS is the system throughput.
The system throughput TPS is calculated from IPC_S (the kernel's stand-alone IPC in the first information table) and IPC_G (the co-run IPC of the two kernels in the second information table).
The system throughput TPS is calculated by the following formula: TPS = (IPC_S(i) + IPC_S(j)) / IPC_G(i, j), where i and j denote two different kernels.

The kernel-group selection method is as follows: when n new kernels are waiting in the kernel queue, a hardware performance counter collects in turn the stand-alone IPC_S of each kernel and the co-run IPC_G of each candidate kernel group; the throughput TPS of each candidate group is computed from these values, and the group with the largest TPS is selected as the optimal kernel group.
In summary, the method builds a first information table and a second information table and determines the optimal kernel group from them by a kernel-group selection method, where the first information table stores each kernel's information when it executes alone, the second information table stores kernel information when two kernels execute together, and the kernels include memory-intensive and compute-intensive kernels. Compared with the prior art, the method selects the optimal kernel group from among multiple candidate kernels and further improves GPU resource utilization through efficient resource sharing among multiple GPU kernels.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawing.

As shown in FIG. 1, a schematic flowchart of an embodiment of the present invention, the GPU resource sharing method constructs a first information table and a second information table, determines an optimal kernel group from the two tables by a kernel-group selection method, and shares GPU resources according to the resulting optimal kernel group. The first information table stores each kernel's information when it executes alone; the second information table stores kernel information when two kernels execute together; the kernels include memory-intensive and compute-intensive kernels.
The first information table includes ID, IPC_S, and K_NO, where ID is a serial number, IPC_S is the number of instructions per cycle when the corresponding kernel executes alone, and K_NO is the kernel identifier.
The first information table is structured as follows:

| ID | IPC_S | K_NO |
|---|---|---|
| 0 | 208 | K0 |
| 1 | 136 | K1 |
| 2 | 693 | K2 |
| … | … | … |
| n | 537 | Kn |
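As a minimal sketch, the first information table can be represented as a list of records keyed by the columns above; the field names and the dictionary index are illustrative, not from the patent:

```python
# Hedged sketch: the first information table as a list of records, using
# the example rows shown above (field names mirror the table's columns).
first_table = [
    {"ID": 0, "IPC_S": 208, "K_NO": "K0"},
    {"ID": 1, "IPC_S": 136, "K_NO": "K1"},
    {"ID": 2, "IPC_S": 693, "K_NO": "K2"},
]

# Index the table by kernel identifier for O(1) lookup of stand-alone IPC.
ipc_by_kernel = {row["K_NO"]: row["IPC_S"] for row in first_table}
print(ipc_by_kernel["K2"])  # 693
```

The index built from K_NO is what the throughput computation later consumes.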
The second information table includes ID, IPC_G, K_NO1, K_NO2, and TPS, where ID is a serial number, IPC_G is the number of instructions per cycle when the two kernels execute together, K_NO1 and K_NO2 identify the two kernels, and TPS is the system throughput.
The second information table is structured as follows:

| ID | IPC_G | K_NO1 | K_NO2 | TPS |
|---|---|---|---|---|
| 0 | 208 | K0 | K_NO2 | 1.3 |
| 1 | 136 | K1 | K_NO2 | 1.2 |
| 2 | 693 | K2 | K_NO2 | 1.7 |
| … | … | … | K_NO2 | … |
| n | 537 | Kn | K_NO2 | 1.15 |
the system throughput TPS is calculated by using the information of the IPC when the kernel in the first information table is executed independently and the IPC1 and the IPC2 when the two kernels in the second information table are executed in parallel.
Specifically, TPS = (IPC_S(i) + IPC_S(j)) / IPC_G(i, j), where i and j denote two different kernels, and IPC_G(i, j) can be obtained by a weighted sum or other combination of the IPC information of kernels i and j while they execute simultaneously.
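The TPS formula above can be sketched directly; the function and argument names are illustrative assumptions, and the example counter readings are hypothetical:

```python
# A minimal sketch of the TPS formula from the description, assuming the
# IPC values are plain floats; names here are illustrative, not from the
# patent.
def tps(ipc_s_i, ipc_s_j, ipc_g_ij):
    """TPS = (IPC_S(i) + IPC_S(j)) / IPC_G(i, j) for a kernel pair."""
    return (ipc_s_i + ipc_s_j) / ipc_g_ij

# Hypothetical readings: stand-alone IPCs of 208 and 136, and a combined
# co-run IPC of 265 for the pair.
print(round(tps(208.0, 136.0, 265.0), 3))  # 1.298
```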
The kernel-group selection method is as follows: when n new kernels are waiting in the kernel queue, a hardware performance counter collects in turn the stand-alone IPC_S of each kernel and the co-run IPC_G of each candidate kernel group; the throughput TPS of each candidate group is computed from these values, and the group with the largest TPS is selected as the optimal kernel group.
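The selection loop can be sketched as follows; the dictionaries stand in for hardware performance-counter readings, and all names and numbers are illustrative assumptions rather than the patent's implementation:

```python
from itertools import combinations

def select_optimal_pair(ipc_s, ipc_g):
    """Pick the kernel pair with the largest TPS.

    ipc_s: {kernel_id: stand-alone IPC_S}
    ipc_g: {(kernel_i, kernel_j): co-run IPC_G}, keys in sorted order.
    """
    best_pair, best_tps = None, float("-inf")
    for i, j in combinations(sorted(ipc_s), 2):
        # TPS = (IPC_S(i) + IPC_S(j)) / IPC_G(i, j), per the description.
        t = (ipc_s[i] + ipc_s[j]) / ipc_g[(i, j)]
        if t > best_tps:
            best_pair, best_tps = (i, j), t
    return best_pair, best_tps

# Hypothetical counter readings for three waiting kernels.
ipc_s = {"K0": 208.0, "K1": 136.0, "K2": 693.0}
ipc_g = {("K0", "K1"): 265.0, ("K0", "K2"): 780.0, ("K1", "K2"): 610.0}
pair, t = select_optimal_pair(ipc_s, ipc_g)
print(pair, round(t, 2))  # ('K1', 'K2') 1.36
```

With these example readings the pair (K1, K2) yields the highest TPS and would be scheduled together.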
Finally, it should be noted that the above technical solution is only one embodiment of the present invention. Those skilled in the art can readily make various modifications and variations based on the application methods and principles disclosed herein, and the method is not limited to the specific embodiment described above; the embodiment is therefore illustrative and not restrictive.
Claims (6)
1. A GPU resource sharing method, characterized in that: a first information table and a second information table are constructed; an optimal kernel group is determined from the two tables by a kernel-group selection method; and GPU resources are shared according to the resulting optimal kernel group, wherein the first information table stores kernel information when each kernel executes alone, the second information table stores kernel information when two kernels execute together, and the kernels comprise memory-intensive kernels and compute-intensive kernels.
2. The GPU resource sharing method of claim 1, wherein the first information table includes ID, IPC_S, and K_NO, where ID is a serial number, IPC_S is the number of instructions per cycle when the corresponding kernel executes alone, and K_NO is the kernel identifier.
3. The GPU resource sharing method of claim 1, wherein the second information table includes ID, IPC_G, K_NO1, K_NO2, and TPS, where ID is a serial number, IPC_G is the number of instructions per cycle when the two kernels execute together, K_NO1 and K_NO2 identify the two kernels, and TPS is the system throughput.
4. The GPU resource sharing method of claim 3, wherein the system throughput TPS is calculated from IPC_S (the kernel's stand-alone IPC in the first information table) and IPC_G (the co-run IPC of the two kernels in the second information table).
5. The GPU resource sharing method of claim 4, wherein the system throughput TPS is calculated by the formula TPS = (IPC_S(i) + IPC_S(j)) / IPC_G(i, j), where i and j denote two different kernels.
6. The GPU resource sharing method of claim 1, wherein the kernel-group selection method comprises: when n new kernels are waiting in the kernel queue, collecting in turn, with a hardware performance counter, the stand-alone IPC_S of each kernel and the co-run IPC_G of each candidate kernel group; calculating the throughput TPS of each candidate group from these values; and selecting the group with the largest TPS as the optimal kernel group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210010700.3A CN114463159B (en) | 2022-01-06 | 2022-01-06 | GPU resource sharing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114463159A true CN114463159A (en) | 2022-05-10 |
CN114463159B CN114463159B (en) | 2024-02-23 |
Family
ID=81409623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210010700.3A Active CN114463159B (en) | 2022-01-06 | 2022-01-06 | GPU resource sharing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114463159B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120081373A1 (en) * | 2010-09-30 | 2012-04-05 | Nec Laboratories America, Inc. | Energy-aware task consolidation on graphics processing unit (gpu) |
CN109857564A (en) * | 2019-03-05 | 2019-06-07 | 上海交通大学 | The GPU of method for managing resource and its application based on fine-grained GPU |
US20190196853A1 (en) * | 2017-12-21 | 2019-06-27 | International Business Machines Corporation | Runtime gpu/cpu selection |
CN111045800A (en) * | 2019-11-14 | 2020-04-21 | 武汉纺织大学 | Method and system for optimizing GPU (graphics processing Unit) performance based on short job priority |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |