CN110888737A - Ringbuffer implementation system and method supporting multiple GPUs - Google Patents
- Publication number: CN110888737A (application CN201911125585.9A)
- Authority: CN (China)
- Prior art keywords: ringbuffer, page, gpu, management module, task
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
Abstract
The invention belongs to the technical field of computer applications and relates to a Ringbuffer implementation system supporting multiple GPUs. The system comprises a Ringbuffer page table management module (1), a multi-GPU task management module (2), a data buffer module (3), and a GPU task buffer module (4). By constructing a mapping relation between the Ringbuffer page space and multiple GPUs, the invention provides a Ringbuffer implementation method supporting multiple GPUs.
Description
The invention belongs to the technical field of computer applications, and particularly relates to a Ringbuffer implementation system and method supporting multiple GPUs.
Background
Since its advent, the GPU has played an important role in fields requiring high-performance computing, such as image and video processing, physics, bioscience, chemistry, and artificial intelligence, owing to its powerful computing capability. To complete increasingly complex graphics processing and general computing tasks while reducing computation time, multiple GPU cards must be used simultaneously. A Ringbuffer is a technique that can effectively improve memory allocation and usage efficiency. Whether a Ringbuffer can be used flexibly and efficiently to manage and distribute computing tasks across multiple GPUs directly affects task completion efficiency.
Disclosure of Invention
The purpose of the invention is as follows:
To solve the above problems, the invention provides a Ringbuffer implementation method supporting multiple GPUs. By constructing a mapping relation between the Ringbuffer page space and multiple GPUs, the data of all GPUs is managed uniformly by a single Ringbuffer module, which saves memory space, enables flexible distribution of graphics or computing tasks across multiple GPUs, and improves task execution efficiency.
The technical scheme is as follows:
The invention provides a Ringbuffer implementation system supporting multiple GPUs (graphics processing units), which comprises a Ringbuffer page table management module (1), a multi-GPU task management module (2), a data buffer module (3), and a GPU task buffer module (4).
Further, the Ringbuffer page table management module (1) is used for implementing page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state, and a target GPU;
when a user inputs data into the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space of the current Ringbuffer page and the data input address; when the page space is insufficient or a page switching instruction is received, the synchronization management function switches to the next Ringbuffer page;
meanwhile, in combination with the multi-GPU task management module (2), the target GPU of a Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signals fed back by each GPU task buffer and performs Ringbuffer page recovery: after receiving the completion signals from all target GPU task buffers of a page in the sent state, it recovers the page and initializes it to the unused state.
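The per-page bookkeeping described above can be sketched in C. The struct fields mirror the internal attributes named in the disclosure (first address, data input address, use state, target GPU); all identifier names, the bitmask encoding of target GPUs, and the recycle helper are illustrative assumptions, not part of the patent:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of one Ringbuffer page's internal attributes: first address,
   data input address, use state, and target GPU(s). Field names and the
   bitmask encoding of target GPUs are assumptions for illustration. */
typedef enum {
    PAGE_UNUSED,       /* free, available for writing       */
    PAGE_WRITING,      /* currently receiving command data  */
    PAGE_TO_CONFIGURE, /* full or synced, awaiting dispatch */
    PAGE_SENT          /* dispatched, awaiting completion   */
} page_state_t;

typedef struct {
    uint8_t     *base;        /* first address of the page            */
    size_t       write_off;   /* data input address (write offset)    */
    page_state_t state;       /* use state                            */
    uint32_t     target_mask; /* bit i set => page data goes to GPU i */
    uint32_t     done_mask;   /* completion signals received so far   */
} rb_page_t;

/* Page recovery: when every target GPU task buffer has replied a task
   completion signal, a page in the sent state is recovered and
   initialized to the unused state. Returns 1 if the page was recycled. */
int rb_page_complete(rb_page_t *p, unsigned gpu_id)
{
    p->done_mask |= 1u << gpu_id;
    if (p->state == PAGE_SENT && p->done_mask == p->target_mask) {
        p->state = PAGE_UNUSED;
        p->write_off = 0;
        p->target_mask = 0;
        p->done_mask = 0;
        return 1;
    }
    return 0;
}
```

Tracking completion as a bitmask makes "all target GPU task buffers have replied" a single equality test, which is one plausible way to realize the recovery condition stated above.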
Further, the multi-GPU task management module (2) is configured to receive multi-GPU task configuration information configured by a user, allocate the GPU on which each graphics or computing task generated by the Ringbuffer page table management module (1) will execute, and then return the task allocation information to the Ringbuffer page table management module (1).
Further, the data buffer module (3) is configured to receive and store incoming data according to the division of the Ringbuffer page table management module (1), and send data in a page to a corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
Further, the GPU task buffer module (4) is configured to buffer the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal for the originating Ringbuffer page to the Ringbuffer page table management module (1).
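One GPU task buffer can be sketched as a small FIFO of entries tagged with the Ringbuffer page they came from, so that a completion signal can be replied for the right page after dispatch. The patent does not prescribe a data structure; the queue shape, capacity, and names below are assumptions for illustration:

```c
#include <stddef.h>

#define QCAP 8  /* entries per GPU task buffer; illustrative size */

/* Hypothetical sketch of one GPU task buffer (module 4): a FIFO that
   receives page data independently of the other GPUs' buffers. After an
   entry is handed to the GPU, the buffer replies a task completion
   signal for the Ringbuffer page the entry came from. */
typedef struct {
    int      page_ids[QCAP]; /* originating Ringbuffer page of each entry */
    size_t   head, tail;     /* monotonic counters; index = counter % QCAP */
    unsigned gpu_id;         /* which GPU this buffer feeds                */
} gpu_task_buf_t;

/* Queue one page's task data (represented here only by its page id). */
int gtb_push(gpu_task_buf_t *q, int page_id)
{
    if (q->tail - q->head == QCAP)
        return -1;                        /* task buffer full */
    q->page_ids[q->tail++ % QCAP] = page_id;
    return 0;
}

/* Hand the oldest entry to the GPU and return its page id, which the
   caller forwards to the page table module as the completion signal. */
int gtb_drain_one(gpu_task_buf_t *q)
{
    if (q->head == q->tail)
        return -1;                        /* nothing queued */
    return q->page_ids[q->head++ % QCAP];
}
```

Because each buffer is drained independently, one slow GPU does not block the others' task submission, matching the module's "independently receives task data" behavior.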
Another object of the present invention is to provide an implementation method of a Ringbuffer implementation system supporting multiple GPUs, which includes the following steps:
①, acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
②, acquire the current page's data input address and remaining page space from the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③, write the command data;
④, repeat steps ②-③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; then set the page state to "to be configured", switch the write page to the next page in the unused state, and return to step ②;
⑤, the Ringbuffer page table management module (1) generates graphics or computing tasks from the data content of the pages to be configured and sends them to the multi-GPU task management module (2);
⑥, the multi-GPU task management module (2) allocates the tasks to the GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦, the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data of each page to be configured is sent;
⑧, each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page whose data has been sent have replied task completion signals, the page is recovered and initialized to the unused state.
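The write path of the method (checking remaining space, writing command data, and switching pages) can be condensed into a C sketch. Constants, names, and the full-ring policy (returning an error instead of blocking) are assumptions for illustration, not the patent's prescribed implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 64  /* deliberately small for illustration */
#define NUM_PAGES 4

enum { UNUSED, WRITING, TO_CONFIGURE };  /* subset of the page states */

typedef struct {
    uint8_t buf[PAGE_SIZE];
    size_t  off;   /* data input address within the page */
    int     state;
} page_t;

typedef struct {
    page_t pages[NUM_PAGES];
    int    cur;    /* index of the current write page */
} rb_t;

/* Write-management sketch: check the current page's remaining space,
   write the command data, and when the data no longer fits, mark the
   page "to be configured" and switch to the next unused page. */
int rb_write(rb_t *rb, const void *data, size_t len)
{
    if (len > PAGE_SIZE)
        return -1;                           /* cannot fit in any page */
    page_t *p = &rb->pages[rb->cur];
    if (PAGE_SIZE - p->off < len) {          /* residual space too small */
        p->state = TO_CONFIGURE;             /* page awaits dispatch */
        int next = (rb->cur + 1) % NUM_PAGES;
        if (rb->pages[next].state != UNUSED)
            return -1;                       /* ring full: caller must wait */
        rb->cur = next;
        p = &rb->pages[next];
    }
    memcpy(p->buf + p->off, data, len);      /* write at the input address */
    p->off += len;
    p->state = WRITING;
    return 0;
}
```

An explicit page synchronization command would follow the same switch path without a preceding write; it is omitted here to keep the sketch short.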
Beneficial effects:
the invention provides a Ringbuffer implementation method supporting multiple GPUs by constructing the mapping relation between the Ringbuffer page space and the multiple GPUs.
Drawings
FIG. 1 is a schematic diagram of a Ringbuffer implementation method supporting multiple GPUs according to the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will be further described with reference to the accompanying drawings in which:
as shown in fig. 1, the system for implementing Ringbuffer supporting multiple GPUs includes a Ringbuffer page table management module (1), a multiple-GPU task management module (2), a data buffering module (3), and a GPU task buffering module (4).
The Ringbuffer page table management module (1) implements page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state, and a target GPU;
when a user inputs data into the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space of the current Ringbuffer page and the data input address; when the page space is insufficient or a page switching instruction is received, the synchronization management function switches to the next Ringbuffer page;
meanwhile, in combination with the multi-GPU task management module (2), the target GPU of a Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signals fed back by each GPU task buffer and performs Ringbuffer page recovery: after receiving the completion signals from all target GPU task buffers of a page in the sent state, it recovers the page and initializes it to the unused state.
The multi-GPU task management module (2) receives multi-GPU task configuration information configured by a user, allocates the GPU on which each graphics or computing task generated by the Ringbuffer page table management module (1) will execute, and then returns the task allocation information to the Ringbuffer page table management module (1).
The data buffer module (3) is used for receiving and storing the incoming data according to the division of the Ringbuffer page table management module (1), and sending the data in the page to the corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
The GPU task buffer module (4) buffers the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal for the originating Ringbuffer page to the Ringbuffer page table management module (1).
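As a concrete illustration of the allocation performed by the multi-GPU task management module (2), the sketch below derives a set of target GPU task buffers from user configuration. The patent only states that allocation follows user-supplied configuration, so both policies shown (explicit broadcast set and round-robin) are illustrative assumptions:

```c
#include <stdint.h>

#define NUM_GPUS 4

/* Hypothetical user-configurable allocation policies for the multi-GPU
   task management module; the names are assumptions for illustration. */
typedef enum { ALLOC_EXPLICIT, ALLOC_ROUND_ROBIN } alloc_policy_t;

typedef struct {
    alloc_policy_t policy;
    uint32_t       explicit_mask; /* GPUs to broadcast to (ALLOC_EXPLICIT) */
    unsigned       next_gpu;      /* rotation state (ALLOC_ROUND_ROBIN)    */
} task_mgr_t;

/* Return the set of GPU task buffers the next task's page data goes to.
   The page table module records this mask as the page's target GPUs and
   later expects one completion signal per set bit. */
uint32_t task_allocate(task_mgr_t *m)
{
    if (m->policy == ALLOC_EXPLICIT)
        return m->explicit_mask;
    uint32_t mask = 1u << m->next_gpu;        /* one GPU per task */
    m->next_gpu = (m->next_gpu + 1) % NUM_GPUS;
    return mask;
}
```

Returning a mask rather than a single GPU index is what lets one Ringbuffer page fan out to several GPU task buffers while still being recovered only after all of them reply.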
The implementation method of the Ringbuffer implementation system supporting multiple GPUs comprises the following steps:
①, acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
②, acquire the current page's data input address and remaining page space from the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③, write the command data;
④, repeat steps ②-③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; then set the page state to "to be configured", switch the write page to the next page in the unused state, and return to step ②;
⑤, the Ringbuffer page table management module (1) generates graphics or computing tasks from the data content of the pages to be configured and sends them to the multi-GPU task management module (2);
⑥, the multi-GPU task management module (2) allocates the tasks to the GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦, the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data of each page to be configured is sent;
⑧, each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page whose data has been sent have replied task completion signals, the page is recovered and initialized to the unused state.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (6)
1. The Ringbuffer implementation system supporting multiple GPUs is characterized by comprising a Ringbuffer page table management module (1), a multiple GPU task management module (2), a data buffering module (3) and a GPU task buffering module (4).
2. The Ringbuffer implementation system supporting multiple GPUs of claim 1, characterized in that the Ringbuffer page table management module (1) is used for implementing page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state, and a target GPU;
when a user inputs data into the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space of the current Ringbuffer page and the data input address, and when the page space is insufficient or a page switching instruction is received, the synchronization management function switches to the next Ringbuffer page;
in combination with the multi-GPU task management module (2), the target GPU of a Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signals fed back by each GPU task buffer and performs Ringbuffer page recovery: after receiving the completion signals from all target GPU task buffers of a page in the sent state, it recovers the page and initializes it to the unused state.
3. The Ringbuffer implementation system supporting multiple GPUs of claim 1, wherein the multi-GPU task management module (2) is configured to accept multi-GPU task configuration information configured by a user, allocate the GPU on which each graphics or computing task generated by the Ringbuffer page table management module (1) will execute, and then return the task allocation information to the Ringbuffer page table management module (1).
4. The Ringbuffer implementation system supporting multiple GPUs of claim 1, wherein the data buffer module (3) is configured to receive and store incoming data according to the page division of the Ringbuffer page table management module (1), and to send the data in a page to the corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
5. The Ringbuffer implementation system supporting multiple GPUs of claim 1, wherein the GPU task buffer module (4) is used for buffering the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal for the originating Ringbuffer page to the Ringbuffer page table management module (1).
6. The implementation method of the Ringbuffer implementation system supporting multiple GPUs as claimed in any one of claims 1 to 5, characterized in that the method comprises the following steps:
①, acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
②, acquire the current page's data input address and remaining page space from the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③, write the command data;
④, repeat steps ②-③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; then set the page state to "to be configured", switch the write page to the next page in the unused state, and return to step ②;
⑤, the Ringbuffer page table management module (1) generates graphics or computing tasks from the data content of the pages to be configured and sends them to the multi-GPU task management module (2);
⑥, the multi-GPU task management module (2) allocates the tasks to the GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦, the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data of each page to be configured is sent;
⑧, each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page whose data has been sent have replied task completion signals, the page is recovered and initialized to the unused state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911125585.9A CN110888737A (en) | 2019-11-18 | 2019-11-18 | Ringbuffer implementation system and method supporting multiple GPUs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110888737A true CN110888737A (en) | 2020-03-17 |
Family
ID=69747733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911125585.9A Pending CN110888737A (en) | 2019-11-18 | 2019-11-18 | Ringbuffer implementation system and method supporting multiple GPUs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110888737A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113808001A (en) * | 2021-11-19 | 2021-12-17 | 南京芯驰半导体科技有限公司 | Method and system for single system to simultaneously support multiple GPU (graphics processing Unit) work |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120001905A1 (en) * | 2010-06-30 | 2012-01-05 | Ati Technologies, Ulc | Seamless Integration of Multi-GPU Rendering |
CN102597950A (en) * | 2009-09-03 | 2012-07-18 | 先进微装置公司 | Hardware-based scheduling of GPU work |
CN106708601A (en) * | 2016-12-12 | 2017-05-24 | 中国航空工业集团公司西安航空计算技术研究所 | GPU-oriented virtual IO ringbuffer realization method |
CN107124286A (en) * | 2016-02-24 | 2017-09-01 | 深圳市知穹科技有限公司 | A kind of mass data high speed processing, the system and method for interaction |
CN107908428A (en) * | 2017-11-24 | 2018-04-13 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of frame, the GPU graphics command buffer synchronisation methods of page synchronization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |