CN110888737A - Ringbuffer implementation system and method supporting multiple GPUs - Google Patents


Info

Publication number
CN110888737A
Authority
CN
China
Prior art keywords
ringbuffer
page
gpu
management module
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911125585.9A
Other languages
Chinese (zh)
Inventor
马城城
聂曌
刘晖
张琛
张兴雷
王晨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC filed Critical Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN201911125585.9A priority Critical patent/CN110888737A/en
Publication of CN110888737A publication Critical patent/CN110888737A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention belongs to the technical field of computer applications, and particularly relates to a Ringbuffer implementation system and method supporting multiple GPUs. The system comprises a Ringbuffer page table management module (1), a multi-GPU task management module (2), a data buffer module (3) and a GPU task buffer module (4). By constructing a mapping relation between the Ringbuffer page space and multiple GPUs, the invention provides a Ringbuffer implementation method supporting multiple GPUs.

Description

Ringbuffer implementation system and method supporting multiple GPUs
The invention belongs to the technical field of computer application, and particularly relates to a Ringbuffer implementation system and method supporting multiple GPUs.
Background
Since the advent of the GPU, its enormous computing power has given it an important role in fields requiring high-performance computing, such as image and video processing, physics, bioscience, chemistry, and artificial intelligence. To complete increasingly complex graphics processing and general computing tasks while reducing computation time, multiple GPU cards must often be used for computation simultaneously. A Ringbuffer is a technique that can effectively improve memory allocation and usage efficiency. Whether the Ringbuffer can be used flexibly and efficiently to manage and distribute computing tasks across multiple GPUs has a direct influence on task completion efficiency.
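As background context only (not part of the claimed invention), a minimal ring buffer illustrating the allocation pattern the passage refers to might look like the following Python sketch; all names here are invented for illustration:

```python
class RingBuffer:
    """A minimal fixed-capacity ring (circular) buffer.

    Illustrative sketch only; the patent's Ringbuffer adds page
    management and multi-GPU dispatch on top of this basic idea.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next slot to read
        self.tail = 0  # next slot to write
        self.size = 0

    def put(self, item):
        if self.size == self.capacity:
            raise BufferError("ring buffer full")
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity  # wrap around
        self.size += 1

    def get(self):
        if self.size == 0:
            raise BufferError("ring buffer empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity  # wrap around
        self.size -= 1
        return item
```

Because the storage is reused in a circle, slots freed by `get` are immediately available to `put` without any further allocation, which is the efficiency property the passage describes.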
Disclosure of Invention
The purpose of the invention is as follows:
In order to solve the above problems, the invention provides a Ringbuffer implementation method supporting multiple GPUs. By constructing a mapping relation between the Ringbuffer page space and multiple GPUs, the data of multiple GPUs is managed uniformly by one Ringbuffer module. This saves memory space, enables flexible distribution of graphics or computing tasks across multiple GPUs, and improves task execution efficiency.
The technical scheme is as follows:
The invention provides a Ringbuffer implementation system supporting multiple GPUs (graphics processing units), which comprises a Ringbuffer page table management module (1), a multi-GPU task management module (2), a data buffer module (3) and a GPU task buffer module (4).
Further, the Ringbuffer page table management module (1) is used for page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state and a target GPU;
when a user inputs data to the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space and the data input address of the current Ringbuffer page; when the Ringbuffer page space is insufficient or a page switching instruction is received, the synchronization management function switches the Ringbuffer write page;
meanwhile, in combination with the multi-GPU task management module (2), the target GPU of each Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the Ringbuffer page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signal fed back by each GPU task buffer and performs Ringbuffer page recycling: after the task completion signals returned by all target GPU task buffers have been received, the page in the sent state is recycled and initialized to the unused state.
Further, the multi-GPU task management module (2) is configured to receive the multi-GPU task configuration information set by the user, allocate the GPU on which each graphics or computing task generated from the Ringbuffer page table management module (1) is to be executed, and then return the task allocation information to the Ringbuffer page table management module (1).
Further, the data buffer module (3) is configured to receive and store incoming data according to the page division of the Ringbuffer page table management module (1), and to send the data in a page to the corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
Further, the GPU task buffer module (4) is configured to buffer the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal to the Ringbuffer page table management module (1) for the Ringbuffer page from which the data was sent.
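The division of labor between modules (2) and (4) might be sketched as follows. The patent does not specify an allocation policy, so the round-robin choice below is purely an assumption for illustration, as are all class and method names:

```python
from collections import deque

class MultiGpuTaskManager:
    """Sketch of the multi-GPU task management module (2).

    Assumes a simple round-robin policy over the GPUs named in the
    user's configuration; the actual policy is not specified.
    """

    def __init__(self, config_gpu_ids):
        self.gpu_ids = list(config_gpu_ids)  # user task configuration
        self.next = 0

    def allocate(self, task):
        gpu = self.gpu_ids[self.next % len(self.gpu_ids)]
        self.next += 1
        return gpu  # task allocation info, returned to module (1)

class GpuTaskBuffer:
    """Sketch of one buffer inside the GPU task buffer module (4)."""

    def __init__(self, gpu_id):
        self.gpu_id = gpu_id
        self.queue = deque()

    def submit(self, task):
        self.queue.append(task)  # receive task data from module (3)

    def drain(self, on_complete):
        # "send" each task to the GPU, then reply a completion signal
        while self.queue:
            task = self.queue.popleft()
            on_complete(task, self.gpu_id)
```

Here `on_complete` stands in for the task completion signal replied to the page table management module (1).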
Another object of the present invention is to provide an implementation method of a Ringbuffer implementation system supporting multiple GPUs, which includes the following steps:
① acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
② acquire the current page's data input address and remaining page space of the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③ write the command data;
④ repeat steps ② and ③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; set the page state to to-be-configured, switch the write page to the next page in the unused state, and continue writing;
⑤ the Ringbuffer page table management module (1) packages the data content of each to-be-configured page into a graphics or computing task and sends it to the multi-GPU task management module (2);
⑥ the multi-GPU task management module (2) allocates the task to one or more GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦ the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data in the to-be-configured page is sent;
⑧ each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page that has sent data have replied task completion signals, the page is recycled and initialized to the unused state.
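Steps ① through ⑧ can be condensed into the following control-flow sketch. This is an interpretive simplification (pages are modeled as lists, every configured GPU is treated as a target of every page, and all names are invented), not the patent's implementation:

```python
def run_ringbuffer_workflow(commands, gpu_ids, page_size=4):
    """Simulate steps 1-8 of the described method.

    commands  - command data written by the user (step 3)
    gpu_ids   - the user's multi-GPU task configuration (step 1)
    Returns the pages in recycling order; each page is recycled only
    after all its target GPU buffers reply completion (step 8).
    """
    recycled = []
    page = []                                   # current write page (step 2)
    for cmd in commands:
        page.append(cmd)                        # step 3: write command data
        if len(page) == page_size:              # step 4: page space exhausted
            recycled.append(_dispatch(page, gpu_ids))
            page = []                           # switch to an unused page
    if page:                                    # flush the final partial page
        recycled.append(_dispatch(page, gpu_ids))
    return recycled

def _dispatch(page, gpu_ids):
    # steps 5-8: the to-be-configured page becomes a task, goes to each
    # target GPU buffer, and is recycled once every buffer has replied
    pending = set(gpu_ids)
    for gpu in gpu_ids:                         # buffer sends data to its GPU
        pending.discard(gpu)                    # ...and replies completion
    assert not pending                          # all completion signals in
    return list(page)
```

The key ordering property shown is that recycling (step ⑧) is gated on the full set of completion signals, so no page is reused while any GPU still holds its data.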
Beneficial effects:
By constructing a mapping relation between the Ringbuffer page space and multiple GPUs, the invention manages the data of multiple GPUs uniformly with one Ringbuffer module, which saves memory space, enables flexible distribution of graphics or computing tasks across multiple GPUs, and improves task execution efficiency.
Drawings
FIG. 1 is a schematic diagram of a Ringbuffer implementation method supporting multiple GPUs according to the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will be further described with reference to the accompanying drawings in which:
as shown in fig. 1, the system for implementing Ringbuffer supporting multiple GPUs includes a Ringbuffer page table management module (1), a multiple-GPU task management module (2), a data buffering module (3), and a GPU task buffering module (4).
The Ringbuffer page table management module (1) is used for page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state and a target GPU;
when a user inputs data to the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space and the data input address of the current Ringbuffer page; when the Ringbuffer page space is insufficient or a page switching instruction is received, the synchronization management function switches the Ringbuffer write page;
meanwhile, in combination with the multi-GPU task management module (2), the target GPU of each Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the Ringbuffer page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signal fed back by each GPU task buffer and performs Ringbuffer page recycling: after the task completion signals returned by all target GPU task buffers have been received, the page in the sent state is recycled and initialized to the unused state.
The multi-GPU task management module (2) is used for receiving the multi-GPU task configuration information set by the user, allocating the GPU on which each graphics or computing task generated from the Ringbuffer page table management module (1) is to be executed, and then returning the task allocation information to the Ringbuffer page table management module (1).
The data buffer module (3) is used for receiving and storing incoming data according to the page division of the Ringbuffer page table management module (1), and for sending the data in a page to the corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
The GPU task buffer module (4) is used for buffering the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal to the Ringbuffer page table management module (1) for the Ringbuffer page from which the data was sent.
The implementation method of the Ringbuffer implementation system supporting multiple GPUs comprises the following steps:
① acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
② acquire the current page's data input address and remaining page space of the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③ write the command data;
④ repeat steps ② and ③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; set the page state to to-be-configured, switch the write page to the next page in the unused state, and continue writing;
⑤ the Ringbuffer page table management module (1) packages the data content of each to-be-configured page into a graphics or computing task and sends it to the multi-GPU task management module (2);
⑥ the multi-GPU task management module (2) allocates the task to one or more GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦ the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data in the to-be-configured page is sent;
⑧ each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page that has sent data have replied task completion signals, the page is recycled and initialized to the unused state.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A Ringbuffer implementation system supporting multiple GPUs, characterized by comprising a Ringbuffer page table management module (1), a multi-GPU task management module (2), a data buffer module (3) and a GPU task buffer module (4).
2. The Ringbuffer implementation system supporting multiple GPUs according to claim 1, characterized in that the Ringbuffer page table management module (1) is used for page management of the data buffer module (3): it divides the data buffer module (3) of the Ringbuffer into a plurality of page spaces of equal size, and each page has its own internal attributes;
the internal attributes comprise a first address, a data input address, a use state and a target GPU;
when a user inputs data to the data buffer module (3), the write management function of the Ringbuffer page table management module (1) first checks the remaining space and the data input address of the current Ringbuffer page; when the Ringbuffer page space is insufficient or a page switching instruction is received, the synchronization management function switches the Ringbuffer write page;
meanwhile, in combination with the multi-GPU task management module (2), the target GPU of each Ringbuffer page is set: according to the multi-GPU task allocation information obtained from the GPU task buffer module (4), the Ringbuffer page table management module (1) records the GPU task buffers to which the data in the Ringbuffer page is to be sent;
the Ringbuffer page table management module (1) also receives the task completion signal fed back by each GPU task buffer and performs Ringbuffer page recycling: after the task completion signals returned by all target GPU task buffers have been received, the page in the sent state is recycled and initialized to the unused state.
3. The Ringbuffer implementation system supporting multiple GPUs according to claim 1, characterized in that the multi-GPU task management module (2) is configured to receive the multi-GPU task configuration information set by the user, allocate the GPU on which each graphics or computing task generated from the Ringbuffer page table management module (1) is to be executed, and then return the task allocation information to the Ringbuffer page table management module (1).
4. The Ringbuffer implementation system supporting multiple GPUs according to claim 1, characterized in that the data buffer module (3) is configured to receive and store incoming data according to the page division of the Ringbuffer page table management module (1), and to send the data in a page to the corresponding GPU task buffer under the control of the Ringbuffer page table management module (1).
5. The Ringbuffer implementation system supporting multiple GPUs according to claim 1, characterized in that the GPU task buffer module (4) is used for buffering the tasks of each GPU; each GPU task buffer independently receives task data from the data buffer module (3) and, after sending the data to its GPU, replies a task completion signal to the Ringbuffer page table management module (1) for the Ringbuffer page from which the data was sent.
6. A method for implementing the Ringbuffer implementation system supporting multiple GPUs according to any one of claims 1 to 5, characterized by comprising the following steps:
① acquire the user's multi-GPU task configuration information and store it in the multi-GPU task management module (2);
② acquire the current page's data input address and remaining page space of the Ringbuffer data buffer module (3) through the write management function of the Ringbuffer page table management module (1);
③ write the command data;
④ repeat steps ② and ③ until the remaining space of the current Ringbuffer page is insufficient or a page synchronization management command is received; set the page state to to-be-configured, switch the write page to the next page in the unused state, and continue writing;
⑤ the Ringbuffer page table management module (1) packages the data content of each to-be-configured page into a graphics or computing task and sends it to the multi-GPU task management module (2);
⑥ the multi-GPU task management module (2) allocates the task to one or more GPU task buffers of the GPU task buffer module (4) according to the multi-GPU task configuration information input by the user;
⑦ the Ringbuffer page table management module (1) acquires the multi-GPU task allocation information from the GPU task buffer module (4) and records the GPU task buffers to which the data in the to-be-configured page is sent;
⑧ each GPU task buffer in the GPU task buffer module (4) sends its data to its GPU and replies a task completion signal to the Ringbuffer page table management module (1); when all GPU task buffers corresponding to a Ringbuffer page that has sent data have replied task completion signals, the page is recycled and initialized to the unused state.
CN201911125585.9A 2019-11-18 2019-11-18 Ringbuffer implementation system and method supporting multiple GPUs Pending CN110888737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911125585.9A CN110888737A (en) 2019-11-18 2019-11-18 Ringbuffer implementation system and method supporting multiple GPUs


Publications (1)

Publication Number Publication Date
CN110888737A true CN110888737A (en) 2020-03-17

Family

ID=69747733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911125585.9A Pending CN110888737A (en) 2019-11-18 2019-11-18 Ringbuffer implementation system and method supporting multiple GPUs

Country Status (1)

Country Link
CN (1) CN110888737A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808001A (en) * 2021-11-19 2021-12-17 Method and system for a single system to simultaneously support the operation of multiple GPUs (graphics processing units)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120001905A1 (en) * 2010-06-30 2012-01-05 Ati Technologies, Ulc Seamless Integration of Multi-GPU Rendering
CN102597950A (en) * 2009-09-03 2012-07-18 Advanced Micro Devices, Inc. Hardware-based scheduling of GPU work
CN106708601A (en) * 2016-12-12 2017-05-24 中国航空工业集团公司西安航空计算技术研究所 GPU-oriented virtual IO ringbuffer realization method
CN107124286A (en) * 2016-02-24 2017-09-01 System and method for high-speed processing and interaction of mass data
CN107908428A (en) * 2017-11-24 2018-04-13 Frame- and page-synchronized GPU graphics command buffer synchronization method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination