CN118035163A - Method, system and storage medium for processing data in real time by GPU - Google Patents
- Publication number
- CN118035163A (application CN202410426309.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- stored
- processing
- gpu
- reference count
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Transfer Systems (AREA)
Abstract
The invention provides a method, a system and a storage medium for processing data in real time by a GPU. The method comprises the following steps. Step 1: a reference count is attached to the data read from the ring buffer to mark whether that data has been processed. Step 2: when the reference count reaches 0, an interrupt to the CPU is triggered automatically and, at the same time, the address of the data is written into a completion register. Step 3: the CPU receives the interrupt, reads the data address from the completion register, and fetches the data for subsequent processing. The beneficial effects of the invention are as follows: the reference count lets whichever data completes first trigger its interrupt first, and the completion register lets the CPU process the completed data directly, so that the data is processed in real time and the performance of the GPU is improved.
Description
Technical Field
The invention relates to the technical field of data interaction between a graphics card and a processor in a computer, and in particular to a method, a system and a storage medium for processing data in real time by a GPU.
Background
At present, rendering and computation in a computer can interact through a ring buffer (ringbuf). The GPU reads data sequentially according to the positions of the ringbuf read and write pointers and, depending on the characteristics of the data, distributes the data it has read to the GPU's rendering/computing units for parallel processing. Data processed in parallel completes at different times. If interrupts to the CPU are nevertheless triggered strictly in the order in which the data was read from ringbuf, the processing performance of the GPU suffers.
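The cost of in-order notification described above can be illustrated with a small Python sketch. It is not taken from the patent: the function names and the finish times are hypothetical, and the model assumes only that, under in-order notification, a command cannot be reported to the CPU before every earlier command has also finished.

```python
from itertools import accumulate
from typing import List


def in_order_notify_times(finish_times: List[int]) -> List[int]:
    # In-order model: command i is reported only once commands 0..i have
    # all finished, so its notification time is a running maximum.
    return list(accumulate(finish_times, max))


def completion_order_notify_times(finish_times: List[int]) -> List[int]:
    # Completion-order model: each command is reported when it finishes.
    return list(finish_times)


def total_delay(finish_times: List[int]) -> int:
    # Extra waiting summed over all commands when notification is in order.
    in_order = in_order_notify_times(finish_times)
    return sum(late - done for late, done in zip(in_order, finish_times))


# Hypothetical finish times, listed in ring-buffer read order.
finish = [5, 1, 3]
print(in_order_notify_times(finish))
print(completion_order_notify_times(finish))
print(total_delay(finish))
```

In this toy model the second and third commands finish early but are held back behind the first one, which is the delay the invention sets out to remove.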
Disclosure of Invention
The invention provides a method for processing data in real time by a GPU, which comprises the following steps:
Step 1: adding a reference count to the data read from the ring buffer to mark whether the data has been processed;
Step 2: when the reference count is 0, automatically triggering an interrupt to the CPU to process, and simultaneously writing the address where the data is located into a completion register;
Step 3: the CPU receives the interrupt, reads the data address from the completion register, and takes the data out for subsequent processing.
As a further improvement of the present invention, in the step 1, the initial value of the reference count is N, where N is the number of core processing units, and when each core processing unit finishes processing data, the reference count is decremented by 1 until all core processing units complete data processing, and the reference count is 0.
As a further improvement of the present invention, in step 1, 1 <= N <= max_core, where max_core is the maximum number of cores of the GPU.
As a further improvement of the invention, the method further comprises:
An entry setting step: a command address is stored in each entry of the ring buffer, a flag is set for each entry, and whether data can be stored in the entry is determined from the value of the flag.
As a further improvement of the present invention, in the entry setting step, the initial value of the flag is true, where true indicates that the entry is free and data can be stored in it. When data is stored in the entry, the flag is set to false, where false indicates that the entry is occupied and the data in it must not be overwritten. After the GPU finishes processing the data and generates an interrupt, the flag is set back to true in the interrupt handler, indicating that new data can be stored. Each time new data arrives, the state of the flag is checked first: true means the data can be stored, and false means the existing data has not yet been processed, so the new data must wait until that processing is complete before it is stored.
The invention also provides a system for processing data by the GPU in real time, which comprises: a memory, a processor and a computer program stored on said memory, said computer program being configured to implement the steps of the method of the invention when called by said processor.
The present invention also provides a computer-readable storage medium characterized in that: the computer readable storage medium stores a computer program configured to implement the steps of the method of the present invention when called by a processor.
The beneficial effects of the invention are as follows: the reference count lets whichever data completes first trigger its interrupt first, and the completion register lets the CPU process the completed data directly, so that the data is processed in real time and the performance of the GPU is improved.
Drawings
Fig. 1 is a schematic diagram of the principles of the present invention.
Detailed Description
As shown in fig. 1, the invention discloses a method for processing data by a GPU in real time, which comprises the following steps:
Step 1: adding a reference count (refcount) to the data read from the ring buffer (ringbuf) to mark whether the data has been processed;
Step 2: when the reference count is 0, automatically triggering an interrupt to the CPU to process, and simultaneously writing the address where the data is located into a completion register (done_reg);
Step 3: the CPU receives the interrupt, reads the data address from the completion register, and fetches the data from that address for subsequent processing. The CPU therefore no longer needs to fetch data from ringbuf sequentially, which also avoids the risk of mistakenly handling a command that has not yet completed.
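The three steps can be sketched as a small Python simulation. This is an illustrative model only, not driver or hardware code: the `SimulatedGpu` class, its method names, the handler, and the addresses `0x1000` and `0x2000` are assumptions introduced here, and the interrupt is modelled as a synchronous callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimulatedGpu:
    """Toy model of the completion path; not real hardware or driver code."""
    on_interrupt: Callable[["SimulatedGpu"], None]  # hypothetical interrupt line
    done_reg: int = 0                               # completion register (done_reg)
    refcount: Dict[int, int] = field(default_factory=dict)

    def read_from_ringbuf(self, addr: int, n_cores: int) -> None:
        # Step 1: attach a reference count to the data read from ringbuf.
        self.refcount[addr] = n_cores

    def core_finished(self, addr: int) -> None:
        # One core has finished its share of the data at this address.
        self.refcount[addr] -= 1
        if self.refcount[addr] == 0:
            # Step 2: write the address to done_reg and raise the interrupt.
            del self.refcount[addr]
            self.done_reg = addr
            self.on_interrupt(self)


def run_demo() -> List[int]:
    handled: List[int] = []

    def cpu_interrupt_handler(gpu: SimulatedGpu) -> None:
        # Step 3: the CPU reads the data address from the completion register.
        handled.append(gpu.done_reg)

    gpu = SimulatedGpu(on_interrupt=cpu_interrupt_handler)
    gpu.read_from_ringbuf(0x1000, n_cores=2)  # read first, finishes last
    gpu.read_from_ringbuf(0x2000, n_cores=1)  # read second, finishes first
    gpu.core_finished(0x1000)
    gpu.core_finished(0x2000)
    gpu.core_finished(0x1000)
    return handled


print([hex(addr) for addr in run_demo()])
```

Because the handler runs synchronously in this sketch, the completion register is always read before the next completion can overwrite it; how real hardware would arbitrate back-to-back completions is outside what the sketch models.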
In step 1, the GPU generally distributes the data to N core processing units (CORE) for parallel processing according to the characteristics of the data, where 1 <= N <= max_core and max_core is the maximum number of cores of the GPU, determined by the GPU itself. The initial value of the reference count is therefore N. Each time a core processing unit finishes processing the data, the reference count is decremented by 1; once all core processing units have finished, the reference count is 0.
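The counting rule can be sketched in Python with threads standing in for core processing units. The class name, the default `max_core=8`, and the use of a lock to make the decrement atomic are assumptions made for illustration; the source does not say how the decrement is implemented.

```python
import threading


class RefCountedCommand:
    """Reference count for one piece of data; an illustrative sketch only."""

    def __init__(self, addr: int, n_cores: int, max_core: int = 8) -> None:
        # 1 <= N <= max_core; the value 8 is a hypothetical core count.
        if not 1 <= n_cores <= max_core:
            raise ValueError("n_cores must satisfy 1 <= N <= max_core")
        self.addr = addr
        self.n_cores = n_cores
        self.refcount = n_cores          # initial value of the count is N
        self.interrupts_raised = 0
        self._lock = threading.Lock()    # assumption: decrement is atomic

    def core_done(self) -> None:
        # Called once by each core when it finishes processing.
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                # Only the last core to finish observes the count at 0.
                self.interrupts_raised += 1


def process_on_cores(cmd: RefCountedCommand) -> None:
    # Threads stand in for the N core processing units.
    workers = [threading.Thread(target=cmd.core_done)
               for _ in range(cmd.n_cores)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


cmd = RefCountedCommand(addr=0x1000, n_cores=4)
process_on_cores(cmd)
print(cmd.refcount, cmd.interrupts_raised)
```

Holding the lock across both the decrement and the zero check is what guarantees that exactly one core raises the interrupt, whatever order the cores finish in.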
If the CPU fetches data from ringbuf out of order while the ringbuf read pointer still has to be updated, there is a risk that data whose processing has not completed is overwritten. This problem can be solved in software. One simple approach is as follows. Since ringbuf is a static ring data structure and each entry of ringbuf stores a command address, a flag can be set for each entry. The initial value of the flag is true, which indicates that the entry is idle and data can be stored in it. When data is stored in the entry, the flag is set to false, which indicates that the entry is occupied and its data must not be overwritten. After the GPU finishes processing the data and generates an interrupt, the flag is set to true in the interrupt handler to indicate that new data can be stored. Each time data arrives, the state of the flag is checked first: true means the data can be stored; false means the existing data has not yet been processed, and the new data must wait until that processing completes before it is stored.
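The per-entry flag scheme can be sketched in Python as follows. The `FlaggedRing` class and its method names are hypothetical, and the sketch assumes that new data is always stored at the current write position, which the source does not state explicitly.

```python
from typing import List, Optional


class FlaggedRing:
    """Ring buffer with one free/occupied flag per entry; a sketch only."""

    def __init__(self, size: int) -> None:
        self.entries: List[Optional[int]] = [None] * size  # command addresses
        self.free: List[bool] = [True] * size              # flag starts as true
        self.write_idx = 0

    def try_store(self, cmd_addr: int) -> Optional[int]:
        # Check the flag first: false means the entry still holds data that
        # has not been processed, so the caller must wait and retry.
        idx = self.write_idx
        if not self.free[idx]:
            return None
        self.entries[idx] = cmd_addr
        self.free[idx] = False            # entry is now occupied
        self.write_idx = (idx + 1) % len(self.entries)
        return idx

    def on_interrupt(self, idx: int) -> None:
        # Interrupt handler: the data in this entry has been processed,
        # so the entry may be reused.
        self.free[idx] = True


ring = FlaggedRing(size=2)
print(ring.try_store(0xA000))  # stored in entry 0
print(ring.try_store(0xB000))  # stored in entry 1
print(ring.try_store(0xC000))  # None: entry 0 is still occupied
ring.on_interrupt(1)           # entry 1 completes first (out of order)
print(ring.try_store(0xC000))  # still None: the write position is entry 0
ring.on_interrupt(0)
print(ring.try_store(0xC000))  # stored in entry 0
```

Under this assumption an out-of-order completion frees its own entry immediately, but the writer still waits whenever the entry at the write position is occupied, so unprocessed data is never overwritten.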
In the invention, the reference count lets whichever data completes first trigger its interrupt first, and the completion register lets the CPU process the completed data directly, so that the data is processed in real time and the performance of the GPU is improved.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.
Claims (7)
1. A method for processing data in real time by a GPU, comprising the steps of:
Step 1: adding a reference count to the data read from the ring buffer to mark whether the data has been processed;
Step 2: when the reference count is 0, automatically triggering an interrupt to the CPU to process, and simultaneously writing the address where the data is located into a completion register;
Step 3: the CPU receives the interrupt, reads the data address from the completion register, and takes the data out for subsequent processing.
2. The method of claim 1, wherein in step 1, the initial value of the reference count is N, N is the number of core processing units, and each core processing unit decrements the reference count by 1 when it finishes processing data until all core processing units complete data processing, and the reference count is 0.
3. The method according to claim 2, wherein in step 1, 1 <= N <= max_core, max_core being the maximum number of cores of the GPU.
4. A method according to any one of claims 1 to 3, further comprising:
An entry setting step: a command address is stored in each entry of the ring buffer, a flag is set for each entry, and whether data can be stored in the entry is determined from the value of the flag.
5. The method of claim 4, wherein in the entry setting step, the initial value of the flag is true, true indicating that the entry is free and that data can be stored; when data is stored in the entry, the flag is set to false, false indicating that the entry is occupied and that the data in the entry cannot be overwritten; after the GPU finishes processing the data and generates an interrupt, the flag is set to true in the interrupt handler, indicating that new data can be stored; each time data arrives, the state of the flag is checked first, true indicating that the data can be stored and false indicating that the existing data has not yet been processed, so that the new data must wait until that processing is complete before being stored.
6. A system for GPU real-time processing of data, comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1-5 when called by the processor.
7. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program configured to implement the steps of the method of any of claims 1-5 when called by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410426309.0A CN118035163B (en) | 2024-04-10 | 2024-04-10 | Method, system and storage medium for processing data in real time by GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118035163A (en) | 2024-05-14 |
CN118035163B (en) | 2024-08-06 |
Family
ID=90989507
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410426309.0A (Active; granted as CN118035163B) | Method, system and storage medium for processing data in real time by GPU | 2024-04-10 | 2024-04-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118035163B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1845087A (en) * | 2006-05-18 | 2006-10-11 | 北京中星微电子有限公司 | Interrupt handling method and interrupt handling apparatus |
CN110415161A (en) * | 2019-07-19 | 2019-11-05 | 龙芯中科技术有限公司 | Graphic processing method, device, equipment and storage medium |
CN111880916A (en) * | 2020-07-27 | 2020-11-03 | 长沙景嘉微电子股份有限公司 | Multi-drawing task processing method, device, terminal, medium and host in GPU |
CN113051082A (en) * | 2021-03-02 | 2021-06-29 | 长沙景嘉微电子股份有限公司 | Software and hardware data synchronization method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN118035163B (en) | 2024-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7257665B2 (en) | Branch-aware FIFO for interprocessor data sharing | |
US5644784A (en) | Linear list based DMA control structure | |
US20020144099A1 (en) | Hardware architecture for fast servicing of processor interrupts | |
US7873763B2 (en) | Multi-reader multi-writer circular buffer memory | |
US7659904B2 (en) | System and method for processing high priority data elements | |
CN113704301B (en) | Data processing method, device, system, equipment and medium of heterogeneous computing platform | |
US6678755B1 (en) | Method and apparatus for appending memory commands during a direct memory access operation | |
EP1989620A2 (en) | Processing of high priority data elements in systems comprising a host processor and a co-processor | |
JPS6217876Y2 (en) | ||
CN115858167A (en) | Visual software image processing system, method, device, electronic equipment and medium | |
CN111857591B (en) | Method, apparatus, device and computer readable storage medium for executing instructions | |
EP0600165A1 (en) | Vector processing device | |
CN118035163B (en) | Method, system and storage medium for processing data in real time by GPU | |
US6518973B1 (en) | Method, system, and computer program product for efficient buffer level management of memory-buffered graphics data | |
KR20210110156A (en) | Caching device, cache, system, method and apparatus for processing data, and medium | |
US5343557A (en) | Workstation controller with full screen write mode and partial screen write mode | |
CN112767978B (en) | DDR command scheduling method, device, equipment and medium | |
CN115237349A (en) | Data read-write control method, control device, computer storage medium and electronic equipment | |
JPH0696007A (en) | Dma transfer system | |
US20030056037A1 (en) | Hardware chain pull | |
US6311266B1 (en) | Instruction look-ahead system and hardware | |
CN118363901B (en) | PCIe device, electronic component and electronic device | |
EP0410382A2 (en) | Data transfer controller using direct memory access method | |
JP2000029690A (en) | Method and device for data processing | |
US20230176785A1 (en) | METHOD AND SYSTEM TO ABORT A COMMAND IN PCIe BASED NON-VOLATILE MEMORY EXPRESS SOLID-STATE DRIVE |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||