CN118035163A

CN118035163A - Method, system and storage medium for processing data in real time by GPU

Info

Publication number: CN118035163A
Application number: CN202410426309.0A
Authority: CN
Inventors: 唐春有; 莫潘良; 郭一帆; 周晓军
Original assignee: Icube Corp ltd
Current assignee: Icube Corp ltd
Priority date: 2024-04-10
Filing date: 2024-04-10
Publication date: 2024-05-14
Anticipated expiration: 2044-04-10
Also published as: CN118035163B

Abstract

The invention provides a method, a system and a storage medium for processing data in real time by a GPU. The method comprises the following steps. Step 1: a reference count is added to the data read from the ring buffer to mark whether the data has been processed. Step 2: when the reference count reaches 0, an interrupt is automatically triggered for the CPU to handle, and at the same time the address of the data is written into a completion register. Step 3: the CPU receives the interrupt, reads the data address from the completion register, and fetches the data for subsequent processing. The beneficial effects of the invention are as follows: the reference count allows whichever data finishes first to trigger its interrupt first, and the completion register lets the CPU handle the completed data, so that data is processed in real time and the performance of the GPU is improved.

Description

Method, system and storage medium for processing data in real time by GPU

Technical Field

The invention relates to the technical field of data interaction between a graphics card and a processor in a computer, and in particular to a method, a system and a storage medium for processing data in real time by a GPU.

Background

At present, rendering and computation workloads in a computer can interact with the GPU through a ring buffer (ringbuf). The GPU reads data sequentially according to the positions of the read and write pointers of ringbuf, and distributes the read data to its rendering/computing units for parallel processing according to the characteristics of the data. Data processed in parallel completes at different times. If interrupts are triggered to the CPU strictly in the order in which the data was read from ringbuf, the processing performance of the GPU is affected.

Disclosure of Invention

The invention provides a method for processing data in real time by a GPU, which comprises the following steps:

Step 1: adding a reference count to the data read from the ring buffer to mark whether the data has been processed;

Step 2: when the reference count reaches 0, automatically triggering an interrupt for the CPU to handle, and at the same time writing the address of the data into a completion register;

Step 3: the CPU receives the interrupt, reads the data address from the completion register, and fetches the data for subsequent processing.

As a further improvement of the present invention, in step 1, the initial value of the reference count is N, where N is the number of core processing units; each time a core processing unit finishes processing the data, the reference count is decremented by 1, and once all core processing units have completed the data processing, the reference count is 0.

As a further improvement of the present invention, in step 1, 1 <= N <= max_core, where max_core is the maximum number of cores of the GPU.

As a further improvement of the invention, the method further comprises:

An entry setting step: a command address is stored in each entry of the ring buffer, a flag is set for each entry, and whether data can be stored in the entry is determined according to the value of the flag.

As a further improvement of the present invention, in the entry setting step, the initial value of the flag is true, where true indicates that the entry is free and data can be stored; when data is stored in the entry, the flag is set to false, where false indicates that the entry is occupied and the data in the entry must not be overwritten; after the GPU finishes processing the data and generates an interrupt, the flag is set to true in the interrupt handler, indicating that new data can be stored; each time data arrives, the state of the flag is checked first: true indicates that the data can be stored, and false indicates that the existing data has not yet been processed, so the new data must wait until processing completes before being stored.

The invention also provides a system for processing data by the GPU in real time, which comprises: a memory, a processor and a computer program stored on said memory, said computer program being configured to implement the steps of the method of the invention when called by said processor.

The present invention also provides a computer-readable storage medium characterized in that: the computer readable storage medium stores a computer program configured to implement the steps of the method of the present invention when called by a processor.

The beneficial effects of the invention are as follows: the reference count allows whichever data finishes first to trigger its interrupt first, and the completion register lets the CPU handle the completed data, so that data is processed in real time and the performance of the GPU is improved.

Drawings

Fig. 1 is a schematic diagram of the principles of the present invention.

Detailed Description

As shown in fig. 1, the invention discloses a method for processing data by a GPU in real time, which comprises the following steps:

Step 1: adding a reference count (refcount) to the data read from the ring buffer (ringbuf) to mark whether the data has been processed;

Step 2: when the reference count reaches 0, automatically triggering an interrupt for the CPU to handle, and at the same time writing the address of the data into a completion register (done_reg);

Step 3: the CPU receives the interrupt, reads the data address from the completion register, and fetches the data from that address for subsequent processing. This removes the need to fetch data from ringbuf in order, and avoids the risk that in-order handling mistakenly processes a command that has not yet completed.
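The interaction between steps 2 and 3 can be illustrated with a minimal software sketch. It is only a simulation under stated assumptions, not the hardware mechanism itself: the `CompletionRegister` class, the `memory` dictionary, the addresses, and the function names `gpu_signal_done` and `cpu_interrupt_handler` are all hypothetical, and the interrupt is modelled as a direct function call.

```python
# Hypothetical model: memory maps a command address to its result data.
memory = {0x1000: "cmd A", 0x2000: "cmd B", 0x3000: "cmd C"}


class CompletionRegister:
    """Stand-in for done_reg: holds the address of the data that just completed."""

    def __init__(self):
        self.value = None


done_reg = CompletionRegister()
handled = []  # order in which the CPU handled completed data


def cpu_interrupt_handler():
    # Step 3: read the address from done_reg and fetch the data at that address.
    addr = done_reg.value
    handled.append(memory[addr])


def gpu_signal_done(addr):
    # Step 2: write the address into done_reg, then raise the interrupt
    # (modelled here as a plain function call).
    done_reg.value = addr
    cpu_interrupt_handler()


# The commands were read from ringbuf in the order A, B, C,
# but in this example they finish in the order B, C, A.
for finished_addr in (0x2000, 0x3000, 0x1000):
    gpu_signal_done(finished_addr)

print(handled)  # ['cmd B', 'cmd C', 'cmd A']
```

In this sketch the CPU handles data in completion order rather than in ringbuf order, because the handler locates the data through the address in the register and never consults the read pointer.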

In step 1, the GPU generally distributes the data to N core processing units (cores) for parallel processing according to the characteristics of the data, where 1 <= N <= max_core and max_core is the maximum number of cores of the GPU, which is determined by the GPU itself. The initial value of the reference count is therefore N. Each time a core processing unit finishes processing the data, the reference count is decremented by 1, and once all core processing units have completed the data processing, the reference count is 0.
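The counting scheme described above can be sketched as follows. This is an illustrative simulation only: threads stand in for GPU cores, a lock stands in for whatever atomic decrement the hardware provides, and `Command`, `core_done`, `run_command` and the value of `MAX_CORE` are hypothetical names and values chosen for the example.

```python
import threading

MAX_CORE = 8  # hypothetical maximum number of cores of the GPU


class Command:
    """Data read from ringbuf, tagged with a reference count."""

    def __init__(self, addr, n_cores):
        if not 1 <= n_cores <= MAX_CORE:
            raise ValueError("N must satisfy 1 <= N <= max_core")
        self.addr = addr
        self.refcount = n_cores  # initial value is N
        self._lock = threading.Lock()

    def core_done(self):
        """Decrement the count for one core; return True if the count reached 0."""
        with self._lock:
            self.refcount -= 1
            return self.refcount == 0


def run_command(cmd, n_cores, on_complete):
    """Run one worker per core; the core that brings the count to 0 reports completion."""

    def core_work():
        # ... the core's share of the processing would happen here ...
        if cmd.core_done():
            on_complete(cmd.addr)

    threads = [threading.Thread(target=core_work) for _ in range(n_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


completed = []
cmd = Command(addr=0x1000, n_cores=4)
run_command(cmd, 4, completed.append)
print(cmd.refcount, completed)  # 0 [4096]
```

Because the decrement and the zero check happen under the same lock, exactly one core observes the count reaching 0, so completion is reported once per command regardless of the order in which the cores finish.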

If the CPU fetches data from ringbuf out of order for processing while the read pointer of ringbuf still has to be updated, there is a risk that data which has not yet completed is overwritten. This problem can be solved in software. One simple approach is as follows: since ringbuf is a static ring data structure in which each entry stores a command address, a flag can be set for each entry. The initial value of the flag is true, where true indicates that the entry is free and data can be stored. When data is stored in the entry, the flag is set to false, where false indicates that the entry is occupied and the data in the entry must not be overwritten. After the GPU finishes processing the data and generates an interrupt, the flag is set to true in the interrupt handler to indicate that new data can be stored. Each time data arrives, the state of the flag is checked first: true indicates that the data can be stored, and false indicates that the existing data has not yet been processed, so the new data must wait until processing completes before being stored.
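The per-entry flag can be sketched in a few lines. This is a minimal single-threaded illustration, assuming a hypothetical `RingBuf` class with `try_store` and `on_interrupt` methods; a real driver would also need synchronisation between the submitting code and the interrupt handler.

```python
class Entry:
    """One ringbuf entry: a command address plus the free/occupied flag."""

    def __init__(self):
        self.free = True      # flag: True means the entry is free
        self.cmd_addr = None


class RingBuf:
    def __init__(self, size):
        self.entries = [Entry() for _ in range(size)]
        self.write_idx = 0

    def try_store(self, cmd_addr):
        """Store at the write position; return the entry index, or None if occupied."""
        entry = self.entries[self.write_idx]
        if not entry.free:
            return None       # data there is not yet processed; the caller must wait
        entry.cmd_addr = cmd_addr
        entry.free = False    # mark occupied so it cannot be overwritten
        idx = self.write_idx
        self.write_idx = (self.write_idx + 1) % len(self.entries)
        return idx

    def on_interrupt(self, idx):
        """Interrupt handler side: the data in this entry is done, free the entry."""
        self.entries[idx].free = True


rb = RingBuf(size=2)
a = rb.try_store(0x1000)        # stored in entry 0
b = rb.try_store(0x2000)        # stored in entry 1
blocked = rb.try_store(0x3000)  # entry 0 is still occupied, so nothing is stored
rb.on_interrupt(a)              # GPU finished the data in entry 0
c = rb.try_store(0x3000)        # entry 0 is free again and is reused
print(a, b, blocked, c)         # 0 1 None 0
```

The third store is refused rather than overwriting entry 0, which is the protection the flag is meant to provide when completion order differs from submission order.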

In summary, the reference count allows whichever data finishes first to trigger its interrupt first, and the completion register lets the CPU handle the completed data, so that data is processed in real time and the performance of the GPU is improved.

The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims

1. A method for processing data in real time by a GPU, comprising the steps of:

2. The method of claim 1, wherein in step 1, the initial value of the reference count is N, N is the number of core processing units, and each core processing unit decrements the reference count by 1 when it finishes processing data until all core processing units complete data processing, and the reference count is 0.

3. The method according to claim 2, wherein in step 1, 1 <= N <= max_core, max_core being the maximum number of cores of the GPU.

4. A method according to any one of claims 1 to 3, further comprising:

5. The method of claim 4, wherein in the entry setting step, the initial value of the flag is true, true indicating that the entry is free and data can be stored; when data is stored in the entry, the flag is set to false, false indicating that the entry is occupied and the data in the entry cannot be overwritten; after the GPU finishes processing the data and generates an interrupt, the flag is set to true in the interrupt handler, indicating that new data can be stored; each time data arrives, the state of the flag is checked first, true indicating that the data can be stored and false indicating that the existing data has not yet been processed, so that the new data must wait until processing completes before being stored.

6. A system for GPU real-time processing of data, comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1-5 when called by the processor.

7. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program configured to implement the steps of the method of any of claims 1-5 when called by a processor.