CN116263982B - Graphics processor, system, method, electronic device and apparatus - Google Patents
- Publication number
- CN116263982B (CN202210414541.3A)
- Authority
- CN
- China
- Prior art keywords
- block
- pixel
- primitive
- covered
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Image Generation (AREA)
Abstract
The present disclosure provides a graphics processor, a system, a method, an electronic device, and an apparatus. The graphics processor includes: a rasterization processing module configured to read, block by block, the primitive information of the primitives covered by each first tile, rasterize those primitives block by block according to the primitive information to obtain the pixel information of the primitives covered by the first tile, and store the pixel information of the primitives covered by each first tile in memory, grouped by second tile; and a pixel processing module configured to, after the pixel information of the primitives covered by every first tile has been stored in memory, read the pixel information covered by each second tile from memory block by block and perform pixel processing block by block according to the pixel information read. The technical solution provided by the embodiments of the present disclosure reduces the difficulty of GPU development and shortens the GPU development cycle.
Description
Technical Field
The present disclosure relates to the field of graphics rendering technologies, and in particular, to a graphics processor, a graphics processing system, a graphics processing method, an electronic device, and an electronic apparatus.
Background
Tile-based rendering architectures, such as TBR (Tile-Based Rendering) and TBDR (Tile-Based Deferred Rendering), reduce the memory bandwidth a GPU (Graphics Processing Unit) needs during rendering compared with IMR (Immediate Mode Rendering) architectures.
A typical GPU rendering pipeline for a tile-based rendering architecture includes geometry processing, tile partitioning (tiling), rasterization, and pixel processing. Geometry processing includes the vertex shader and the geometry shader; pixel processing includes the fragment shader and pixel post-processing.
When image processing performance needs to be optimized, both rasterization and pixel processing in the GPU rendering pipeline of a tile-based rendering architecture must be optimized. Existing ways of optimizing the rendering pipeline often increase the difficulty of GPU development and lengthen the development cycle.
Disclosure of Invention
The present disclosure aims to provide a graphics processor, a graphics processing system, a graphics processing method, an electronic device, and an electronic apparatus that reduce the difficulty of GPU development and shorten the GPU development cycle.
According to one aspect of the present disclosure, there is provided a graphics processor, wherein the graphics processor includes:
a rasterization processing module configured to read, block by block, the primitive information of the primitives covered by each first tile, rasterize those primitives block by block according to the primitive information to obtain the pixel information of the primitives covered by the first tile, and store the pixel information of the primitives covered by each first tile in memory, grouped by second tile;
and a pixel processing module configured to, after the pixel information of the primitives covered by every first tile has been stored in memory, read the pixel information covered by each second tile from memory block by block and perform pixel processing block by block according to the pixel information read.
Optionally, the size of the first tile is determined according to the buffer capacity corresponding to the rasterizing module.
Optionally, the size of the second tile is determined according to the buffer capacity corresponding to the pixel processing module.
Based on any one of the above graphics processor embodiments, the graphics processor further includes a tile dividing module configured to determine the primitives covered by each first tile and to store in memory, grouped by first tile, the primitive indexes of the primitives covered by each first tile, so that the rasterization processing module can read the primitive information of the primitives covered by a first tile according to those primitive indexes.
Further, the pixel information includes frame identification information of the current frame. Correspondingly, the pixel processing result obtained by performing pixel processing block by block according to the pixel information read carries the frame identification information.
On the basis of any of the above graphics processor embodiments, the pixel information covered by the second tile may be stored in a linked list.
On the basis of any one of the above-described graphics processor embodiments, the pixel processing module includes a plurality of arithmetic logic units; the task of performing pixel processing block by block according to the pixel information read is executed by the plurality of arithmetic logic units; the cache corresponding to the pixel processing module is the local storage space of the plurality of arithmetic logic units; and the plurality of arithmetic logic units are arithmetic logic units for computation that are reused (multiplexed) for pixel processing.
According to another aspect of the present disclosure, there is also provided a graphics processing system including the graphics processor of any one of the embodiments described above.
According to another aspect of the present disclosure, there is also provided an electronic device including the graphics processing system of the above embodiments.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including the electronic device described in the above embodiment.
According to another aspect of the present disclosure, there is also provided a graphics processing method applied to a graphics processor, the graphics processing method including a rasterization process and a pixel processing process.
The rasterization process includes: reading, block by block, the primitive information of the primitives covered by each first tile, rasterizing those primitives block by block according to the primitive information to obtain the pixel information of the primitives covered by the first tile, and storing the pixel information of the primitives covered by each first tile in memory, grouped by second tile;
after the pixel information of the primitives covered by every first tile has been stored in memory, the pixel processing process is executed, which includes: reading the pixel information covered by each second tile from memory block by block, and performing pixel processing block by block according to the pixel information read.
Optionally, the size of the first tile is determined according to the buffer capacity corresponding to the rasterization result.
Optionally, the size of the second tile is determined according to the buffer capacity corresponding to the pixel processing result.
On the basis of any one of the above embodiments of the graphics processing method, before the primitive information of the primitives covered by the first tiles is read block by block, the primitives covered by each first tile may be determined, and the primitive indexes of the primitives covered by each first tile may be saved to memory, grouped by first tile, so that during rasterization the primitive information of the primitives covered by a first tile is read according to those primitive indexes.
Further, the pixel information may include frame identification information of the current frame. Correspondingly, the pixel processing result obtained by performing pixel processing on the read pixel information block by block carries frame identification information.
On the basis of any of the above embodiments of the graphics processing method, the pixel information covered by the second tile may be stored in a linked list.
On the basis of any one of the above embodiments of the graphics processing method, a specific implementation of performing pixel processing block by block according to the pixel information read may include:
performing the pixel processing block by block according to the pixel information read by reusing (multiplexing) a plurality of arithmetic logic units for computation, wherein the buffer corresponding to the pixel processing result is the local storage space of the plurality of arithmetic logic units.
Drawings
FIG. 1 is a schematic diagram of a data structure of primitives according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a computing arithmetic logic unit and a memory device according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of a graphics processing system according to some embodiments of the present disclosure;
FIG. 4 is a flow diagram of a graphics processing method according to some embodiments of the present disclosure.
Detailed Description
Before describing embodiments of the present disclosure, it should be noted that:
Some embodiments of the disclosure are described as process flows. Although the operational steps of these flows may be numbered sequentially, the steps may be performed in parallel, concurrently, or simultaneously.
The terms "first," "second," and the like may be used in embodiments of the present disclosure to describe various features, but these features should not be limited by these terms. These terms are only used to distinguish one feature from another.
The term "and/or" may be used in embodiments of the present disclosure to include any and all combinations of one or more of the associated features listed.
It will be understood that when two elements are described in a connected or communicating relationship, unless a direct connection or direct communication between the two elements is explicitly stated, connection or communication between the two elements may be understood as direct connection or communication, as well as indirect connection or communication via intermediate elements.
To make the technical solutions and advantages of the embodiments of the present disclosure clearer, exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. The described embodiments are only some embodiments of the present disclosure, not an exhaustive list of all embodiments. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with each other.
The graphics processor provided by the embodiments of the present disclosure uses a tile-based rendering architecture. It decouples geometry processing and rasterization from pixel processing, so that the tile size used for rasterization and the tile size used for pixel processing can differ, and rasterization and pixel processing can each be optimized separately.
Accordingly, the 3D rendering pipeline is split into two passes. One pass performs geometry processing and rasterization, and the other pass performs pixel processing. Data is exchanged between the two passes through memory.
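The two-pass split can be illustrated with a minimal sketch. This is not the disclosed hardware: the function names, the dictionary standing in for memory, and the toy rasterization and shading steps are all assumptions introduced here for illustration.

```python
# Hypothetical sketch: pass 1 writes per-pixel records to memory;
# pass 2 runs only after pass 1 has finished and reads them back.

def pass_one(primitives, memory):
    """Stand-in for geometry processing + rasterization:
    record which primitives cover which pixel."""
    for prim_id, covered_pixels in primitives.items():
        for xy in covered_pixels:
            memory.setdefault(xy, []).append(prim_id)

def pass_two(memory):
    """Stand-in for pixel processing: one result per stored pixel.
    Taking the last primitive is an arbitrary placeholder for shading."""
    return {xy: f"shaded by primitive {prims[-1]}" for xy, prims in memory.items()}

memory = {}  # stands in for the memory shared by the two passes
pass_one({0: [(0, 0), (1, 0)], 1: [(1, 0)]}, memory)
result = pass_two(memory)
```

The only coupling between the two functions is the `memory` object, which mirrors the statement that the passes interact through memory.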
In some embodiments, a graphics processor provided by embodiments of the present disclosure includes at least a rasterization processing module and a pixel processing module.
The rasterization processing module is configured to: read, block by block, the primitive information of the primitives covered by each first tile; rasterize those primitives block by block according to the primitive information to obtain the pixel information of the primitives covered by the first tile; and store the pixel information of the primitives covered by each first tile in memory, grouped by second tile.
The pixel processing module is configured to: after the pixel information of the primitives covered by every first tile has been stored in memory, read the pixel information covered by each second tile from memory block by block, and perform pixel processing block by block according to the pixel information read.
In an embodiment of the present disclosure, the first tile is sized to match the processing capability of the rasterization processing module; that is, the size of the first tile does not exceed the cache capacity corresponding to the rasterization processing module. In the present disclosure, the cache corresponding to the rasterization processing module is the cache that temporarily stores data generated during and after that module's processing. By way of example and not limitation, this cache includes a depth buffer.
In practical applications, the size of the first tile may be determined according to the buffer capacity corresponding to the rasterization processing module.
By way of example and not limitation, in some embodiments, the cache capacity corresponding to the rasterization processing module is a×b, so the size of the first tile is a×b. At this size, the image frame is divided into N first tiles. Following a preset first-tile processing order, the rasterization processing module reads the primitive information of the primitives covered by one first tile at a time and rasterizes the primitives covered by the current first tile according to that information, obtaining the pixel information of the primitives covered by the current first tile. Once all primitives covered by one first tile have been rasterized, the module reads the primitive information of the primitives covered by the next first tile and rasterizes them in turn.
In the embodiments of the present disclosure, the pixels of a primitive covered by the first tile may be determined by the pixel coverage test of the rasterization stage. The pixels of a primitive are the pixels that the primitive covers; accordingly, the pixel information of a primitive covered by the first tile is the information of the pixels covered by that primitive.
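As a rough illustration of dividing a frame into N first tiles of size a×b, the sketch below enumerates the tiles in row-major order. The row-major order, the clipping of edge tiles, and the function name `first_tile_grid` are assumptions; the text only says the order is preset.

```python
import math

def first_tile_grid(frame_w, frame_h, a, b):
    """Return (x0, y0, width, height) for every a x b first tile, row-major."""
    tiles = []
    for ty in range(math.ceil(frame_h / b)):
        for tx in range(math.ceil(frame_w / a)):
            x0, y0 = tx * a, ty * b
            # Edge tiles are clipped to the frame boundary (an assumption).
            tiles.append((x0, y0, min(a, frame_w - x0), min(b, frame_h - y0)))
    return tiles

tiles = first_tile_grid(frame_w=100, frame_h=60, a=32, b=32)
n = len(tiles)  # N first tiles: ceil(100/32) * ceil(60/32) = 4 * 2 = 8
```

A rasterizer following this sketch would visit `tiles` one entry at a time, finishing all primitives of one tile before moving to the next.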
It should be noted that the rasterization process may also include pixel interpolation calculations, depth testing, stencil testing, and the like.
In an embodiment of the present disclosure, the pixel information of the primitive includes, but is not limited to, at least one of:
a sample mask, used at the pixel processing stage to determine which samples need to be computed and to perform antialiasing;
a sample or pixel depth value, used at the pixel processing stage to depth-test pixels that have an alpha attribute; when a pixel has an alpha channel or the alpha test is enabled, the depth value is generated at the pixel processing stage;
a primitive identifier, used to determine the primitive to which the pixel belongs; a pixel that has not undergone the depth test may have multiple primitive identifiers.
Optionally, the pixel information may further include frame identification information of the current frame. The frame identification information is input by software for frame synchronization with the pixel processing stage, and the frame synchronization is mainly maintained at the software level.
If the pixel information includes frame identification information of the current frame, the pixel processing result obtained by performing pixel processing on the block-by-block basis according to the read pixel information carries the frame identification information.
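The pixel information fields listed above could be held in a record along the following lines. The field names, the types, and the bit layout of the sample mask (bit i set means sample i is covered) are assumptions made for illustration; the disclosure does not specify an encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PixelInfo:
    sample_mask: int                       # assumed: bit i set => sample i covered
    depth: Optional[float] = None          # may be produced later for alpha pixels
    primitive_ids: List[int] = field(default_factory=list)  # several if not depth-tested
    frame_id: Optional[int] = None         # optional frame identification

def covered_samples(info, num_samples=4):
    """List the sample indices that the pixel stage would need to compute."""
    return [i for i in range(num_samples) if (info.sample_mask >> i) & 1]

info = PixelInfo(sample_mask=0b0101, depth=0.25, primitive_ids=[7], frame_id=3)
samples = covered_samples(info)
```

With the mask `0b0101`, only samples 0 and 2 are selected for computation at the pixel processing stage.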
In an embodiment of the present disclosure, the second tile is sized to match the processing capability of the pixel processing module; that is, the size of the second tile does not exceed the buffer capacity corresponding to the pixel processing module. In the present disclosure, the buffer corresponding to the pixel processing module is the buffer that temporarily stores data generated during and after that module's processing. By way of example and not limitation, this buffer is a G-buffer.
In practical applications, the size of the second tile may be determined according to the buffer capacity corresponding to the pixel processing module.
By way of example and not limitation, in some embodiments, the buffer capacity corresponding to the pixel processing module is c×d, so the size of the second tile is c×d. At this size, the pixel data of the image frame is divided into M groups, and the pixel information in one group is referred to as the pixel information covered by a second tile. The pixel data of the image frame is the pixel data obtained by rasterizing all the primitives in the image frame.
Accordingly, the rasterization processing module stores the pixel data of the image frame in memory in groups according to the size of the second tile, each group including the pixel data of c×d pixels. In practice, the image frame may contain fewer than M×c×d pixels. In that case, in some embodiments the last group holds only the actual amount of data; in other embodiments the last group is padded so that it still corresponds to the pixel data of c×d pixels.
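The grouping into M groups of c×d pixel records, with either a short or a padded last group, can be sketched as follows. The function name, the use of plain lists, and `None` as the padding value are assumptions.

```python
def group_pixels(pixel_records, c, d, pad_last=False, pad=None):
    """Split pixel records into groups of c*d; optionally pad the last group."""
    size = c * d
    groups = [pixel_records[i:i + size] for i in range(0, len(pixel_records), size)]
    if pad_last and groups and len(groups[-1]) < size:
        # Second variant in the text: pad so the last group still holds c*d entries.
        groups[-1] = groups[-1] + [pad] * (size - len(groups[-1]))
    return groups

records = list(range(10))                      # 10 pixel records, c*d = 4
unpadded = group_pixels(records, c=2, d=2)     # last group keeps its actual size
padded = group_pixels(records, c=2, d=2, pad_last=True)
```

Here M is 3 in both variants; only the length of the final group differs.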
Correspondingly, the pixel processing module reads the pixel information of the size of c×d each time (i.e. reads the pixel information covered by one second tile each time) according to the predetermined second tile processing sequence, and performs pixel processing according to the read pixel information.
In the disclosed embodiments, pixel processing may include, but is not limited to, shading and pixel post-processing.
By way of example and not limitation, in some embodiments, each group of pixel data corresponds to one second tile data structure, which stores a group of pixel data the size of one second tile. Each time the rasterization processing module finishes processing the primitives covered by a first tile, it looks up the second tile data structure in memory to which pixel data was most recently written. If that structure is not full, the pixel data obtained by the rasterization is first stored in it; if its remaining space cannot hold all of the pixel data, the remainder is stored in the next second tile data structure. If that structure is already full, the pixel data obtained by the rasterization is stored in the next second tile data structure.
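The storage rule described above (fill the most recently used second tile data structure first, then spill into the next one) can be sketched as below. The class name `SecondTileStore` and the list-of-lists representation of memory are assumptions made for illustration.

```python
class SecondTileStore:
    """Hypothetical container of fixed-capacity second tile data structures."""

    def __init__(self, capacity):
        self.capacity = capacity   # c*d pixel records per structure
        self.structures = []       # stands in for memory

    def store(self, pixel_records):
        """Store the pixel records produced by rasterizing one first tile."""
        for rec in pixel_records:
            # Open a new structure when none exists or the last one is full.
            if not self.structures or len(self.structures[-1]) == self.capacity:
                self.structures.append([])
            self.structures[-1].append(rec)

store = SecondTileStore(capacity=4)
store.store(["a0", "a1", "a2"])   # first tile A: structure 0 holds 3 of 4
store.store(["b0", "b1", "b2"])   # first tile B: b0 tops up structure 0, rest spill
store.store(["c0", "c1"])         # first tile C: completes structure 1
```

After these three calls both structures are full, so the records of the next first tile would start a third structure.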
Wherein if each second tile has pixel information for a fixed number of pixels (i.e., each group contains pixel information for the same number of pixels), the second tile data structure may be, but is not limited to, a linked list.
In the embodiment of the disclosure, the size of the first tile may be the same as or different from the size of the second tile.
In existing tile-based rendering architectures, rasterization and pixel processing are both based on the same tile size, and the tile size is reduced when multisampling is enabled. In the course of implementing the present invention, the inventors found that the pixel granularity actually required by the rasterization stage and by the pixel processing stage are not inherently tied together. For example, rasterization generally works at sample granularity, whereas the pixel processing stage may work at pixel granularity. Using the same tile size for rasterization and pixel processing is therefore not well justified. In the technical solution provided by the embodiments of the present disclosure, rasterization and pixel processing are decoupled, so they can use different tile sizes. The first tile size used for rasterization can be determined solely by the rasterization processing capability, and the second tile size used for pixel processing can be determined solely by the pixel processing capability. Each stage can thus maximize its processing efficiency according to its own capability.
Based on any of the above graphics processor embodiments, the graphics processor may further include a tile dividing module configured to determine the primitives covered by each first tile and to store in memory, grouped by first tile, the primitive indexes of the primitives covered by each first tile, so that the rasterization processing module can read the primitive information of the primitives covered by a first tile according to those primitive indexes.
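A small arithmetic sketch shows why sample granularity and pixel granularity lead to different tile sizes. The buffer size of 4096 entries, the 4x sample count, and the choice of square tiles are assumed figures for illustration only.

```python
import math

def tile_side(buffer_entries, entries_per_pixel):
    """Side of the largest square tile whose entries fit in the buffer."""
    return math.isqrt(buffer_entries // entries_per_pixel)

# Rasterization works per sample: 4x multisampling needs 4 entries per pixel.
raster_side_no_msaa = tile_side(4096, entries_per_pixel=1)
raster_side_4x_msaa = tile_side(4096, entries_per_pixel=4)

# Pixel processing may work per pixel, so its tile need not shrink.
pixel_side = tile_side(4096, entries_per_pixel=1)
```

Under these assumed numbers the rasterization tile shrinks from 64×64 to 32×32 when multisampling is enabled, while a decoupled pixel-processing tile can stay at 64×64.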
Primitive information is generated in a geometric stage, and the primitive information comprises vertex attribute information (such as vertex color, vertex coordinates and the like) of primitives and basic information (such as primitive identifications and the like) of the primitives.
By way of example and not limitation, as shown in FIG. 1, in some embodiments the data structure holding primitive information is the primitive block, and each primitive block includes primitive information for a plurality of primitives. Each first tile corresponds to a first tile data structure, which contains a primitive index for each primitive covered by that first tile; the primitive index indicates the storage address of the primitive's primitive information. For example, the first tile Tile0 covers four primitives: primitive 1-0, primitive 1-1, primitive 2-0, and primitive 3-1. The primitive information of primitive 1-0 is primitive information 0 in the first primitive block, that of primitive 1-1 is primitive information 1 in the first primitive block, that of primitive 2-0 is primitive information 0 in the second primitive block, and that of primitive 3-1 is primitive information 1 in the third primitive block. The first tile data structure corresponding to Tile0 therefore includes the primitive indexes of primitives 1-0, 1-1, 2-0, and 3-1, which in this embodiment are the storage addresses of, respectively, primitive information 0 in the first primitive block, primitive information 1 in the first primitive block, primitive information 0 in the second primitive block, and primitive information 1 in the third primitive block.
For another example, the first tile Tile1 covers three primitives: primitive 1-1, primitive 2-0, and primitive 3-0. Primitives 1-1 and 2-0 are as described above and are not described again here. The primitive information of primitive 3-0 is primitive information 0 in the third primitive block. The first tile data structure corresponding to Tile1 therefore includes the primitive index of primitive 1-1, the primitive index of primitive 2-0, and the primitive index of primitive 3-0 (in this embodiment, the storage address of primitive information 0 in the third primitive block).
In practical application, the primitive identifier may also be used as a primitive index.
In the embodiment of the disclosure, the tile dividing module sends the storage addresses of the first tile data structures to the rasterization processing module, so that the rasterization processing module can read, block by block, the primitive indexes of the primitives covered by each first tile from the corresponding first tile data structure according to its storage address, and then read the primitive information according to the primitive indexes.
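The FIG. 1 example and the read path just described can be mimicked with ordinary dictionaries. In this sketch a primitive index is modeled as a (primitive block number, slot) pair standing in for a storage address; that modeling, the placeholder strings, and the function name are assumptions. Only the primitives named in the text are included.

```python
# Primitive blocks, keyed by block number; each slot holds one primitive's information.
primitive_blocks = {
    1: ["info of primitive 1-0", "info of primitive 1-1"],
    2: ["info of primitive 2-0"],
    3: ["info of primitive 3-0", "info of primitive 3-1"],
}

# First tile data structures: the primitive indexes of the primitives each tile covers.
first_tile_structures = {
    "Tile0": [(1, 0), (1, 1), (2, 0), (3, 1)],
    "Tile1": [(1, 1), (2, 0), (3, 0)],
}

def read_primitive_info(tile_name):
    """Follow each primitive index of a first tile to its primitive information."""
    return [primitive_blocks[block][slot]
            for block, slot in first_tile_structures[tile_name]]

tile0_info = read_primitive_info("Tile0")
tile1_info = read_primitive_info("Tile1")
shared = set(first_tile_structures["Tile0"]) & set(first_tile_structures["Tile1"])
```

Because the tiles store indexes rather than copies, primitives 1-1 and 2-0, which both tiles cover, are stored once in their primitive blocks and referenced twice.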
On the basis of any one of the above graphics processor embodiments, the pixel processing module includes a plurality of arithmetic logic units, and tasks of performing pixel processing on a block-by-block basis according to the read pixel information are performed by the plurality of arithmetic logic units, and the cache corresponding to the pixel processing module is a local storage space of the plurality of arithmetic logic units.
The plurality of arithmetic logic units are arithmetic logic units for computation that are reused (multiplexed) for pixel processing.
In the present disclosure, an arithmetic logic unit used to execute image rendering tasks is referred to as a rendering arithmetic logic unit, and an arithmetic logic unit used to execute computation tasks (e.g., tensor computation) is referred to as a compute arithmetic logic unit. In the prior art, rendering arithmetic logic units are used only for image rendering, and compute arithmetic logic units are used only for computation. The present disclosure proposes reusing the compute arithmetic logic units to implement the pixel processing function.
By way of example and not limitation, a typical architecture of the compute arithmetic logic units is shown in FIG. 2. A work item is an ALU (arithmetic logic unit) of the GPU. The pixel processing task for one tile is split into a plurality of work items for execution, and the temporary data generated by the work items is exchanged through the local RAM (local random access memory). When the pixel processing task for a tile is complete, the data in the local RAM is written back to system memory. The larger the local RAM, the less bandwidth the GPU needs for exchanging data with system memory.
To improve efficiency, pixel processing is still performed tile by tile, and the local RAM is used to store the pixel data generated during pixel processing. The size of the local RAM therefore determines the tile size.
Pixel processing obtains the data structure of a pixel from system memory and then obtains, from that data structure, the primitive covering the pixel. It then reads the attributes of the primitive to perform operations on the pixel such as interpolation, lighting calculation, and antialiasing.
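The relationship between local RAM size, tile size, and work items can be sketched numerically. The 16 bytes per pixel, the square tile shape, and the round-robin assignment of rows to work items are assumptions chosen for illustration; none of them comes from the disclosure.

```python
import math

def tile_side_from_local_ram(local_ram_bytes, bytes_per_pixel):
    """Side of the largest square tile whose pixel data fits in local RAM."""
    return math.isqrt(local_ram_bytes // bytes_per_pixel)

def rows_per_work_item(tile_side, num_work_items):
    """Split a tile's rows across work items round-robin (assumed scheme)."""
    return [list(range(i, tile_side, num_work_items)) for i in range(num_work_items)]

side = tile_side_from_local_ram(64 * 1024, bytes_per_pixel=16)   # 4096 pixels
assignment = rows_per_work_item(side, num_work_items=8)
bigger_side = tile_side_from_local_ram(256 * 1024, bytes_per_pixel=16)
```

Under these assumptions, quadrupling the local RAM doubles the tile side from 64 to 128, which is the sense in which the local RAM size determines the tile size.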
In the course of implementing the present invention, the inventors found by analyzing 3D rendering data that pixel processing takes up more than 80% of 3D rendering time, which means that most of the GPU's rendering resources are spent on pixel processing rather than geometry processing. The inventors also found that, in current mainstream multi-core graphics processing systems using the TBDR architecture, achieving a higher frame rate allows geometry processing to be completed on a single GPU core but requires pixel processing to be spread across multiple GPU cores. This means that the key to increasing the 3D rendering frame rate is to increase the computing resources of the GPU.
In the technical solution provided by the embodiments of the present disclosure, rasterization and pixel processing are decoupled, so the two processes can be optimized separately. Taking the 3D rendering frame rate as an example, because pixel processing is decoupled from rasterization, pixel processing can be optimized on its own: more computing resources can be allocated to it without having to optimize other parts of the rendering pipeline such as geometry processing and rasterization. In particular, pixel processing may use a larger second tile. In this case, the compute arithmetic logic units and their cache can be reused, and no separate G-buffer needs to be designed.
The on-chip cache required for pixel processing and the cache used for GPU computation (the local RAM described above) are shared. To improve computing capability, GPUs are often given additional compute cores or other computing resources; because the present disclosure decouples pixel processing from rasterization and the stages before it, those compute cores can more readily be used for pixel processing. In addition, thanks to this decoupling, even if pixel processing needs more rendering time, this can be addressed by increasing the GPU's computing capability without re-matching the processing capability of the other rendering pipeline stages. Because rasterization and pixel processing are decoupled, they no longer need to use the same tile size, so the tile size of each can be adjusted according to the rasterization cache and the pixel processing cache, respectively.
Embodiments of the present disclosure also provide a graphics processing system including a graphics processor as described in any of the embodiments above.
In an embodiment of the present disclosure, the product form of the graphics processing system may be an SOC (System on Chip) chip.
The graphics processing system in the embodiments of the present disclosure may be a single-die SOC chip or a multi-die interconnected SOC chip.
The architecture and the working principle of the graphics processing system provided in the present disclosure are described below by taking one die as an example.
In one embodiment shown in FIG. 3, a single die graphics processing system includes a GPU core, i.e., the graphics processor described above.
The GPU core processes drawing instructions and, according to them, executes the image rendering pipeline; it can also execute other computation instructions. The GPU core further includes:
a computing unit, a programmable module composed of a large number of ALUs, for executing instructions compiled from shaders;
a cache for caching GPU core data to reduce accesses to memory;
a rasterization module, a fixed stage of the 3D rendering pipeline, which further comprises a primitive information calculation module and a pixel information processing module;
a tile partitioning (tiling) module, which partitions a frame into tiles in TBR and TBDR GPU architectures;
a clipping module, a fixed stage of the 3D rendering pipeline, which clips out primitives that lie outside the viewing range or whose back faces are not displayed;
a post-processing module for performing operations such as scaling, cropping, and rotation on the drawn image;
a microcore for scheduling among the pipeline hardware modules on a GPU core, or for task scheduling across multiple GPU cores.
The GPU core is connected to a network on chip. The network on chip is used for data exchange between the various masters and slaves on the graphics processing system; in this embodiment it includes a configuration bus, a data communication network, a communication bus, and so on.
As shown in fig. 3, the graphics processing system may further include:
a general-purpose DMA (Direct Memory Access) for moving data between the host side and the graphics processing system memory (e.g., graphics card memory), for example moving the vertex data of a 3D drawing from the host side to the graphics processing system memory via DMA;
a PCIe controller, the interface for communicating with the host, which implements the PCIe protocol so that the graphics processing system is connected to the host through a PCIe interface; programs such as the graphics API and the graphics card driver run on the host;
an application processor for scheduling the tasks of the modules on the graphics processing system; for example, the GPU notifies the application processor after rendering a frame, and the application processor then starts the display controller to display the image drawn by the GPU on the screen;
a memory controller for connecting memory devices and storing data on the SOC;
a display controller for controlling the output of the frame buffer in the memory to a display through a display interface (HDMI, DP, etc.);
a video decoder for decoding encoded video on the host hard disk into displayable pictures;
a video encoder for encoding the original video stream on the host hard disk into a specified format and returning it to the host.
Based on the graphics processing system architecture shown in FIG. 3, in one embodiment, the graphics rendering process is as follows:
The graphics API of the host (in practice, for a graphics processing system in a mobile terminal, this may instead be software on the application processor) sends a drawing instruction to the SOC chip, requesting that an image frame be rendered.
The image frame includes at least one object.
The general-purpose DMA transfers the vertex coordinate information of each object in the image frame from the host side to the graphics processing system memory.
After the computing unit of the GPU core acquires the drawing instruction, it decodes the drawing instruction.
The vertex shader of the GPU core (whose function is implemented by the computing unit) obtains the vertex coordinate information of each object in the image frame from the system memory and passes it to the geometry shader (also implemented by the computing unit), which converts the 3D coordinates of the object vertices into unwrapped texture coordinates (i.e., (u, v) coordinates). In addition, the computing unit performs primitive assembly according to the vertex coordinate information of the objects to determine the vertex coordinates of each primitive. The value in the texture map at the texture coordinate corresponding to a vertex coordinate is the vertex color information.
Vertex coordinate information and vertex texture coordinates of the primitives are saved to a data structure of the primitives in system memory.
After geometry processing is finished, the tile partitioning module performs first tile partitioning on the primitives in the image frame according to the size of the depth buffer and stores the result in a first tile data structure. The first tile data structure contains the primitive indexes of the primitives covered by each tile, and, as shown in FIG. 1, each primitive index indicates the storage address of the primitive information of the corresponding primitive. The tile partitioning module sends the address of the first tile data structure to the rasterization module.
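The first tile partitioning step can be sketched as a binning pass that records, for each first tile, the indexes of the primitives that may cover it. The sketch below is an assumption-laden illustration: it uses a conservative bounding-box overlap test, treats primitives as 2D triangles in screen space, and uses list positions as stand-ins for the primitive indexes (storage addresses) described above. The function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def bin_primitives(primitives, tile_size, frame_width, frame_height):
    """Map each first tile (tx, ty) to indexes of primitives overlapping it.

    primitives: list of triangles, each a sequence of three (x, y) vertices.
    Uses the triangle's bounding box, so a tile may be listed even when the
    triangle itself does not touch it (conservative, assumed for simplicity).
    """
    tiles = defaultdict(list)
    max_tx = (frame_width - 1) // tile_size
    max_ty = (frame_height - 1) // tile_size
    for index, triangle in enumerate(primitives):
        xs = [x for x, _ in triangle]
        ys = [y for _, y in triangle]
        # Clamp the bounding box to the frame and convert to tile coordinates.
        tx0 = max(0, math.floor(min(xs)) // tile_size)
        tx1 = min(max_tx, math.floor(max(xs)) // tile_size)
        ty0 = max(0, math.floor(min(ys)) // tile_size)
        ty1 = min(max_ty, math.floor(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)].append(index)
    return dict(tiles)
```

A primitive that lies entirely outside the frame produces an empty tile range and is therefore not recorded for any tile.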
When the first tile partitioning has been completed for all objects to be displayed in a frame, the rasterization process is started.
As required by deferred rendering, the rasterization module processes the first tiles one by one, each time reading from the tile buffer the primitive indexes (primitive identifiers or primitive information storage addresses) of the primitives covering the current first tile. The rasterization module reads the primitive information of each primitive through its primitive index, performs a pixel coverage test using the primitive information to determine the pixels covered by the primitive, and then performs pixel interpolation and at least one pixel test (by way of example and not limitation, the pixel tests may include a depth test, a stencil test, etc.).
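As an illustration of the coverage test, interpolation, and depth test performed per first tile, the following sketch uses the common edge-function (barycentric) approach. The disclosure only says these steps use existing techniques, so the specific method, the pixel-center sampling, the "smaller depth is nearer" convention, and all names here are assumptions.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p), using x and y only."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_tile(triangles, tile_origin, tile_size):
    """Return {(x, y): (depth, primitive_index)} for the nearest primitive.

    triangles: iterable of (primitive_index, (v0, v1, v2)), each vertex (x, y, z).
    """
    ox, oy = tile_origin
    visible = {}
    for index, (v0, v1, v2) in triangles:
        area = edge(v0, v1, v2)
        if area == 0:
            continue  # degenerate triangle covers no pixels
        for y in range(oy, oy + tile_size):
            for x in range(ox, ox + tile_size):
                p = (x + 0.5, y + 0.5)  # sample at the pixel center
                # Barycentric weights; dividing by area handles both windings.
                w0 = edge(v1, v2, p) / area
                w1 = edge(v2, v0, p) / area
                w2 = edge(v0, v1, p) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # coverage test failed
                depth = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]  # interpolation
                # Depth test: keep the smaller (assumed nearer) depth.
                if (x, y) not in visible or depth < visible[(x, y)][0]:
                    visible[(x, y)] = (depth, index)
    return visible
```

Only the surviving primitive per pixel is recorded, which matches the deferred-rendering goal of shading each pixel once.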
In the present disclosure, pixel coverage testing, pixel interpolation calculation, and pixel testing are implemented using existing techniques.
The rasterization process outputs the pixel information. Specifically, the rasterization module stores the pixel information grouped by second tile. For a description of the pixel information, refer to the above embodiments; it is not repeated here.
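Because the first and second tiles may differ in size, the rasterization output has to be regrouped by second tile before pixel processing. A minimal sketch, assuming square second tiles and pixel information represented as dictionaries with hypothetical `x` and `y` keys:

```python
def group_by_second_tile(pixel_infos, second_tile_size):
    """Group pixel information records by the second tile that contains them."""
    groups = {}
    for info in pixel_infos:
        key = (info["x"] // second_tile_size, info["y"] // second_tile_size)
        groups.setdefault(key, []).append(info)
    return groups
```

For example, with 16 x 16 first tiles and 32 x 32 second tiles, the pixels produced by four neighbouring first tiles end up in the same second-tile group.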
When all first tiles of a frame have been rasterized, pixel processing is started.
Specifically, the fragment shader of the GPU core (whose function is implemented by the computing unit) performs shading calculations (e.g., illumination calculations) on the pixels covered by primitives according to their texture coordinates, one second tile at a time.
It should be noted that the above description of the embodiments of the present disclosure uses a single GPU core as an example. The embodiments of the present disclosure are applicable not only to graphics processing systems with a single GPU core, but also to graphics processing systems with multiple GPU cores.
The embodiments of the present disclosure also provide an electronic apparatus including the graphics processing system described in any of the above embodiments. In some use scenarios, the product form of the electronic apparatus is a graphics card; in other use scenarios, the product form of the electronic apparatus is a CPU motherboard.
The embodiments of the present disclosure also provide an electronic device that includes the electronic apparatus described above. In some use scenarios, the product form of the electronic device is a portable electronic device, such as a smartphone, a tablet computer, or a VR device; in other use scenarios, the product form of the electronic device is a personal computer, game console, workstation, server, etc.
Based on the same inventive concept, the embodiments of the present disclosure also provide a graphics processing method applied to a graphics processor. As shown in FIG. 4, the method includes a rasterization process and a pixel processing process.
The rasterization process comprises step 401: reading, tile by tile, the primitive information of the primitives covered by each first tile; rasterizing, tile by tile, the primitives covered by the first tile according to the primitive information to obtain the pixel information of those primitives; and saving the pixel information of the primitives covered by each first tile into the memory, grouped by second tile.
After the pixel information of the primitives covered by every first tile has been saved into the memory, the pixel processing process is executed. It comprises step 402: reading the pixel information covered by each second tile from the memory tile by tile, and performing pixel processing tile by tile according to the read pixel information.
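The ordering of steps 401 and 402 can be sketched as a two-phase loop in which pixel processing starts only after every first tile has been rasterized. The `rasterize` and `shade` callables, the dictionary standing in for memory, and the `x`/`y` keys are hypothetical placeholders, not the disclosed implementation.

```python
def render_frame(first_tiles, rasterize, second_tile_size, shade):
    """Two-phase sketch: rasterize all first tiles, then shade by second tile.

    rasterize(tile) yields pixel-information dicts with "x" and "y" keys;
    shade(info) returns the pixel processing result for one pixel.
    """
    memory = {}  # second-tile key -> list of pixel information
    # Phase 1 (step 401): rasterize every first tile and save its output
    # grouped by second tile.
    for tile in first_tiles:
        for info in rasterize(tile):
            key = (info["x"] // second_tile_size, info["y"] // second_tile_size)
            memory.setdefault(key, []).append(info)
    # Phase 2 (step 402): runs only after all first tiles are done.
    results = {}
    for key in sorted(memory):
        results[key] = [shade(info) for info in memory[key]]
    return results
```

Keeping the two phases separate is what allows the first and second tile sizes to be chosen independently.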
Optionally, the size of the first tile is determined according to the buffer capacity corresponding to the rasterization result.
Optionally, the size of the second tile is determined according to the buffer capacity corresponding to the pixel processing result.
On the basis of any one of the above embodiments of the graphics processing method, before the primitive information of the primitives covered by the first tiles is read tile by tile, the primitives covered by each first tile may be determined, and the primitive indexes of the primitives covered by each first tile may be saved to the memory grouped by first tile, so that during rasterization the primitive information of the primitives covered by a first tile is read according to those primitive indexes.
Further, the pixel information may include frame identification information of the current frame. Correspondingly, the pixel processing result obtained by performing pixel processing on the read pixel information block by block carries frame identification information.
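A minimal sketch of how frame identification information might travel from the pixel information to the pixel processing result. The record layouts, field names, and the `sample` callable are hypothetical; the disclosure only states that the frame identifier is included in the pixel information and carried in the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelInfo:
    frame_id: int  # frame identification information of the current frame
    x: int
    y: int
    u: float       # texture coordinates of the pixel
    v: float


@dataclass(frozen=True)
class PixelResult:
    frame_id: int  # copied unchanged from the pixel information
    x: int
    y: int
    color: tuple


def process_pixel(info, sample):
    """Shade one pixel; sample(u, v) is an assumed texture lookup."""
    return PixelResult(info.frame_id, info.x, info.y, sample(info.u, info.v))
```

Carrying the identifier in each result lets a consumer tell which frame a result belongs to even when data from more than one frame is in memory.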
On the basis of any of the above embodiments of the graphics processing method, the pixel information covered by the second tile may be stored in a linked list.
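Storing the pixel information of a second tile as a linked list can be sketched as follows. The class names and the choice to insert at the head are assumptions; one plausible motivation is that the number of pixels written into a second tile is not known in advance, and a linked list grows without reallocation.

```python
class PixelNode:
    """One linked-list node holding a single pixel information record."""

    def __init__(self, info, next_node=None):
        self.info = info
        self.next = next_node


class TilePixelList:
    """Singly linked list of the pixel information covered by one second tile."""

    def __init__(self):
        self.head = None
        self.count = 0

    def push(self, info):
        # Insert at the head: constant time, no resizing needed.
        self.head = PixelNode(info, self.head)
        self.count += 1

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.info
            node = node.next
```

Iteration returns the records in reverse insertion order, which is acceptable here because each record carries its own pixel position.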
On the basis of any one of the above embodiments of the graphics processing method, a specific implementation of performing pixel processing tile by tile according to the read pixel information may include:
multiplexing a plurality of arithmetic logic units used for computation to perform pixel processing tile by tile according to the read pixel information, wherein the cache corresponding to the pixel processing result is the local storage space of the plurality of arithmetic logic units.
It should be noted that the above graphics processing method is based on the same inventive concept as the above graphics processor. Therefore, for the specific implementation of each step of the method and the explanation of related terms, refer to the description of the above embodiments; they are not repeated here.
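As a loose software analogy only, distributing second tiles across several compute units can be sketched with a thread pool, where each worker keeps its results in a local buffer that stands in for the ALU-local storage. Hardware ALUs are not threads; the function name, the `workers` parameter, and the use of `ThreadPoolExecutor` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tiles_in_parallel(tiles, shade, workers=4):
    """Shade each second tile on one of several workers.

    tiles: dict mapping a second-tile key to its list of pixel information.
    shade: callable applied to each pixel information record.
    """
    def process_one(item):
        key, infos = item
        # Stand-in for the local storage space of the compute units.
        local_buffer = [shade(info) for info in infos]
        return key, local_buffer

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_one, tiles.items()))
```

Since each second tile is processed independently, adding workers in this sketch corresponds to the idea of matching more computing resources to pixel processing without touching the earlier pipeline stages.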
While the preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (13)
1. A graphics processor, comprising:
a rasterization processing module configured to read, tile by tile, the primitive information of the primitives covered by each first tile; rasterize, tile by tile, the primitives covered by the first tile according to the primitive information to obtain pixel information of the primitives covered by the first tile; and save the pixel information of the primitives covered by each first tile into a memory grouped by second tile, wherein the size of the first tile is determined according to the cache capacity corresponding to the rasterization processing module;
and a pixel processing module configured to, after the pixel information of the primitives covered by every first tile has been saved into the memory, read the pixel information covered by each second tile from the memory tile by tile and perform pixel processing tile by tile according to the read pixel information, wherein the size of the second tile is determined according to the cache capacity corresponding to the pixel processing module.
2. The graphics processor of claim 1, further comprising a tile partitioning module configured to determine the primitives covered by each first tile and to save the primitive indexes of the primitives covered by each first tile into the memory grouped by first tile, so that the rasterization processing module reads the primitive information of the primitives covered by a first tile according to the primitive indexes of the primitives covered by that first tile.
3. The graphics processor of claim 2, wherein the pixel information includes frame identification information of a current frame, and the frame identification information is carried in a pixel processing result obtained by performing pixel processing on the pixel information read block by block.
4. The graphics processor of claim 1, wherein the pixel information covered by the second tile is stored in a linked list.
5. The graphics processor according to any one of claims 1 to 4, wherein the pixel processing module includes a plurality of arithmetic logic units; the task of performing pixel processing tile by tile according to the read pixel information is executed by the plurality of arithmetic logic units; the cache corresponding to the pixel processing module is the local storage space of the plurality of arithmetic logic units; and the plurality of arithmetic logic units multiplex the arithmetic logic units used for computation.
6. A graphics processing system comprising the graphics processor of any one of claims 1 to 5.
7. An electronic apparatus comprising the graphics processing system of claim 6.
8. An electronic device comprising the electronic apparatus of claim 7.
9. A graphics processing method for use in a graphics processor, the method comprising a rasterization process and a pixel processing process, wherein:
the rasterization process comprises: reading, tile by tile, the primitive information of the primitives covered by each first tile; rasterizing, tile by tile, the primitives covered by the first tile according to the primitive information to obtain pixel information of the primitives covered by the first tile; and saving the pixel information of the primitives covered by each first tile into a memory grouped by second tile, wherein the size of the first tile is determined according to the cache capacity corresponding to the rasterization processing result;
and the pixel processing process is executed after the pixel information of the primitives covered by every first tile has been saved into the memory, and comprises: reading the pixel information covered by each second tile from the memory tile by tile, and performing pixel processing tile by tile according to the read pixel information, wherein the size of the second tile is determined according to the cache capacity corresponding to the pixel processing result.
10. The method of claim 9, further comprising, before reading the primitive information of the primitives covered by the first tiles tile by tile:
determining the primitives covered by each first tile, and saving the primitive indexes of the primitives covered by each first tile into the memory grouped by first tile, so that in the rasterization process the primitive information of the primitives covered by a first tile is read according to the primitive indexes of the primitives covered by that first tile.
11. The method according to claim 10, wherein the pixel information includes frame identification information of a current frame, and the frame identification information is carried in the pixel processing result obtained by performing pixel processing tile by tile according to the read pixel information.
12. The method of claim 9, wherein the pixel information covered by the second tile is stored in a linked list.
13. The method according to any one of claims 9 to 12, wherein performing pixel processing tile by tile according to the read pixel information comprises:
multiplexing a plurality of arithmetic logic units used for computation to perform pixel processing tile by tile according to the read pixel information, wherein the cache corresponding to the pixel processing result is the local storage space of the arithmetic logic units used for computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210414541.3A CN116263982B (en) | 2022-04-20 | 2022-04-20 | Graphics processor, system, method, electronic device and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116263982A (en) | 2023-06-16 |
CN116263982B (en) | 2023-10-20 |
Family
ID=86723742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210414541.3A Active CN116263982B (en) | 2022-04-20 | 2022-04-20 | Graphics processor, system, method, electronic device and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116263982B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117745518B (en) * | 2024-02-21 | 2024-06-11 | 芯动微电子科技(武汉)有限公司 | Graphics processing method and system for optimizing memory allocation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104036537A (en) * | 2013-03-08 | 2014-09-10 | 辉达公司 | Multiresolution Consistent Rasterization |
CN110036413A (en) * | 2016-12-23 | 2019-07-19 | 高通股份有限公司 | Foveated rendering in tiled architectures |
CN111062858A (en) * | 2019-12-27 | 2020-04-24 | 西安芯瞳半导体技术有限公司 | Efficient rendering-ahead method, device and computer storage medium |
CN112561774A (en) * | 2019-09-26 | 2021-03-26 | 英特尔公司 | Graphics processing unit and method therein |
Also Published As
Publication number | Publication date |
---|---|
CN116263982A (en) | 2023-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062858B (en) | Efficient rendering-ahead method, device and computer storage medium | |
US9406149B2 (en) | Selecting and representing multiple compression methods | |
US10152765B2 (en) | Texture processing method and unit | |
US9589310B2 (en) | Methods to facilitate primitive batching | |
US20150178879A1 (en) | System, method, and computer program product for simultaneous execution of compute and graphics workloads | |
US11908039B2 (en) | Graphics rendering method and apparatus, and computer-readable storage medium | |
CN106575430B (en) | Method and apparatus for pixel hashing | |
CN110675480B (en) | Method and apparatus for acquiring sampling position of texture operation | |
CN117058288A (en) | Graphics processor, multi-core graphics processing system, electronic device, and apparatus | |
CN112801855B (en) | Method and device for scheduling rendering task based on graphics primitive and storage medium | |
CN116263982B (en) | Graphics processor, system, method, electronic device and apparatus | |
US10192348B2 (en) | Method and apparatus for processing texture | |
US11978234B2 (en) | Method and apparatus of data compression | |
WO2023202367A1 (en) | Graphics processing unit, system, apparatus, device, and method | |
US8947444B1 (en) | Distributed vertex attribute fetch | |
CN113835753B (en) | Techniques for performing accelerated point sampling in a texture processing pipeline | |
US11748933B2 (en) | Method for performing shader occupancy for small primitives | |
CN116957898B (en) | Graphics processor, system, method, electronic device and electronic equipment | |
CN115222869A (en) | Distributed rendering method and device | |
CN116263981B (en) | Graphics processor, system, apparatus, device, and method | |
CN114511657A (en) | Data processing method and related device | |
CN115375821A (en) | Image rendering method and device and server | |
WO2023202366A1 (en) | Graphics processing unit and system, electronic apparatus and device, and graphics processing method | |
WO2023202365A1 (en) | Graphics processing unit, system and method, and apparatus and device | |
US20230377086A1 (en) | Pipeline delay elimination with parallel two level primitive batch binning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |