WO2023202366A1 - Graphics processing unit and system, electronic apparatus and device, and graphics processing method

Info

Publication number
WO2023202366A1
WO2023202366A1 PCT/CN2023/085937 CN2023085937W WO2023202366A1 WO 2023202366 A1 WO2023202366 A1 WO 2023202366A1 CN 2023085937 W CN2023085937 W CN 2023085937W WO 2023202366 A1 WO2023202366 A1 WO 2023202366A1
Authority
WO
WIPO (PCT)
Prior art keywords
tile
size
sub
module
vrs
Prior art date
Application number
PCT/CN2023/085937
Other languages
French (fr)
Chinese (zh)
Inventor
唐志敏
王海洋
姜莹
Original Assignee
象帝先计算技术(重庆)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 象帝先计算技术(重庆)有限公司
Publication of WO2023202366A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • This application relates to the technical field of GPUs (Graphics Processing Units, graphics processors), and in particular to a graphics processor, a graphics processing system, an electronic apparatus, an electronic device and a graphics processing method.
  • GPU Graphics Processing Unit
  • on-chip buffers are used to store pixel depth buffer information, geometry buffer information and other information in a tile.
  • the storage space on the on-chip buffer that saves the depth buffer information is called the depth buffer (Depth Buffer)
  • the storage space on the on-chip buffer that saves the geometry buffer information is called the geometry buffer (G Buffer).
  • the size of the depth buffer determines the number of samples or pixels that can be used for depth testing on the chip
  • the size of the geometry buffer determines the number of samples or pixels that can be used for fragment calculation on the chip.
  • the size of the depth buffer and the size of the geometry buffer together determine the size of the tiles that can be divided. For example, if the tile size is 16×16 pixels, then both the depth buffer and the geometry buffer need to provide on-chip processing capabilities for 16×16 pixels.
  • In traditional tile-based GPU architectures, the choice of tile size is limited by both the depth buffer and the geometry buffer.
  • VRS Variable Rate Shading
  • When the VRS function is enabled, the granularity of fragment calculations is reduced, which means that the geometry buffer requirements become smaller. If the tile size is matched to the size of the geometry buffer, larger tiles can be divided. However, depth testing prevents tiles from being divided at the larger size. For example, both the depth buffer and the geometry buffer support on-chip processing of 16×16 pixels. After VRS is enabled, the geometry buffer supports dividing tiles larger than 16×16 pixels, but the depth buffer limits the maximum tile size to 16×16 pixels. It can be seen that after VRS is enabled, the traditional tile-based GPU architecture limits the choice of tile size.
  • the purpose of this disclosure is to provide a graphics processor, a graphics processing system, a graphics processing method, an electronic apparatus and an electronic device so that when VRS is enabled, tiles can be divided according to the size of the geometry buffer.
  • a graphics processor adopts a tile-based rendering architecture, and the graphics processor at least includes:
  • the tile division module is configured to: perform tile division processing on the primitives in the image frame according to the basic tile size and the VRS pixel group size.
  • the divided tile size is larger than the basic tile size, but not larger than the product of the basic tile size and the VRS pixel group size.
  • the depth test module is configured to: perform depth testing tile by tile, and for each tile, perform the depth test in multiple sub-tiles, where the size of each sub-tile is limited by the size of the depth buffer;
  • the fragment shader module is configured to perform fragment calculations tile by tile, where the fragment shader module is called after the depth test of each sub-tile within each tile is completed.
  • the divided tile size is the product of the basic tile size and the VRS pixel group size.
  • the subtile size is the base tile size.
  • the depth testing module may be configured to: divide each tile into multiple sub-tiles.
  • the tile dividing module is further configured to divide each tile into multiple sub-tiles.
  • the depth testing module can be configured to: mark the sub-tile division results in the tile division results of each tile.
  • where the tile division module divides the sub-tiles:
  • the tile division module is further configured to: save the tile division result of each tile separately, and mark the sub-tile division results in the tile division result of each tile.
  • the fragment shader module may be configured to perform a fragment calculation on pixels in the same VRS pixel group.
  • a graphics processing system includes the graphics processor described in any of the above embodiments.
  • an electronic apparatus includes the graphics processing system described in any of the above embodiments.
  • in some usage scenarios, the product form of the electronic apparatus is a graphics card; in other usage scenarios, the product form of the electronic apparatus is a CPU motherboard.
  • an electronic device, which includes the above-mentioned electronic apparatus.
  • in some usage scenarios, the product form of the electronic device is a portable electronic device, such as a smartphone, tablet computer, VR device, etc.; in other usage scenarios, the product form of the electronic device is a personal computer, game console, etc.
  • a graphics processing method is also provided.
  • the graphics processing method adopts a tile-based rendering architecture.
  • the graphics processing method at least includes the following operations:
  • tile division processing is performed on the primitives in the image frame according to the basic tile size and the VRS pixel group size.
  • the divided tile size is larger than the basic tile size, but not larger than the product of the basic tile size and the VRS pixel group size;
  • Depth testing is performed tile by tile, and for each tile, depth testing is performed in multiple sub-tiles.
  • the size of each sub-tile is limited to the size of the depth buffer
  • Fragment calculations are done on a tile-by-tile basis, where the fragment shader module is called after depth testing of individual sub-tiles within each tile.
  • the divided tile size is the product of the basic tile size and the VRS pixel group size.
  • the subtile size is the base tile size.
  • before the depth test is performed sub-tile by sub-tile, each tile may also be divided into multiple sub-tiles.
  • sub-tile division results can be marked in the tile division results of each tile.
  • a fragment calculation is performed on pixels in the same VRS pixel group.
  • Figure 1 is a schematic diagram of tile division according to an embodiment of the present disclosure
  • Figure 2 is a schematic structural diagram of a graphics processing system according to an embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a graphics processing method according to an embodiment of the present disclosure.
  • the terms “first”, “second”, etc. may be used to describe various features, but these features should not be limited by these terms. These terms are used solely to distinguish one characteristic from another.
  • connection or communication between two components may be understood as a direct connection or communication, or as an indirect connection or communication through intermediate components.
  • the geometry buffer can support greater processing power than without the VRS function. For example, when the VRS function is not enabled, the size of the geometry buffer is 16×16, and the supported tile size is 16×16; when the VRS function is enabled and the VRS pixel group is set to 1×2, the geometry buffer can support a tile size of 16×32.
  • the present disclosure provides a graphics processor that adopts a tile-based rendering architecture, and the graphics processor can adjust the tile size.
  • the tile size can be adjusted according to the size of the geometry buffer, thereby effectively utilizing the geometry buffer.
  • the data interaction with memory caused by tile switching during the fragment calculation phase can be reduced.
  • the tile size may exceed the size of the depth buffer.
  • the graphics processor provided by the present disclosure performs the depth test on each tile at sub-tile granularity in the depth testing stage, so as to match the size of the depth buffer.
  • performing the depth test at sub-tile granularity means that the tile is divided into multiple sub-tiles.
  • the depth test for the tile is converted into reading the data of one sub-tile at a time and performing the depth test for that sub-tile, until all sub-tiles of the tile have been depth tested.
  • GPU refers to a processor with computing functions implemented through hardware, which includes computing units, caches and other components.
  • So GPGPU general-purpose graphics processing unit, general-purpose graphics processor
  • So GPGPU can also be a GPU.
  • the graphics processor provided by the embodiment of the present disclosure is suitable for any tile-based rendering architecture, such as TBR (Tile Based Render, tile-based rendering), TBDR (Tile Based Deferred Rendering, tile-based deferred rendering), etc.
  • TBR Tile Based Render, tile-based rendering
  • TBDR Tile Based Deferred Rendering, tile-based deferred rendering
  • One embodiment of the present disclosure provides a graphics processor that adopts a tile-based rendering architecture, which at least includes a tile partitioning module, a depth testing module, and a fragment shader module.
  • the tile division module is configured to perform tile division processing on the primitives in the image frame according to the basic tile size and the VRS pixel group size.
  • the divided tile size is larger than the basic tile size, but not larger than the product of the basic tile size and the VRS pixel group size.
  • the basic tile size refers to the tile size used for tile division when the VRS function is not enabled.
  • the base tile size is determined based on the size of the on-chip buffers such as the size of the geometry buffer and the size of the depth buffer.
  • the divided tile size is the product of the basic tile size and the VRS pixel group size. The size of a tile is shown by the solid rectangular frame in FIG. 1.
  • the pixel group sizes supported by VRS include: 1×2, 2×1, 2×2, 2×4, 4×2 and 4×4.
  • the rules for tile size adjustment by the tile division module are shown in Table 1.
  • a×b represents the set basic tile size.
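  • Table 1 is not reproduced in this text, so the following is only a minimal sketch of the adjustment rule it refers to. It assumes the element-wise product implied by the earlier example (a 16×16 basic tile with a 1×2 VRS pixel group gives a 16×32 tile). The names `extended_tile_size`, `tile_size_table` and `VRS_PIXEL_GROUPS` are hypothetical and do not come from the disclosure.

```python
# Hypothetical sketch; names and the element-wise rule are assumptions,
# inferred from the 16x16 base tile + 1x2 pixel group -> 16x32 example.
VRS_PIXEL_GROUPS = [(1, 2), (2, 1), (2, 2), (2, 4), (4, 2), (4, 4)]


def extended_tile_size(base, vrs_group):
    """Return the tile size used when VRS is enabled.

    base      -- basic tile size (a, b) used when VRS is disabled
    vrs_group -- VRS pixel group size (m, n)
    """
    a, b = base
    m, n = vrs_group
    return (a * m, b * n)


def tile_size_table(base):
    """Map every supported VRS pixel group to its adjusted tile size."""
    return {group: extended_tile_size(base, group) for group in VRS_PIXEL_GROUPS}
```

  • For example, `tile_size_table((16, 16))` maps the 1×2 pixel group to a 16×32 tile and the 4×4 pixel group to a 64×64 tile under this assumption.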
  • the depth testing module is configured to perform depth testing tile by tile and, for each tile, to perform the depth test in multiple sub-tiles, where the size of each sub-tile is no larger than the size of the depth buffer.
  • the sub-tile size is the basic tile size. Still taking the tile division method shown in Figure 1 as an example, the dotted rectangular box in Figure 1 shows the sub-tile size.
  • the depth test is performed on the pixels within the tile.
  • the embodiment of the present disclosure does not limit which pixels are depth tested.
  • the depth test is performed on all pixels covered by the primitive.
  • the depth test is performed on the visible pixels of the screen covered by the primitive. This disclosure does not limit the granularity of depth testing. Depth testing can be performed based on pixel granularity, or depth testing can be performed based on sample granularity.
  • the depth testing module performs depth testing on the pixels in each tile according to a predetermined processing sequence. The depth test for the pixels of each tile is performed at sub-tile granularity. In this disclosure, each tile is divided into multiple sub-tiles according to the same sub-tile division rules. Taking the i-th tile as an example, the data of one sub-tile is read each time for depth testing. After the depth test of one sub-tile is completed, the data of the next sub-tile is read in a predetermined order and depth tested, until all sub-tiles in the i-th tile have completed the depth test. The fragment shader module can then be called to perform fragment calculations for the i-th tile. In the embodiment of the present disclosure, the fragment shader module can be called by the depth testing module or by other hardware modules in the graphics processor.
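  • The control flow described above can be sketched in software as follows. This is an illustrative model, not the hardware design: the function names, the fragment tuple layout `(pixel, depth, primitive)` and the "smaller depth is nearer" comparison are all assumptions made for the example.

```python
def depth_test_subtile(fragments):
    """Depth-test one sub-tile and return the visible primitive per pixel.

    fragments -- iterable of (pixel, depth, primitive) tuples for this sub-tile.
    The local dict stands in for the on-chip depth buffer, which only ever
    holds one sub-tile at a time. Smaller depth means nearer (an assumption).
    """
    depth_buffer = {}
    for pixel, depth, primitive in fragments:
        if pixel not in depth_buffer or depth < depth_buffer[pixel][0]:
            depth_buffer[pixel] = (depth, primitive)
    return {pixel: primitive for pixel, (depth, primitive) in depth_buffer.items()}


def process_tiles(tiles, fragment_shader):
    """tiles -- list of (tile_id, [sub-tile fragment lists]) in processing order."""
    for tile_id, subtiles in tiles:
        visible = {}
        for fragments in subtiles:          # one sub-tile per depth-buffer load
            visible.update(depth_test_subtile(fragments))
        # The fragment shader runs once per tile, only after every
        # sub-tile of that tile has finished its depth test.
        fragment_shader(tile_id, visible)
```

  • The point of the sketch is the ordering: the inner loop bounds depth-buffer usage by the sub-tile size, while the shader call sits outside it and therefore sees the whole tile.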
  • the fragment shader module is configured to perform fragment calculations tile by tile, wherein the fragment shader module is called after the depth test of each sub-tile within each tile is completed.
  • the tile size divided by the tile division module can be not only the product of the basic tile size and the VRS pixel group size, but also any other choice that satisfies the above constraints, which is not limited by the present disclosure.
  • the sub-tile size can be not only the basic tile size, but also any other choice that satisfies the above constraints, which is not limited by the present disclosure.
  • the function of dividing sub-tiles can be implemented by the depth testing module, or can be implemented by the tile dividing module. Of course, it can also be implemented by other modules, and the present disclosure does not limit this.
  • the depth testing module may be configured to: divide each tile into multiple sub-tiles.
  • Dividing each tile into multiple sub-tiles specifically refers to determining the primitives covering each sub-tile among the primitives covering the tile.
  • the specific implementation method can refer to the implementation method of tile division, and will not be described again here.
  • the division result of sub-tiles can be saved in a separate data structure, whose layout can follow the data structure of the tile division result.
  • it may differ from the data structure of the tile division result in that the association between the sub-tile and the tile is marked in the sub-tile division result.
  • This disclosure does not limit the specific marking method of the association.
  • the division result of each sub-tile includes a sub-tile identifier, and the sub-tile identifier uses at least one identification bit to mark the tile to which the sub-tile belongs.
  • the end of the tile is marked in the division result of the last sub-tile of each tile.
  • the above-mentioned correlation relationship does not need to be marked in the data structure of the sub-tile division results.
  • the depth testing module can confirm the end of the depth test of a tile by comparing the tile division results with the sub-tile division results, or confirm it based on the number of reads, which is not limited by this disclosure.
  • the division results of sub-tiles can also be saved in the tile division results. That is, the sub-tile division results are marked in the tile division results of each tile.
  • This disclosure does not limit the marking method of the sub-tile division results.
  • the tile division result of each tile includes a tile identifier and the primitive indices of the primitives covering the tile. As an example but not a limitation, identification information corresponding to each primitive index of the tile may be added, and the identification information includes the sub-tile identifiers of the sub-tiles covered by that primitive.
  • if the tile division module divides the sub-tiles, then the tile division module is further configured to: divide each tile into multiple sub-tiles.
  • division results of sub-tiles can be stored in a separate data structure or in the division results of tiles.
  • the tile division module determines the primitives covering each sub-tile. This process actually realizes the division of sub-tiles and tiles.
  • as an example but not a limitation, the division rule is to divide sub-tiles at the granularity of the basic tile size. If the size of the tile is smaller than the product of the basic tile size and the VRS pixel group size, as an example but not a limitation, the division rule is to divide the tile into N sub-tiles.
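  • A minimal sketch of sub-tile division at basic-tile granularity, with the sub-tile markers stored next to each primitive index of the tile. The rectangle layout `(x, y, width, height)`, the use of axis-aligned bounding boxes as a conservative coverage test, and all function names are assumptions for illustration; the disclosure does not prescribe a coverage test or a marker format.

```python
def split_into_subtiles(tile_origin, tile_size, sub_size):
    """Cut a tile into sub-tiles, returned as (x, y, width, height) rectangles
    in row-major order. Assumes tile_size is a multiple of sub_size."""
    x0, y0 = tile_origin
    width, height = tile_size
    sub_w, sub_h = sub_size
    return [(x, y, sub_w, sub_h)
            for y in range(y0, y0 + height, sub_h)
            for x in range(x0, x0 + width, sub_w)]


def overlaps(bbox, rect):
    """Conservative coverage test: does the primitive's bounding box
    (min_x, min_y, max_x, max_y), max exclusive, intersect the rectangle?"""
    min_x, min_y, max_x, max_y = bbox
    x, y, w, h = rect
    return min_x < x + w and max_x > x and min_y < y + h and max_y > y


def mark_subtiles(tile_primitives, bboxes, subtiles):
    """For each primitive index already binned to the tile, record the indices
    of the sub-tiles it covers (the 'sub-tile markers' in the tile result)."""
    return {prim: [i for i, rect in enumerate(subtiles) if overlaps(bboxes[prim], rect)]
            for prim in tile_primitives}
```

  • Because only primitives already binned to the tile are examined, the sub-tile pass reuses the tile division result rather than re-testing every primitive in the frame.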
  • the fragment shader module may be configured to perform a fragment calculation on pixels in the same VRS pixel group.
  • An embodiment of the present disclosure also provides a graphics processing system, which includes the graphics processor described in any of the above embodiments.
  • the product form of the graphics processing system may be a SOC (System on Chip) chip.
  • the graphics processing system in the embodiment of the present disclosure may be a single-die SOC chip or a multi-die interconnected SOC chip.
  • a single-die graphics processing system includes a GPU core, which is the above-mentioned graphics processor.
  • the GPU core is used to process drawing instructions. According to the drawing instructions, it executes the pipeline of image rendering and can also be used to execute other computing instructions.
  • the GPU core further includes: a computing unit, which executes the compiled instructions of the shaders, is a programmable module, and consists of a large number of ALUs; a cache, which caches GPU core data to reduce accesses to memory; a rasterization module, a fixed stage of the 3D rendering pipeline; a tiling module, which divides a frame into tiles in the TBR and TBDR GPU architectures; a clipping module, a fixed stage of the 3D rendering pipeline, which clips away primitives that are outside the viewing range or that are back-facing and not displayed; a post-processing module, which performs operations such as scaling, cropping, and rotating on the finished image; and a micro core, which is used for scheduling among the pipeline hardware modules on the GPU core, or for task scheduling across multiple GPU cores.
  • the GPU cores are connected to the on-chip network.
  • the on-chip network is used for data exchange between masters and slaves on the graphics processing system.
  • the on-chip network includes a configuration bus, a data communication network, a communication bus, and so on.
  • the graphics processing system can also include:
  • General DMA (Direct Memory Access), which is used, for example, to move the vertex data of 3D drawings from the host to the graphics processing system memory;
  • PCIe controller an interface used to communicate with the host, implements the PCIe protocol, so that the graphics processing system is connected to the host through the PCIe interface, and the graphics API and graphics card driver and other programs are run on the host;
  • the application processor is used to schedule the tasks of each module on the graphics processing system. For example, after the GPU has finished rendering a frame, it notifies the application processor, and the application processor then starts the display controller to display the picture drawn by the GPU on the screen;
  • Memory controller used to connect memory devices and save data on the SOC
  • the display controller controls the output of the frame buffer in the memory to the display through the display interface (HDMI, DP, etc.);
  • Video decoding can decode the encoded video on the host hard disk into a displayable picture
  • Video encoding can encode the original video stream on the host hard disk into a specified format and return it to the host.
  • the graphics rendering process is as follows:
  • the graphics API of the host computer (in actual applications, for mobile graphics processing systems, it can also be used by software on the application processor) sends drawing instructions to the SOC chip, requiring the rendering of image frames.
  • the image frame includes at least one object.
  • General DMA transfers the vertex coordinate information of each object in the image frame from the host to the memory of the graphics processing system.
  • the computing unit of the GPU core decodes the drawing instruction.
  • the vertex shader of the GPU core (its function is implemented by the computing unit) obtains the vertex coordinate information of each object in the image frame from the system memory, and transmits the vertex coordinate information of the object to the geometry shader (its function is implemented by the computing unit),
  • the geometry shader converts the 3D coordinates of the object's vertices into unwrapped texture coordinates (i.e. (u,v) coordinates).
  • the computing unit also assembles primitives based on the vertex coordinate information of the object to determine the vertex coordinates of each primitive. Among them, the value at the texture coordinate corresponding to the vertex coordinate in the texture map is the vertex color information.
  • the vertex coordinate information and vertex texture coordinates of the primitive are saved to the primitive's data structure in system memory.
  • the tile division module in the GPU core identifies whether VRS is enabled. If VRS is not enabled, tile division processing is performed on the primitives in the image frame according to the basic tile size. If VRS is enabled, tile division processing is performed on the primitives in the image frame according to the extended tile size, where the extended tile size is the product of the basic tile size and the VRS pixel group size. The tile division module saves the tile division results to the tile buffer, and the tile division result of each tile includes the tile identifier and the primitive indices of the primitives covering the tile.
  • After the tile division is completed, the rasterization module performs rasterization processing.
  • the rasterization module processes tiles one by one, each time reading from the tile buffer the primitive indices of the primitives covering the current tile; the rasterization module reads the primitive information through the primitive index, uses the primitive information to perform a pixel coverage test that determines the pixels covered by the primitive, then determines the texture coordinates corresponding to those pixels through interpolation, and then performs at least one pixel test to determine the visibility of each pixel (as an example and not a limitation, pixel tests may include depth testing, stencil testing, etc.).
  • each tile is divided into multiple sub-tiles according to the basic tile size, and the sub-tile division result is marked in the tile division result of each tile. Then, according to the sub-tile division results, the rasterization module loads the data of one sub-tile at a time from memory into the depth buffer for depth testing.
  • the rasterization module identifies the tile currently to be processed in the tile buffer according to the tile identifier, finds the primitive indices corresponding to the current sub-tile among the primitive indices of the current tile in the tile buffer, and then looks up the primitive information according to the found primitive indices. The primitive information includes the depth information of the primitive, and the depth information of the primitives corresponding to the current sub-tile is loaded.
  • after the depth test of the current sub-tile is completed, the data of another sub-tile is loaded.
  • after all sub-tiles of the tile have been depth tested, the fragment shader is called to perform shading calculations (i.e., fragment calculation) on the tile.
  • the depth information of all pixels in a sub-tile is stored in the depth buffer, which can be read and updated repeatedly during the depth test until the depth test of all pixels in a sub-tile is completed.
  • the fragment shader of the GPU core (which is implemented by the computing unit) performs shading calculations (such as lighting calculations) on the pixels within the tile.
  • the fragment shader is called once for the pixels in a pixel group according to the shading rate set by VRS.
  • the results of fragment shading calculations are saved in a geometry buffer.
  • the geometry buffer does not copy the fragment shading result to each pixel in the pixel group, but only stores the result of one fragment shading for a pixel group.
  • the fragment shader can also read data from the geometry buffer and perform calculations until all pixels in a tile are rendered.
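  • The storage saving that makes larger tiles possible can be illustrated with a short sketch: the shader runs once per VRS pixel group and the geometry buffer keeps one entry per group rather than one per pixel. The function names, the dictionary used as a stand-in for the geometry buffer, and sampling the shader at the group's top-left pixel are assumptions for the example only.

```python
def shade_tile(tile_size, vrs_group, shade):
    """Run the fragment shader once per VRS pixel group.

    tile_size -- (width, height) of the tile in pixels
    vrs_group -- (group_width, group_height) of a VRS pixel group
    shade     -- callable(x, y) -> shading result, sampled here at the
                 group's top-left pixel (an illustrative choice)
    Returns a dict keyed by group coordinates; it stands in for the
    geometry buffer, which stores one result per pixel group.
    """
    width, height = tile_size
    group_w, group_h = vrs_group
    g_buffer = {}
    for y in range(0, height, group_h):
        for x in range(0, width, group_w):
            g_buffer[(x // group_w, y // group_h)] = shade(x, y)
    return g_buffer


def pixel_result(g_buffer, vrs_group, x, y):
    """Look up the shared shading result for pixel (x, y)."""
    group_w, group_h = vrs_group
    return g_buffer[(x // group_w, y // group_h)]
```

  • With a 16×32 tile and a 1×2 pixel group, this buffer holds 256 entries, the same count a 16×16 tile needs without VRS, which is consistent with the earlier statement that a 16×16 geometry buffer can support a 16×32 tile.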
  • An embodiment of the present disclosure also provides an electronic apparatus, which includes the graphics processing system described in any of the above embodiments.
  • In some usage scenarios, the product form of the electronic apparatus is a graphics card; in other usage scenarios, the product form of the electronic apparatus is a CPU motherboard.
  • An embodiment of the present disclosure also provides an electronic device, which includes the above-mentioned electronic apparatus.
  • In some usage scenarios, the product form of the electronic device is a portable electronic device, such as a smartphone, tablet, VR device, etc.; in other usage scenarios, the product form of the electronic device is a personal computer, game console, workstation, server, etc.
  • embodiments of the present disclosure also provide a graphics processing method using a tile-based rendering architecture, as shown in Figure 3.
  • the method at least includes the following steps:
  • Step 301: Perform tile division processing on the primitives in the image frame according to the basic tile size and the VRS pixel group size.
  • the divided tile size is larger than the basic tile size, but not larger than the product of the basic tile size and the VRS pixel group size.
  • Step 302: Perform depth testing tile by tile, and for each tile, perform the depth test in multiple sub-tiles.
  • the size of each sub-tile is limited to the size of the depth buffer;
  • Step 303 Perform fragment calculations tile by tile. After the depth test of each sub-tile in each tile is completed, the fragment shader module is called.
  • the divided tile size is the product of the basic tile size and the VRS pixel group size.
  • the subtile size is the base tile size.
  • before the depth test is performed sub-tile by sub-tile, each tile may also be divided into multiple sub-tiles.
  • sub-tile division results are marked in the tile division results of each tile.
  • a fragment calculation is performed on pixels in the same VRS pixel group.

Abstract

The present invention provides a graphics processing unit, system and method and an electronic apparatus and device. The graphics processing unit comprises a tile division module, configured to: perform tile division processing on primitives in an image frame according to a basic tile size and a VRS pixel group size, the size of the divided tiles being a product of the basic tile size and the VRS pixel group size; a depth test module, configured to: perform a depth test tile by tile, and for each tile, perform a depth test in multiple sub-tiles, the size of each sub-tile being the basic tile size; and a fragment shader module, configured to: perform fragment calculation tile by tile, wherein the fragment shader module is called after the depth test of each sub-tile in each tile is completed.

Description

Graphics processor, system, electronic apparatus, device and graphics processing method
Cross-references to related applications
This application is filed based on a Chinese patent application with application number 202210414535.8 and a filing date of April 20, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference into this application.
Technical field
This application relates to the technical field of GPUs (Graphics Processing Units, graphics processors), and in particular to a graphics processor, a graphics processing system, an electronic apparatus, an electronic device and a graphics processing method.
Background
Adopting a tile-based GPU architecture can reduce memory bandwidth requirements during rendering. In tile-based GPU architectures, the rasterization and pixel processing of the rendering process are performed at tile granularity. During rasterization and pixel processing, on-chip buffers (On-Chip Buffers) are used to store the depth buffer information, geometry buffer information and other information of the pixels in a tile. The storage space in the on-chip buffers that holds the depth buffer information is called the depth buffer (Depth Buffer), and the storage space in the on-chip buffers that holds the geometry buffer information is called the geometry buffer (G Buffer). The size of the depth buffer determines the number of samples or pixels that can be depth tested on chip, and the size of the geometry buffer determines the number of samples or pixels for which fragment calculation can be performed on chip. The size of the depth buffer and the size of the geometry buffer together determine the size of the tiles that can be divided. For example, if the tile size is 16×16 pixels, then both the depth buffer and the geometry buffer need to provide on-chip processing capabilities for 16×16 pixels.
In traditional tile-based GPU architectures, the choice of tile size is limited by both the depth buffer and the geometry buffer. When the VRS (Variable Rate Shading) function is enabled, the granularity of fragment calculations is reduced, which means that the geometry buffer requirements become smaller. If the tile size is matched to the size of the geometry buffer, larger tiles can be divided. However, depth testing prevents tiles from being divided at the larger size. For example, both the depth buffer and the geometry buffer support on-chip processing of 16×16 pixels. After VRS is enabled, the geometry buffer supports dividing tiles larger than 16×16 pixels, but the depth buffer limits the maximum tile size to 16×16 pixels. It can be seen that after VRS is enabled, the traditional tile-based GPU architecture limits the choice of tile size.
Summary of the Invention
The purpose of the present disclosure is to provide a graphics processor, a graphics processing system, a graphics processing method, an electronic apparatus and an electronic device, so that tiles can be divided according to the size of the geometry buffer when VRS is enabled.
According to one aspect of the present disclosure, a graphics processor is provided. The graphics processor adopts a tile-based rendering architecture and includes at least:
a tile division module, configured to perform tile division on the primitives in an image frame according to a base tile size and a VRS pixel group size, where the divided tile size is larger than the base tile size but not larger than the product of the base tile size and the VRS pixel group size;
a depth test module, configured to perform depth testing tile by tile and, for each tile, to perform the depth test over multiple sub-tiles, where the size of each sub-tile is limited by the size of the depth buffer;
a fragment shader module, configured to perform fragment computation tile by tile, where the fragment shader module is invoked after the depth test of every sub-tile in a tile has completed.
Optionally, the divided tile size is the product of the base tile size and the VRS pixel group size.
Optionally, the sub-tile size is the base tile size.
On the basis of any of the above graphics processor embodiments, the depth test module may be configured to divide each tile into multiple sub-tiles. Alternatively, the tile division module is further configured to divide each tile into multiple sub-tiles.
If the depth test module performs the sub-tile division, the depth test module may further be configured to mark the sub-tile division result in the tile division result of each tile.
If the tile division module performs the sub-tile division, the tile division module is further configured to save the tile division result of each tile separately and to mark the sub-tile division result in the tile division result of each tile.
On the basis of any of the above graphics processor embodiments, the fragment shader module may be configured to perform a single fragment computation for the pixels in the same VRS pixel group.
On this basis, optionally, the geometry buffer corresponding to the fragment shader stores only one fragment computation result for each VRS pixel group.
According to another aspect of the present disclosure, a graphics processing system is also provided. The graphics processing system includes the graphics processor described in any of the above embodiments.
According to another aspect of the present disclosure, an electronic apparatus is also provided. The electronic apparatus includes the graphics processing system described in any of the above embodiments. In some usage scenarios, the electronic apparatus takes the product form of a graphics card; in other usage scenarios, it takes the product form of a CPU motherboard.
According to another aspect of the present disclosure, an electronic device is also provided. The electronic device includes the above electronic apparatus. In some usage scenarios, the electronic device is a portable electronic device such as a smartphone, tablet computer or VR device; in other usage scenarios, it is a personal computer, game console or the like.
According to another aspect of the present disclosure, a graphics processing method is also provided. The graphics processing method adopts a tile-based rendering architecture and includes at least the following operations:
performing tile division on the primitives in an image frame according to a base tile size and a VRS pixel group size, where the divided tile size is larger than the base tile size but not larger than the product of the base tile size and the VRS pixel group size;
performing depth testing tile by tile and, for each tile, performing the depth test over multiple sub-tiles, where the size of each sub-tile is limited by the size of the depth buffer;
performing fragment computation tile by tile, where the fragment shader module is invoked after the depth test of every sub-tile in a tile has completed.
Optionally, the divided tile size is the product of the base tile size and the VRS pixel group size.
Optionally, the sub-tile size is the base tile size.
On the basis of any of the above graphics processing method embodiments, each tile may be divided into multiple sub-tiles before the depth test is performed over the sub-tiles.
Further, the sub-tile division result may be marked in the tile division result of each tile.
On the basis of any of the above graphics processing method embodiments, a single fragment computation is performed for the pixels in the same VRS pixel group.
On this basis, optionally, the geometry buffer corresponding to the fragment shader stores only one fragment computation result for each VRS pixel group.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of tile division according to an embodiment of the present disclosure;
Figure 2 is a schematic structural diagram of a graphics processing system according to an embodiment of the present disclosure;
Figure 3 is a schematic flowchart of a graphics processing method according to an embodiment of the present disclosure.
Detailed Description
Before the embodiments of the present disclosure are introduced, the following should be noted:
Some embodiments of the present disclosure are described as processing flows. Although the operation steps of a flow may carry sequential step numbers, the steps may be carried out in parallel, concurrently or simultaneously.
The embodiments of the present disclosure may use the terms "first", "second" and so on to describe various features, but the features should not be limited by these terms. The terms are used only to distinguish one feature from another.
The embodiments of the present disclosure may use the term "and/or", which includes any and all combinations of one or more of the associated listed features.
It should be understood that, when a connection or communication relationship between two components is described, the connection or communication may be understood either as direct or as indirect through an intermediate component, unless direct connection or direct communication is explicitly stated.
To make the technical solutions and advantages of the embodiments of the present disclosure clearer, exemplary embodiments of the present disclosure are described in further detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present disclosure, not an exhaustive list. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
After VRS is enabled, the depth test stage still operates at pixel or sample granularity, but the fragment computation stage operates at the granularity of a VRS pixel group, that is, one fragment computation is performed per pixel group. With VRS enabled, the geometry buffer can therefore serve a larger tile than it can with VRS disabled. For example, with VRS disabled, a geometry buffer of size 16×16 supports a tile size of 16×16; with VRS enabled and the VRS pixel group set to 1×2, the same geometry buffer supports a tile size of 16×32. In view of this, the present disclosure provides a graphics processor that adopts a tile-based rendering architecture and can adjust the tile size. Specifically, when VRS is enabled, the tile size can be adjusted to match the size of the geometry buffer, so that the geometry buffer is used effectively. In addition, a larger tile size reduces the data exchanged with memory as a result of tile switching during the fragment computation stage. Larger tiles may also reduce the number of primitives that cover multiple tiles, which in turn reduces how often primitive information has to be reused across tile switches. After the tile size is increased, it may exceed the size of the depth buffer. The graphics processor provided by the present disclosure therefore performs the depth test for each tile at sub-tile granularity, to match the size of the depth buffer. Depth testing at sub-tile granularity means that the tile is divided into multiple sub-tiles and the depth test for the tile is carried out by reading the data of one sub-tile at a time and depth-testing that sub-tile, until all sub-tiles of the tile have been depth-tested.
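The arithmetic behind the 16×16 versus 16×32 example can be checked with a short sketch. The function names and the 1080×1920 frame size are illustrative assumptions, not part of the disclosure; sizes are written as pairs in the same order as the text (so a 1×2 pixel group multiplies the second component).

```python
import math

def fragment_results_per_tile(tile, group):
    """Fragment results the geometry buffer must hold for one tile,
    assuming one result is stored per VRS pixel group."""
    return math.ceil(tile[0] / group[0]) * math.ceil(tile[1] / group[1])

def tiles_per_frame(frame, tile):
    """Tiles needed to cover a frame; partial tiles at the edges count as whole tiles."""
    return math.ceil(frame[0] / tile[0]) * math.ceil(frame[1] / tile[1])

# VRS disabled: 16x16 tile, one result per pixel.
no_vrs = fragment_results_per_tile((16, 16), (1, 1))
# VRS enabled with 1x2 pixel groups: 16x32 tile, one result per group.
with_vrs = fragment_results_per_tile((16, 32), (1, 2))

# Hypothetical 1080x1920 frame, used only to show the drop in tile count.
small_tiles = tiles_per_frame((1080, 1920), (16, 16))
large_tiles = tiles_per_frame((1080, 1920), (16, 32))
```

Both tile configurations need 256 geometry buffer entries, while the assumed frame is covered by 4080 tiles instead of 8160, which is where the reduction in tile switching comes from.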
Here, a GPU is a hardware-implemented processor with computing capability, comprising compute units, caches and other components. It may be a GPGPU (general-purpose graphics processing unit) or a GPU.
The graphics processor provided by the embodiments of the present disclosure is applicable to any tile-based rendering architecture, for example TBR (Tile Based Rendering) or TBDR (Tile Based Deferred Rendering).
One embodiment of the present disclosure provides a graphics processor that adopts a tile-based rendering architecture and includes at least a tile division module, a depth test module and a fragment shader module.
In the present disclosure, the tile division module is configured to perform tile division on the primitives in an image frame according to a base tile size and a VRS pixel group size, where the divided tile size is larger than the base tile size but not larger than the product of the base tile size and the VRS pixel group size.
In the embodiments of the present disclosure, the base tile size is the tile size used for tile division when VRS is not enabled. The base tile size is determined by the sizes of the on-chip buffers, such as the size of the geometry buffer and the size of the depth buffer.
As shown in Figure 1, assume the base tile size is 4×4 and two horizontally adjacent pixels form one pixel group (that is, the VRS pixel group size is 1×2, shown by the solid ellipses in Figure 1). The product of the base tile size and the VRS pixel group size is then (4*1)×(4*2) = 4×8. In the embodiment shown in Figure 1, the divided tile size equals this product, and the solid rectangle in Figure 1 shows the size of one tile.
According to the requirements of Direct3D 12, the pixel group sizes supported by VRS include 1×2, 2×1, 2×2, 2×4, 4×2 and 4×4. For each VRS pixel group size, the rule by which the tile division module adjusts the tile size is shown in Table 1, where a×b denotes the configured base tile size.
Table 1: Tile size allocation
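As an illustration of the product rule stated above, the following sketch computes the largest tile size for each pixel group size listed in the text. It is not a reproduction of Table 1; `extended_tile_size` and `PIXEL_GROUP_SIZES` are hypothetical names, and the component-wise product is the assumption taken from the 4×4 and 1×2 example.

```python
# Pixel group sizes named in the text, written in the same order as the text.
PIXEL_GROUP_SIZES = [(1, 2), (2, 1), (2, 2), (2, 4), (4, 2), (4, 4)]

def extended_tile_size(base, group):
    """Component-wise product of an a x b base tile size and an m x n pixel group size."""
    a, b = base
    m, n = group
    return (a * m, b * n)

# Largest tile size for a 16x16 base tile under each pixel group size.
upper_bounds = {group: extended_tile_size((16, 16), group) for group in PIXEL_GROUP_SIZES}

# The Figure 1 example: 4x4 base tile, 1x2 pixel group.
figure1_tile = extended_tile_size((4, 4), (1, 2))
```

For the Figure 1 example this yields a 4×8 tile, matching the text.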
In the present disclosure, the depth test module is configured to perform depth testing tile by tile and, for each tile, to perform the depth test over multiple sub-tiles, where the size of each sub-tile is no larger than the size of the depth buffer.
In some embodiments, the sub-tile size is the base tile size. Taking the tile division shown in Figure 1 as an example again, the dashed rectangles in Figure 1 show the sub-tile size.
Specifically, the depth test is performed on the pixels within a tile. The embodiments of the present disclosure do not limit which pixels are depth-tested. In some embodiments, all pixels covered by a primitive are depth-tested. In other embodiments, only the screen-visible pixels covered by a primitive are depth-tested. The present disclosure does not limit the granularity of the depth test either; it may be performed per pixel or per sample.
The depth test module depth-tests the pixels in each tile according to a predetermined processing order. The depth test for the pixels of each tile is performed at sub-tile granularity. In the present disclosure, every tile is divided into multiple sub-tiles according to the same sub-tile division rule. Taking the i-th tile as an example, the data of one sub-tile is read and depth-tested; once the depth test of that sub-tile is complete, the data of the next sub-tile is read in the predetermined order and depth-tested, and so on until all sub-tiles of the i-th tile have been depth-tested. After all sub-tiles of the i-th tile have been depth-tested, the fragment shader module can be invoked to perform fragment computation for the i-th tile. In the embodiments of the present disclosure, the fragment shader module may be invoked either by the depth test module or by another hardware module in the graphics processor.
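The ordering described above can be sketched as a simple control loop. This is a software model under stated assumptions, not the hardware design: `process_tiles`, the callback signatures and the use of list order as the "predetermined order" are all hypothetical.

```python
def process_tiles(tiles, depth_test, fragment_shade):
    """tiles: list of (tile_id, [sub_tile_id, ...]) in the predetermined order.

    Each sub-tile of a tile is depth-tested in turn; fragment computation for
    the tile is invoked only after its last sub-tile has been depth-tested.
    """
    for tile_id, sub_tiles in tiles:
        for sub_id in sub_tiles:
            depth_test(tile_id, sub_id)
        fragment_shade(tile_id)

# Record the call order for two tiles with two sub-tiles each.
log = []
process_tiles(
    [(0, [0, 1]), (1, [0, 1])],
    depth_test=lambda t, s: log.append(("depth", t, s)),
    fragment_shade=lambda t: log.append(("shade", t)),
)
```

The recorded order shows both sub-tiles of tile 0 being depth-tested before tile 0 is shaded, and only then does processing move on to tile 1.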
The fragment shader module is configured to perform fragment computation tile by tile, where the fragment shader module is invoked after the depth test of every sub-tile in a tile has completed.
In the embodiments of the present disclosure, when VRS is enabled, the tile size produced by the tile division module may be the product of the base tile size and the VRS pixel group size, or any other size that satisfies the above constraint; the present disclosure does not limit this.
In the embodiments of the present disclosure, the sub-tile size may be the base tile size, or any other size that satisfies the above constraint; the present disclosure does not limit this.
In the embodiments of the present disclosure, the sub-tile division function may be implemented by the depth test module, by the tile division module, or by another module; the present disclosure does not limit this.
If the sub-tiles are divided by the depth test module, the depth test module may be configured to divide each tile into multiple sub-tiles.
Dividing each tile into multiple sub-tiles means determining, among the primitives that cover the tile, the primitives that cover each sub-tile. The implementation can follow that of tile division and is not repeated here.
The sub-tile division result may be stored in a separate data structure, which can be modeled on the data structure of the tile division result, with one possible difference: the sub-tile division result marks the association between a sub-tile and its tile. The present disclosure does not limit how this association is marked. For example, the division result of each sub-tile includes a sub-tile identifier in which at least one flag bit marks the tile to which the sub-tile belongs. As another example, the division result of the last sub-tile of each tile carries an end-of-tile mark. The association may also be left unmarked in the data structure of the sub-tile division result, in which case the depth test module can determine that the depth test of a tile has finished by comparing the tile division result with the sub-tile division result, or by counting the number of reads; the present disclosure does not limit this.
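One possible layout for such a separate sub-tile record is sketched below. The bit split, the field names and the `SubTileRecord` type are assumptions made for illustration; the disclosure deliberately leaves the marking scheme open. Four low bits are used for the sub-tile index because a 4×4 pixel group with base-sized sub-tiles gives at most 16 sub-tiles per tile.

```python
from dataclasses import dataclass, field
from typing import List

SUB_INDEX_BITS = 4  # assumed width: up to 16 sub-tiles per tile

def make_sub_tile_id(tile_index, sub_index):
    """Pack the owning tile into the upper bits and the sub-tile index into the lower bits."""
    assert 0 <= sub_index < (1 << SUB_INDEX_BITS)
    return (tile_index << SUB_INDEX_BITS) | sub_index

def owning_tile(sub_tile_id):
    """Recover the tile a sub-tile belongs to from its identifier."""
    return sub_tile_id >> SUB_INDEX_BITS

@dataclass
class SubTileRecord:
    sub_tile_id: int
    primitive_indices: List[int] = field(default_factory=list)
    end_of_tile: bool = False  # set on the last sub-tile of each tile

# Two sub-tiles of tile 5; the second one closes the tile.
records = [
    SubTileRecord(make_sub_tile_id(5, 0), [3, 8]),
    SubTileRecord(make_sub_tile_id(5, 1), [8], end_of_tile=True),
]
```

The sketch combines both marking options from the text (owner bits and an end-of-tile flag); an implementation would need only one of them to detect that a tile is finished.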
The sub-tile division result may instead be stored within the tile division result; that is, the sub-tile division result is marked in the tile division result of each tile. The present disclosure does not limit how the sub-tile division result is marked. The tile division result of each tile includes a tile identifier and the primitive indices of the primitives covering the tile. By way of example and not limitation, identification information may be added for each primitive index, and this identification information includes the sub-tile marks of the sub-tiles covered by that primitive.
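A minimal sketch of this embedded variant follows, assuming the per-primitive identification information is a bitmask with one bit per sub-tile. The `TileRecord` type and its method are hypothetical; the disclosure does not fix an encoding.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TileRecord:
    tile_id: int
    # primitive index -> bitmask of the sub-tiles that the primitive covers
    primitives: Dict[int, int]

    def primitives_for_sub_tile(self, sub_index: int) -> List[int]:
        """Primitive indices whose mask has the bit for this sub-tile set."""
        bit = 1 << sub_index
        return [prim for prim, mask in self.primitives.items() if mask & bit]

# Tile 0 with two sub-tiles: primitive 7 covers sub-tile 0 only,
# primitive 9 covers both, primitive 12 covers sub-tile 1 only.
tile = TileRecord(tile_id=0, primitives={7: 0b01, 9: 0b11, 12: 0b10})
first = tile.primitives_for_sub_tile(0)
second = tile.primitives_for_sub_tile(1)
```

With this layout, the primitives needed for the current sub-tile can be found by filtering the primitive list of the current tile, with no second data structure.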
If the sub-tiles are divided by the tile division module, the tile division module is further configured to divide each tile into multiple sub-tiles. The implementation can follow the description of the above embodiments and is not repeated here.
As described above, the sub-tile division result may be stored in a separate data structure or within the tile division result. The implementation can follow the description of the above embodiments and is not repeated here.
If the sub-tiles are divided by the tile division module, tile division and sub-tile division can be completed in the same pass. By way of example and not limitation, the tile division module determines the primitives that cover each sub-tile; this process accomplishes both the sub-tile division and the tile division.
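The single-pass idea can be sketched as follows. The coverage test here is a conservative bounding-box overlap, which is an assumption (the disclosure does not specify how coverage is determined), and `bin_primitives` with its parameters is a hypothetical name. Coordinates and sizes use the same (first, second) order as the sizes in the text.

```python
def bin_primitives(bboxes, sub_size, subs_per_tile):
    """Bin primitives to sub-tiles and derive the tile bins in the same pass.

    bboxes: one (r0, c0, r1, c1) inclusive pixel bounding box per primitive.
    sub_size: sub-tile size; subs_per_tile: sub-tile grid inside one tile.
    Returns (sub_bins, tile_bins) keyed by grid coordinates.
    """
    sub_h, sub_w = sub_size
    n_r, n_c = subs_per_tile
    sub_bins, tile_bins = {}, {}
    for prim, (r0, c0, r1, c1) in enumerate(bboxes):
        for sr in range(r0 // sub_h, r1 // sub_h + 1):
            for sc in range(c0 // sub_w, c1 // sub_w + 1):
                sub_bins.setdefault((sr, sc), []).append(prim)
                # The owning tile follows directly from the sub-tile coordinates.
                prims = tile_bins.setdefault((sr // n_r, sc // n_c), [])
                if prim not in prims:
                    prims.append(prim)
    return sub_bins, tile_bins

# 4x4 sub-tiles, two per tile side by side (4x8 tiles, as in Figure 1).
sub_bins, tile_bins = bin_primitives(
    [(0, 0, 3, 3), (1, 2, 2, 9)], sub_size=(4, 4), subs_per_tile=(1, 2)
)
```

Primitive 1 overlaps three sub-tiles but only two tiles, so it appears once in each of the two tile bins.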
In any of the above sub-tile division embodiments, if the tile size is the product of the base tile size and the VRS pixel group size, then, by way of example and not limitation, the splitting rule is to divide the tile into sub-tiles at the granularity of the base tile size. If the tile size is smaller than the product of the base tile size and the VRS pixel group size, then, by way of example and not limitation, the splitting rule is to divide the tile evenly into N sub-tiles.
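The two splitting rules can be sketched as follows. Both function names are hypothetical, and expressing N as a grid of rows and columns is an assumption, since the text does not say how N is chosen. Each sub-tile is returned as (row offset, column offset, height, width) within the tile.

```python
def split_by_base(tile, base):
    """Split a tile into sub-tiles of exactly the base tile size."""
    th, tw = tile
    bh, bw = base
    return [(r, c, bh, bw) for r in range(0, th, bh) for c in range(0, tw, bw)]

def split_evenly(tile, grid):
    """Split a tile evenly into N = grid[0] * grid[1] equal sub-tiles."""
    th, tw = tile
    gr, gc = grid
    if th % gr or tw % gc:
        raise ValueError("tile size is not divisible by the requested grid")
    sh, sw = th // gr, tw // gc
    return [(r * sh, c * sw, sh, sw) for r in range(gr) for c in range(gc)]

# 4x8 tile (base 4x4, pixel group 1x2): two base-sized sub-tiles.
full = split_by_base((4, 8), (4, 4))
# 4x6 tile, smaller than the 4x8 product: two 4x3 sub-tiles, each within a 4x4 depth buffer.
partial = split_evenly((4, 6), (1, 2))
```

In the second case the grid has to be chosen so that each resulting sub-tile is no larger than the depth buffer.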
On the basis of any of the above graphics processor embodiments, the fragment shader module may be configured to perform a single fragment computation for the pixels in the same VRS pixel group.
On this basis, optionally, the geometry buffer corresponding to the fragment shader stores only one fragment computation result for each VRS pixel group.
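Storing one result per pixel group amounts to mapping every pixel of a group to the same geometry buffer slot. The sketch below assumes a row-major slot layout and a hypothetical `gbuffer_slot` helper; the actual buffer layout is not described in the disclosure.

```python
def gbuffer_slot(row, col, tile, group):
    """Geometry buffer slot for a pixel when one result is kept per VRS pixel group.

    tile and group are sizes in the same order as the text; slots are
    numbered row-major over the grid of pixel groups (an assumed layout).
    """
    _, tile_w = tile
    group_h, group_w = group
    groups_per_row = tile_w // group_w
    return (row // group_h) * groups_per_row + (col // group_w)

# 4x8 tile with 1x2 pixel groups (the Figure 1 example).
slots = {gbuffer_slot(r, c, (4, 8), (1, 2)) for r in range(4) for c in range(8)}
```

The 32 pixels of the 4×8 tile map onto 16 slots, the same number a 4×4 base tile needs with VRS disabled.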
An embodiment of the present disclosure also provides a graphics processing system, which includes the graphics processor described in any of the above embodiments.
In the embodiments of the present disclosure, the graphics processing system may take the product form of an SOC (System on Chip) chip.
The graphics processing system in the embodiments of the present disclosure may be a single-die SOC chip or an SOC chip with multiple interconnected dies.
The architecture and working principle of the graphics processing system provided by the present disclosure are described below, taking a single die as an example.
In the embodiment shown in Figure 2, the single-die graphics processing system includes a GPU core, which is the graphics processor described above.
The GPU core processes drawing commands and, according to them, executes the image rendering pipeline; it can also execute other compute instructions. The GPU core further includes: compute units, which execute compiled shader instructions, are programmable modules and consist of a large number of ALUs; a cache, which caches GPU core data to reduce memory accesses; a rasterization module, a fixed stage of the 3D rendering pipeline; a tile division (tiling) module, which performs tile division on a frame in TBR and TBDR GPU architectures; a clipping module, a fixed stage of the 3D rendering pipeline that clips away primitives that lie outside the viewing range or are back-facing and not displayed; a post-processing module, which scales, crops, rotates or otherwise processes the rendered image; and a micro core, which schedules the pipeline hardware modules on the GPU core or schedules tasks across multiple GPU cores.
The GPU core is connected to a network on chip, which handles data exchange between the masters and slaves of the graphics processing system. In this embodiment, the network on chip includes a configuration bus, a data communication network, a communication bus and so on.
As shown in Figure 2, the graphics processing system may further include:
a general-purpose DMA (Direct Memory Access) engine, which moves data between the host and the memory of the graphics processing system (for example, graphics card memory); for example, the vertex data for 3D drawing is moved from the host to the memory of the graphics processing system by DMA;
a PCIe controller, the interface for communicating with the host, which implements the PCIe protocol so that the graphics processing system is connected to the host through a PCIe interface; the host runs programs such as the graphics API and the graphics card driver;
an application processor, which schedules the tasks of the modules of the graphics processing system; for example, the GPU notifies the application processor after rendering a frame, and the application processor then starts the display controller to show the rendered image on the screen;
a memory controller, which connects to the memory devices that store the data on the SOC;
a display controller, which controls the output of the frame buffer in memory to a display over a display interface (HDMI, DP, etc.);
a video decoder, which decodes encoded video on the host's hard disk into displayable pictures;
a video encoder, which encodes a raw video stream on the host's hard disk into a specified format and returns it to the host.
Based on the graphics processing system architecture shown in Figure 2, in one embodiment the graphics rendering process is as follows:
The graphics API on the host sends a drawing command to the SOC chip, requesting that an image frame be rendered. (In practice, for a graphics processing system on a mobile device, the command may instead be sent by software on the application processor.)
The image frame includes at least one object.
The general-purpose DMA moves the vertex coordinate information of each object in the image frame from the host to the memory of the graphics processing system.
After obtaining the drawing command, the compute unit of the GPU core decodes it.
The vertex shader of the GPU core (whose function is implemented by the compute units) obtains the vertex coordinate information of each object in the image frame from system memory and passes it to the geometry shader (whose function is also implemented by the compute units). The geometry shader converts the 3D coordinates of the object vertices into unwrapped texture coordinates, that is, (u,v) coordinates. The compute units also perform primitive assembly based on the vertex coordinate information of the objects, thereby determining the vertex coordinates of each primitive. The value of the texture map at the texture coordinates corresponding to a vertex is the color information of that vertex.
The vertex coordinate information and vertex texture coordinates of each primitive are saved in the primitive's data structure in system memory.
After geometry processing is finished, the tile division module in the GPU core checks whether VRS is enabled. If VRS is not enabled, it performs tile division on the primitives in the image frame using the base tile size; if VRS is enabled, it uses the extended tile size, which is the product of the base tile size and the VRS pixel group size. The tile division module saves the tile division results to the tile buffer. The tile division result of each tile includes a tile identifier and the primitive indices of the primitives covering the tile.
图块划分结束后,光栅化模块进行光栅化处理。光栅化模块逐个图块进行处理,每次从图块缓冲区读取覆盖当前图块的图元的图元索引;光栅化模块通过图元索引读取该图元的图元信息,并利用该图元的图元信息进行像素覆盖测试,以确定该图元覆盖的像素,进而通过插值计算确定该图元覆盖的像素对应的纹理坐标,然后进行至少一项像素测试,以确定像素的可见性(作为举例而非限定,像素测试可以包括深度测试、模板测试等)。After the tile division is completed, the rasterization module performs rasterization processing. The rasterization module processes tiles one by one, reading the primitive index of the primitive covering the current tile from the tile buffer each time; the rasterization module reads the primitive information of the primitive through the primitive index, and uses the primitive index to The primitive information of the primitive is subjected to a pixel coverage test to determine the pixels covered by the primitive, and then the texture coordinates corresponding to the pixels covered by the primitive are determined through interpolation calculation, and then at least one pixel test is performed to determine the visibility of the pixel (As an example and not a limitation, pixel testing may include depth testing, template testing, etc.).
Before the depth test, the rasterization module divides each tile into multiple sub-tiles according to the base tile size and marks the sub-tile division result in the tiling result of each tile. Then, following the sub-tile division result, the rasterization module loads the data of one sub-tile at a time from memory into the depth buffer and performs the depth test.
By way of example and not limitation, the rasterization module identifies the tile currently to be processed in the tile buffer by its tile identifier, and looks up, among the primitive indices recorded for the current tile in the tile buffer, the primitive indices that correspond to the current sub-tile. It then uses those primitive indices to fetch the primitive information, which includes the primitives' depth information, and loads the depth information of the primitives corresponding to the current sub-tile.
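One way to picture the per-sub-tile lookup is as a filter over the primitive indices already recorded for the enclosing tile. The sketch below is illustrative only: `primitives_for_subtile` is a hypothetical helper, and it assumes each primitive is summarized by an inclusive pixel bounding box, which the source does not specify.

```python
def primitives_for_subtile(tile_prims, bboxes, sub_origin, base):
    """Return the primitive indices of a tile that overlap one sub-tile.

    tile_prims: primitive indices recorded for the enclosing tile.
    bboxes:     (x0, y0, x1, y1) inclusive bounding box per primitive
                (an assumed representation).
    sub_origin: top-left pixel of the sub-tile; base: sub-tile size.
    """
    sx, sy = sub_origin
    ex, ey = sx + base[0] - 1, sy + base[1] - 1
    selected = []
    for i in tile_prims:
        x0, y0, x1, y1 = bboxes[i]
        # Keep the primitive unless its box lies entirely outside the sub-tile.
        if x1 < sx or x0 > ex or y1 < sy or y0 > ey:
            continue
        selected.append(i)
    return selected
```

Only the depth information of the primitives returned for the current sub-tile would then need to be loaded.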
Only after the depth test on one sub-tile has finished is the loading of the next sub-tile's data triggered. Once the depth test has completed for all sub-tiles in a tile, the fragment shader is invoked to perform shading computation (i.e., fragment computation) on that tile.
In the embodiments of the present disclosure, the depth buffer holds the depth information of all pixels in one sub-tile, which can be read and updated repeatedly during the depth test until the depth test has finished for all pixels in that sub-tile.
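The sub-tile loop described above can be sketched as follows. This is a simplified model under stated assumptions: `subtiles` and `depth_test_tile` are hypothetical names, depth values live in a dictionary standing in for memory, untouched pixels default to a far depth of 1.0, the test passes when the new depth is strictly smaller, and the sub-tile's depths are written back to memory after each pass (the source only describes the load).

```python
def subtiles(tile_origin, tile, base):
    """Yield the origins of base-size sub-tiles of a tile in row-major order."""
    ox, oy = tile_origin
    for sy in range(0, tile[1], base[1]):
        for sx in range(0, tile[0], base[0]):
            yield (ox + sx, oy + sy)

def depth_test_tile(tile_origin, tile, base, fragments, depth_mem):
    """Depth-test one tile, one sub-tile at a time.

    fragments: list of (x, y, z, prim_idx) candidates for this tile.
    depth_mem: dict (x, y) -> depth, standing in for memory.
    Returns dict (x, y) -> index of the visible primitive.
    """
    visible = {}
    for sx0, sy0 in subtiles(tile_origin, tile, base):
        # Load exactly one sub-tile into the (small) on-chip depth buffer.
        depth_buf = {
            (x, y): depth_mem.get((x, y), 1.0)
            for y in range(sy0, sy0 + base[1])
            for x in range(sx0, sx0 + base[0])
        }
        for x, y, z, prim in fragments:
            # Read and update the buffer repeatedly within this sub-tile.
            if (x, y) in depth_buf and z < depth_buf[(x, y)]:
                depth_buf[(x, y)] = z
                visible[(x, y)] = prim
        # Assumed write-back before the next sub-tile is loaded.
        depth_mem.update(depth_buf)
    return visible
```

The point of the structure is that `depth_buf` never holds more than one base-size sub-tile, so the on-chip depth buffer does not have to grow when the tile is enlarged for VRS; shading would start only after this function returns for the whole tile.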
The fragment shader of the GPU core (whose function is implemented by the compute units) performs shading computation (for example, lighting computation) on the pixels in the tile.
The fragment shader is invoked once for the pixels of each pixel group, according to the shading rate set by VRS. The results of the fragment shading computation are saved in the geometry buffer.
The geometry buffer does not replicate the fragment shading result to every pixel in a pixel group; it stores only one fragment shading result per pixel group. The fragment shader may also read data from the geometry buffer for its computation, until all pixels in the tile have been rendered.
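A compact geometry buffer of this kind can be modeled as a map keyed by pixel-group coordinates rather than pixel coordinates. The sketch below is illustrative: `shade_tile` and `lookup` are hypothetical names, the shader is sampled at the centre of each pixel group (an assumption, since the source does not say where the sample is taken), and the tile size is assumed to be a multiple of the group size.

```python
def shade_tile(tile, vrs_group, shade):
    """Invoke `shade` once per VRS pixel group and store one result per group.

    tile:      (width, height) of the tile in pixels.
    vrs_group: (width, height) of a VRS pixel group.
    shade:     callable (x, y) -> shading result, sampled at the group centre.
    """
    gw, gh = vrs_group
    gbuffer = {}
    for gy in range(tile[1] // gh):
        for gx in range(tile[0] // gw):
            cx = gx * gw + gw / 2
            cy = gy * gh + gh / 2
            gbuffer[(gx, gy)] = shade(cx, cy)
    return gbuffer

def lookup(gbuffer, vrs_group, x, y):
    """Fetch the shared shading result for pixel (x, y) within the tile."""
    return gbuffer[(x // vrs_group[0], y // vrs_group[1])]
```

With a 64x64 tile and a 2x2 pixel group the buffer holds 32 x 32 = 1024 entries, the same count as the pixels of a 32x32 base tile, which suggests why a geometry buffer sized for the base tile can still serve the enlarged tile.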
An embodiment of the present disclosure further provides an electronic apparatus that includes the graphics processing system described in any of the above embodiments. In some usage scenarios the electronic apparatus takes the product form of a graphics card; in others, it takes the product form of a CPU motherboard.
An embodiment of the present disclosure further provides an electronic device that includes the electronic apparatus described above. In some usage scenarios the electronic device is a portable electronic device such as a smartphone, tablet computer, or VR device; in others it is a personal computer, game console, workstation, server, or the like.
Based on the same inventive concept, an embodiment of the present disclosure further provides a graphics processing method that uses a tile-based rendering architecture. As shown in Figure 3, the method includes at least the following steps:
Step 301: Tile the primitives in the image frame according to the base tile size and the VRS pixel group size, where the resulting tile size is larger than the base tile size but no larger than the product of the base tile size and the VRS pixel group size;
Step 302: Perform the depth test tile by tile and, for each tile, perform the depth test over multiple sub-tiles, where the size of each sub-tile is bounded by the size of the depth buffer;
Step 303: Perform fragment computation tile by tile, where the fragment shader module is invoked after the depth test has completed for every sub-tile in the tile.
Optionally, the resulting tile size is the product of the base tile size and the VRS pixel group size.
Optionally, the sub-tile size is the base tile size.
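The size constraints in steps 301 and 302 can be checked with a few lines of arithmetic. The sketch below is only one reading of the text: `is_valid_tile` and `subtile_count` are hypothetical helpers, and sizes are compared by pixel count, which is an assumption because the source does not state how a two-dimensional "size" is compared.

```python
import math

def is_valid_tile(base, group, tile):
    """Check step 301's bound: base < tile <= base * group, by pixel count."""
    base_px = base[0] * base[1]
    tile_px = tile[0] * tile[1]
    max_px = base_px * group[0] * group[1]
    return base_px < tile_px <= max_px

def subtile_count(tile, subtile):
    """Number of sub-tiles needed to cover a tile (step 302)."""
    return math.ceil(tile[0] / subtile[0]) * math.ceil(tile[1] / subtile[1])
```

For a 32x32 base tile and a 2x2 pixel group, the optional choices above give a 64x64 tile that is depth-tested as four 32x32 sub-tiles.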
On the basis of any of the above graphics processing method embodiments, each tile may additionally be divided into multiple sub-tiles before the depth test is performed over the sub-tiles.
Further, the sub-tile division result is marked in the tiling result of each tile.
On the basis of any of the above graphics processing method embodiments, fragment computation is performed once for the pixels in the same VRS pixel group.
On this basis, optionally, the geometry buffer corresponding to the fragment shader stores only one fragment computation result for each VRS pixel group.
Although preferred embodiments of the present disclosure have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present disclosure.
Clearly, those skilled in the art can make various changes and variations to the present disclosure without departing from its spirit and scope. Accordingly, if such changes and variations fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is intended to cover them as well.

Claims (15)

  1. A graphics processor using a tile-based rendering architecture, the graphics processor comprising:
    a tiling module configured to tile the primitives in an image frame according to a base tile size and a VRS pixel group size, the resulting tile size being the product of the base tile size and the VRS pixel group size;
    a depth testing module configured to perform a depth test tile by tile and, for each tile, to perform the depth test over multiple sub-tiles, the size of each sub-tile being the base tile size; and
    a fragment shader module configured to perform fragment computation tile by tile, wherein the fragment shader module is invoked after the depth test has completed for every sub-tile in a tile.
  2. The graphics processor according to claim 1, wherein the depth testing module is configured to divide each tile into multiple sub-tiles.
  3. The graphics processor according to claim 2, wherein the depth testing module is further configured to mark the sub-tile division result in the tiling result of each tile.
  4. The graphics processor according to claim 1, wherein the tiling module is further configured to divide each tile into multiple sub-tiles.
  5. The graphics processor according to claim 4, wherein the tiling module is further configured to save the tiling result of each tile separately and to mark the sub-tile division result in the tiling result of each tile.
  6. The graphics processor according to any one of claims 1 to 5, wherein the fragment shader module is configured to perform fragment computation once for the pixels in the same VRS pixel group.
  7. The graphics processor according to claim 6, wherein the geometry buffer corresponding to the fragment shader stores only one fragment computation result for each VRS pixel group.
  8. A graphics processing system comprising the graphics processor according to any one of claims 1 to 7.
  9. An electronic apparatus comprising the system according to claim 8.
  10. An electronic device comprising the electronic apparatus according to claim 9.
  11. A graphics processing method using a tile-based rendering architecture, the graphics processing method comprising:
    tiling the primitives in an image frame according to a base tile size and a VRS pixel group size, the resulting tile size being the product of the base tile size and the VRS pixel group size;
    performing a depth test tile by tile and, for each tile, performing the depth test over multiple sub-tiles, the size of each sub-tile being the base tile size; and
    performing fragment computation tile by tile, wherein a fragment shader module is invoked after the depth test has completed for every sub-tile in a tile.
  12. The method according to claim 11, further comprising, before performing the depth test over multiple sub-tiles:
    dividing each tile into multiple sub-tiles.
  13. The method according to claim 12, further comprising:
    marking the sub-tile division result in the tiling result of each tile.
  14. The method according to any one of claims 11 to 13, wherein fragment computation is performed once for the pixels in the same VRS pixel group.
  15. The method according to claim 14, wherein the geometry buffer corresponding to the fragment computation stores only one fragment computation result for each VRS pixel group.
PCT/CN2023/085937 2022-04-20 2023-04-03 Graphics processing unit and system, electronic apparatus and device, and graphics processing method WO2023202366A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210414535.8A CN116957900A (en) 2022-04-20 2022-04-20 Graphics processor, system, electronic device, apparatus, and graphics processing method
CN202210414535.8 2022-04-20

Publications (1)

Publication Number Publication Date
WO2023202366A1 (en) 2023-10-26

Family

ID=88419074

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085937 WO2023202366A1 (en) 2022-04-20 2023-04-03 Graphics processing unit and system, electronic apparatus and device, and graphics processing method

Country Status (2)

Country Link
CN (1) CN116957900A (en)
WO (1) WO2023202366A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160284119A1 (en) * 2015-03-24 2016-09-29 Prasoonkumar Surti Hardware Based Free Lists for Multi-Rate Shader
US20170293995A1 (en) * 2016-04-08 2017-10-12 Qualcomm Incorporated Per-vertex variable rate shading
CN111066066A (en) * 2017-08-25 2020-04-24 超威半导体公司 Variable rate shading
US20210192827A1 (en) * 2019-12-20 2021-06-24 Advanced Micro Devices, Inc. Vrs rate feedback

Also Published As

Publication number Publication date
CN116957900A (en) 2023-10-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791022

Country of ref document: EP

Kind code of ref document: A1