WO2022110084A1 - Pixel processing method and graphics processing unit - Google Patents

Info

Publication number
WO2022110084A1
Authority
WO
WIPO (PCT)
Prior art keywords
call
current draw
pixel
depth
draw
Prior art date
Application number
PCT/CN2020/132537
Other languages
French (fr)
Chinese (zh)
Inventor
殷亚云
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202080107534.8A priority Critical patent/CN116529771A/en
Priority to PCT/CN2020/132537 priority patent/WO2022110084A1/en
Publication of WO2022110084A1 publication Critical patent/WO2022110084A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/40Filling a planar surface by adding surface attributes, e.g. colour or texture

Definitions

  • the present application relates to the technical field of graphics processing units (GPUs), and in particular, to a pixel processing method executed in a graphics processor.
  • virtual 3D objects are rendered on a 2D display screen by executing a graphics rendering pipeline in a graphics processor GPU.
  • the GPU receives commands (e.g., draw-call commands) and/or data (including rendering state, such as the materials, textures, and shaders of the drawn object) from, for example, the CPU, receives vertex data from external system memory (not shown), renders primitives according to the commands, and finally generates an output image on the display screen.
  • a depth test is performed on the pixel data to cull occluded primitives or pixels that no longer need to be rendered.
  • a draw-call command is configured with a corresponding depth test mode: for example, early depth test (Early-Z test), late depth test (Late-Z test) or conservative depth test (Conservative-Z test).
  • the depth test in most scenes can be performed before pixel-shader, and the pixels that have not been discarded enter pixel-shader for rendering. This is the so-called early depth test (Early Z test).
  • the depth test needs to be performed after pixel shading, which is the so-called late depth test (Late-Z test).
  • although the depth value is affected by pixel shading, it can be known in advance that the change is always in one direction. For example, the depth value after pixel shading will always be larger than the original value.
  • an early depth test is performed before pixel shading and a late depth test (Late-Z test) is performed after it, which is called a conservative depth test (Conservative-Z test).
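  • The reasoning behind the conservative depth test can be sketched as follows. This is a minimal illustration, assuming a "smaller depth passes" comparison and a shader that can only increase depth; the function names and the `shade_depth` callback are hypothetical and do not come from the application.

```python
def can_reject_early(raster_depth: float, buffer_depth: float) -> bool:
    """Return True if the pixel can be discarded before pixel shading.

    Assumption: the shader can only increase depth, so the final depth is
    >= raster_depth. If raster_depth already fails the test, the final
    depth is guaranteed to fail as well.
    """
    return raster_depth >= buffer_depth


def conservative_depth_test(raster_depth, buffer_depth, shade_depth):
    """Early test, pixel shading, then late test on the shader-written depth."""
    if can_reject_early(raster_depth, buffer_depth):
        return False  # culled before pixel shading
    final_depth = shade_depth(raster_depth)
    if final_depth < raster_depth:
        raise ValueError("shader violated the one-direction guarantee")
    return final_depth < buffer_depth  # late depth test


# A pixel that survives the early test can still fail the late test.
print(conservative_depth_test(0.3, 0.5, lambda d: d + 0.1))
print(conservative_depth_test(0.3, 0.5, lambda d: d + 0.3))
```

  • The early test only removes pixels that are certain to fail, which is why a late test is still needed for the survivors.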
  • draw-calls in certain depth test modes may affect the depth testing of subsequent draw-calls.
  • for draw-calls in the Late-Z test or Conservative-Z test mode, until these draw-calls have completed the depth test and the depth buffer update (i.e., reading and writing the depth buffer), subsequent draw-calls (such as Early-Z draw-calls) must suspend their depth tests; otherwise the depth buffer becomes inconsistent. This degrades the continuity of draw-call execution and seriously reduces GPU performance.
  • Embodiments of the present application provide a pixel processing method for at least solving one of the above-mentioned shortcomings in the prior art.
  • a pixel processing method comprising: after rasterization of a current draw-call command (draw-call), setting, according to a depth test mode of the current draw-call, the value of the flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; performing pixel shading and depth testing of the current draw-call; and after the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel area to zero.
  • the pixel area associated with the current draw-call is marked with a flag (PipeNeedDrain), so that the disabling of pixel shading and depth testing of subsequent draw-calls is constrained to the affected pixel area. For the remaining pixel areas that are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing, which improves the continuity of draw-call execution and minimizes the impact on the parallel computing capability of the GPU.
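  • The effect of constraining the stall to flagged pixel areas can be sketched with plain sets. This is an illustration only: representing a pixel area as a tile coordinate is an assumption, and the function names are hypothetical.

```python
def blocked_areas(flagged: set, next_draw_areas: set) -> set:
    """Areas of the subsequent draw-call that must wait for the flag to clear."""
    return flagged & next_draw_areas


def runnable_areas(flagged: set, next_draw_areas: set) -> set:
    """Areas of the subsequent draw-call that can be shaded and tested now."""
    return next_draw_areas - flagged


# Tiles flagged by an in-flight Late-Z or Conservative-Z draw-call (assumed).
flagged = {(0, 0), (0, 1)}
# Tiles covered by the subsequent draw-call (assumed).
next_draw = {(0, 1), (1, 1), (2, 1)}

print(sorted(blocked_areas(flagged, next_draw)))
print(sorted(runnable_areas(flagged, next_draw)))
```

  • With a pipeline-wide stall, all three tiles of the subsequent draw-call would wait; with per-area flags, only the overlapping tile does.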
  • the method further includes: after the rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area; wherein performing pixel shading and depth testing of the current draw-call includes: performing the early depth test of the current draw-call, and, after the early depth test is passed and the corresponding depth buffer is updated, performing the pixel shading of the current draw-call.
  • when the depth test mode is the conservative depth test, the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; wherein performing the pixel shading and depth testing of the current draw-call includes: performing the early depth test of the current draw-call, performing the pixel shading of the current draw-call after the early depth test is passed, and performing the late depth test of the current draw-call after the pixel shading.
  • when the depth test is the late depth test, the method further comprises: after the rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; wherein performing the pixel shading and depth testing of the current draw-call includes: performing the pixel shading of the current draw-call, and then performing the late depth test of the current draw-call.
  • a graphics processor including a memory for storing instructions, and a processing unit configured to perform any of the above methods when executing the instructions.
  • a computer-readable storage medium where program codes are stored in the computer-readable storage medium, and when the program codes are executed by a computer or a processor, any one of the above methods is implemented.
  • a computer program product is provided.
  • the program code included in the computer program product is executed by a computer or a processor, any of the above methods can be implemented.
  • a graphics processing system comprising: a setting unit configured to, after rasterization of a current draw-call command (draw-call), set, according to the depth test mode of the current draw-call, the value of the flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area to zero; and a pixel processing unit configured to perform the pixel shading and depth testing of the current draw-call.
  • the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area; the pixel processing unit is configured to: perform an early depth test of the current draw-call, and, after the early depth test passes and the corresponding depth buffer is updated, perform pixel shading of the current draw-call.
  • the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform an early depth test of the current draw-call, perform the pixel shading of the current draw-call after the early depth test is passed, and perform the late depth test of the current draw-call after the pixel shading.
  • when the depth test is the late depth test, the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform the pixel shading and the late depth test of the current draw-call.
  • a graphics processing system comprising: a control unit configured to: after rasterization of a current draw-call command (draw-call), set, according to the depth test mode of the current draw-call, the value of the flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area to zero; a pixel shader configured to: perform pixel shading of the current draw-call; a depth test unit configured to: perform a depth test of the current draw-call; and a marker buffer configured to: store the value of the flag of the associated pixel area.
  • the control unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area; the depth test unit includes an early depth test unit configured to: perform an early depth test of the current draw-call; and the pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes and the corresponding depth buffer is updated.
  • when the depth test mode is the conservative depth test, the control unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the depth test unit includes an early depth test unit and a late depth test unit; the early depth test unit is configured to: perform an early depth test of the current draw-call; the pixel shader is configured to: perform pixel shading of the current draw-call after the early depth test is passed; and the late depth test unit is configured to: perform a late depth test of the current draw-call after the pixel shading of the current draw-call.
  • when the depth test is the late depth test, the control unit is further configured to: after the rasterization of the current draw-call command (draw-call), set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the depth test unit includes a late depth test unit; the pixel shader is configured to: perform pixel shading of the current draw-call; and the late depth test unit is configured to: perform the late depth test.
  • by setting a flag for the affected pixel area, the disabling of pixel shading and depth testing of subsequent draw-calls is constrained to the affected pixel area, and for the remaining pixel areas that are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing. Compared with an approach in which the pixels of subsequent draw-calls cannot be processed until the pipeline of the draw-call in the specific depth test mode is drained, or until the pixel shading and depth testing of that draw-call are completed, the continuity of draw-call execution is improved and the impact on the parallel computing capability of the GPU is minimized.
  • FIG. 1 is a schematic diagram of a computing device implementing an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a GPU implementing an embodiment of the present application.
  • FIG. 3 is an example of a rendering pipeline implemented in the GPU of FIG. 2 .
  • FIG. 4 is an example of a rendering pipeline implemented in a GPU implementing the pixel processing method according to the embodiment of the present application.
  • FIG. 5 is a flowchart of a pixel processing method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a pixel processing method provided in an embodiment of the present application in an Early-Z mode.
  • FIG. 7 is a flowchart of a pixel processing method provided by an embodiment of the present application in a conservative depth test mode.
  • FIG. 8 is a flowchart of a pixel processing method provided in an embodiment of the present application in a Late-Z mode.
  • FIG. 9 is a schematic diagram of an implementation scenario for implementing the pixel processing method provided by the embodiment of the present application.
  • FIG. 10 is a schematic diagram of a graphics processing system implementing the pixel processing method according to an embodiment of the present application.
  • the methods provided by the embodiments of the present application are implemented, for example, in a computing device.
  • the overall architecture of the computing device is shown in FIG. 1 .
  • the computing device 100 may include, but is not limited to: personal computers such as laptop computers, desktop computers, and tablet computing devices, as well as wireless devices, mobile phones (including smart phones), personal digital assistants (PDAs), video game consoles (including video monitors, mobile video game devices, mobile video conferencing units), TV set-top boxes, in-car intelligent systems, intelligent wearable devices, e-book readers, fixed or mobile media players, etc.
  • computing device 100 may include a central processing unit (CPU) 102 and system memory 101 in communication via, for example, a memory bridge 104 .
  • the memory bridge 104 may be, for example, a Northbridge chip connected to the I/O (input/output) bridge 105 via a bus or other communication path 112 (eg, a hypertransport link).
  • I/O bridge 105 may be, for example, a Southbridge chip that receives user input from one or more input devices 107 (e.g., a keyboard, mouse, trackball, touch screen of a display device, or other type of input device) and forwards the user input to CPU 102 via communication path 112 and memory bridge 104.
  • GPU 103 is coupled to memory bridge 104 via a bus or other communication path 112 (eg, PCI Express, Accelerated Graphics Port, or HyperTransport Link) to communicate with CPU 102 and system memory 101.
  • GPU 103 may perform graphics processing operations to generate and communicate pixel data to display device 110.
  • the system disk 106 is also connected to the I/O bridge 105 .
  • Computing device 100 may also include other components (not explicitly shown), such as USB or other port connections, CD drives, DVD drives, and the like, which may also be connected to I/O bridge 105 .
  • the communication paths interconnecting the various components in FIG. 1 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols known in the art.
  • the configuration of the computing device 100 shown in FIG. 1 is only an example, and those skilled in the art can understand that there may be other configurations of the computing device 100 . It should be understood that variations and modifications are possible.
  • the connection topology (e.g., the number and arrangement of bridges, the number of CPUs, and the number of GPUs) can be modified as needed.
  • the computing device 100 may include two or more CPUs 102 and two or more GPUs 103.
  • GPU 103 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. GPU 103 may be integrated with one or more other components, such as memory bridge 104, CPU 102, and I/O bridge 105, to form a system-on-chip (SOC).
  • FIG. 2 shows a schematic block diagram of the GPU 103 in the computing device 100 of FIG. 1 that can implement the methods of the embodiments of the present application.
  • GPU 103 includes circuitry for graphics processing and video processing.
  • GPU 103 may include a processing core array 203, which may include a plurality of processing cores 2031-2036.
  • FIG. 2 shows only 6 processing cores as an example, and those skilled in the art can understand that the number of processing cores may vary.
  • the processing cores shown may be, for example, general-purpose processing cores or fixed-function processing cores. Based on a plurality of general-purpose processing cores in the processing core array 203, the GPU 103 can concurrently execute a large number of program tasks or computing tasks.
  • Each general-purpose processing core may be programmed to perform various program-related processing tasks, including, but not limited to, graphics rendering operations, and the like.
  • a fixed function processing core may include hardware that is hardwired to perform certain specific functions.
  • the graphics memory 204 may be a part of the GPU 103.
  • GPU 103 may read data from or write data to graphics memory 204. That is, GPU 103 may use local storage instead of external memory to store data. In some cases, GPU 103 may also utilize system memory 101 via a bus, such as communication path 112, to read and write data.
  • Graphics memory 204 may include one or more volatile or nonvolatile memories or storage devices, such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data storage devices or optical storage devices, and the like.
  • GPU 103 may be configured to perform various operations: receive graphics data from CPU 102 and/or system memory 101 via memory bridge 104 and a bus, such as communication path 112; process the graphics data to generate pixel data; interact with local graphics memory 204 to store and update pixel data; communicate pixel data to display device 110; and the like.
  • CPU 102 is the main processor of computing device 100, which controls and coordinates the operation of other components. Specifically, the CPU 102 issues commands to control the operation of the GPU 103. In some embodiments, CPU 102 writes a stream of commands for controlling GPU 103 to, for example, system memory 101, graphics memory 204, or other storage locations accessible to both CPU 102 and GPU 103. GPU 103 reads the command stream and can execute commands asynchronously relative to the operation of CPU 102.
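  • The command stream handoff described above can be sketched as a simple first-in, first-out queue. This is a single-threaded illustration under stated assumptions: the class and method names are hypothetical, the command strings are made up, and a real implementation would need synchronization between the CPU and the GPU.

```python
from collections import deque


class CommandStream:
    """Hypothetical command stream in a location both CPU and GPU can access."""

    def __init__(self):
        self._commands = deque()

    def write(self, command):
        """CPU side: append a command and return without waiting for the GPU."""
        self._commands.append(command)

    def read(self):
        """GPU side: take the oldest pending command, or None if there is none."""
        return self._commands.popleft() if self._commands else None


stream = CommandStream()
stream.write("draw-call A")  # CPU keeps issuing commands...
stream.write("draw-call B")
print(stream.read())  # ...and the GPU consumes them later, in order
print(stream.read())
print(stream.read())
```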
  • GPU 103 includes an I/O (input/output) unit 202 that communicates with other components of computing device 100 via communication paths 112 connected to memory bridge 104.
  • the connections of GPU 103 to other components of computing device 100 may also vary.
  • GPU 103 may be implemented as an add-in card, such as may be inserted into an expansion slot of computing device 100.
  • the communication path 112 connecting the GPU 103 to the memory bridge 104 may be a PCI-EXPRESS link.
  • I/O unit 202 receives all incoming data packets (or other signals) from communication path 112, directs incoming data packets to the appropriate components of GPU 103, or transmits data packets (or other signals) via communication path 112 to components external to GPU 103.
  • I/O unit 202 may direct commands related to processing tasks to scheduler 201 and commands related to memory operations (eg, reads or writes to graphics memory 204 ) to graphics memory 204 .
  • the processing core array 203 may receive processing tasks to be executed from the scheduler 201 .
  • Scheduler 201 may independently schedule tasks for execution by resources of GPU 103 (eg, one or more processing cores of processing core array 203).
  • scheduler 201 may be a hardware processor.
  • the scheduler 201 may be included in the GPU 103.
  • scheduler 201 may also be a separate unit from CPU 102 and GPU 103.
  • the scheduler 201 may also be configured as any processor that receives a stream of commands and/or operations.
  • the CPU 102 via the GPU driver contained in the system memory 101 of FIG. 1, may send to the scheduler 201 a command stream containing a series of operations to be performed by the GPU 103.
  • the scheduler 201 can receive the command stream through the I/O unit 202, process the operations of the command stream sequentially in the order in which they appear in the command stream, and schedule those operations for execution by one or more processing cores of the processing core array 203.
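  • In-order scheduling onto processing cores can be sketched as follows. The text only states that operations are processed in stream order and scheduled onto one or more cores; the round-robin assignment policy and the function name below are assumptions for illustration.

```python
from itertools import cycle


def schedule(command_stream, core_ids):
    """Pair each operation, in stream order, with a processing core.

    Round-robin assignment is an assumed policy, not one stated in the text.
    """
    if not core_ids:
        raise ValueError("at least one processing core is required")
    cores = cycle(core_ids)
    return [(operation, next(cores)) for operation in command_stream]


# Core identifiers reuse the reference numerals 2031-2036 for readability.
for operation, core in schedule(["op0", "op1", "op2"], [2031, 2032]):
    print(operation, "->", core)
```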
  • each general-purpose processing core can be programmed to perform processing tasks associated with various programs, including, but not limited to, various operations in the graphics rendering pipeline (e.g., vertex shader and/or pixel shader programs, etc.).
  • FIG. 3 shows an example of a graphics rendering pipeline implemented by the GPU 103.
  • the graphics rendering pipeline is a logical function formed by cascading processing cores (eg, general-purpose processing cores and/or fixed-function processing cores) included in the processing core array.
  • the scheduler 201, the graphics memory 204, the I/O unit 202, etc. included in the GPU 103 are peripheral circuits or devices that implement the logical functions of the rendering pipeline.
  • a graphics rendering pipeline usually includes a programmable module and a fixed function module, the programmable module is executed by a general-purpose processing core, and the fixed function module is implemented by a corresponding fixed function processing core.
  • the rendering pipeline of the GPU 103 includes an input assembler (IA), a vertex shader (VS), a primitive assembler (PA), a rasterizer, an early depth test unit (Early-Z test unit), a pixel shader (Pixel Shader), a late depth test unit (Late-Z test unit), and an output unit.
  • the above rendering pipelines are only examples, and are not limited to the above description.
  • the rendering pipeline can also contain other units or modules.
  • the logical order of the above-mentioned units or modules in the rendering pipeline is also not limited to the example in FIG. 3 , but may be changed as required.
  • Each of the above-mentioned units or modules may be implemented in a separately designed fixed function processor in the GPU 103, or may be implemented by executing a specific program in the processing core of the GPU 103.
  • a vertex shader may be implemented in a separately designed fixed-function processor in GPU 103, or may be implemented by executing shader programs in processing cores in GPU 103.
  • the input assembler can also be implemented in a separately designed fixed-function processor in the GPU 103, or by executing specific programs in the processing cores of the GPU 103.
  • vertex buffer (Vertex Buffer, VB), which is used to receive vertex data from the system memory 101, and then transmit the vertex data to the input assembler.
  • vertex buffers are stored in graphics memory 204 on GPU 103.
  • a cache memory (not shown) may also be provided between the graphics memory 204 and the processing core array 203.
  • Vertex buffers may also be stored in caches or other storage areas accessible by processing cores (2031-2036).
  • depth buffer (DB), which may be stored in, for example, graphics memory 204 shown in FIG. 2, or in a cache (not shown) or other storage area accessible by the processing cores.
  • GPU 103 receives commands (e.g., draw-call commands) and/or data (including rendering state, such as materials, textures, and shader programs) from, for example, CPU 102, receives vertex data from external system memory 101, performs primitive rendering according to the commands (such as draw-call commands), and finally generates an output image on the display screen.
  • the Input Assembler receives vertex data (vertex coordinates and indices) from the vertex buffer for assembly into geometric primitives (eg, triangles, lines, etc.).
  • the vertex shader determines the attributes of the vertex (lighting, color, etc.) and provides the shaded vertex data to the primitive assembler.
  • Primitive assemblers generate primitives through operations such as clipping, perspective division, and viewport transformation.
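  • Two of the primitive assembler operations named above can be sketched for a single vertex. The conventions used here (normalized device coordinates in [-1, 1], a screen y axis pointing down, depth remapped to [0, 1]) are assumptions borrowed from common graphics APIs, and the function name is hypothetical.

```python
def clip_to_screen(clip, width, height):
    """Map a clip-space vertex (x, y, z, w) to screen x, screen y, and depth."""
    x, y, z, w = clip
    if w == 0:
        raise ValueError("w must be non-zero for the perspective step")
    # Perspective step: divide by w to get normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transformation: scale and offset into pixel coordinates.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height  # flip y so it points down
    depth = (ndc_z + 1.0) * 0.5  # remap depth to [0, 1]
    return screen_x, screen_y, depth


print(clip_to_screen((0.0, 0.0, 0.0, 1.0), 800, 600))
```

  • The resulting depth value is what the later depth tests compare against the depth buffer.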
  • a rasterizer is used to convert the primitives generated by the primitive assembler (PA) into on-screen pixels representing the corresponding primitives.
  • Pixel shaders determine the color of individual pixels by executing pixel shader instructions.
  • when the GPU 103 executes draw-call commands for rendering, because of the rendering order of the rendered objects, later rendered objects may be occluded by earlier rendered objects.
  • a depth test is performed on the pixel data to cull occluded primitives or pixels that no longer need to be rendered.
  • each pixel stores a corresponding depth value.
  • the depth value of the pixel is compared with the depth value in the current depth buffer. If it is greater than or equal to the depth value in the depth buffer, the pixel is considered to be occluded and is discarded; otherwise, the pixel is retained.
  • the depth value corresponding to the pixel is written into the depth buffer to update the depth value in the depth buffer.
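  • The comparison and update described above can be sketched for one pixel. The nested-list depth buffer and the function name are assumptions for illustration; the comparison rule (greater than or equal means occluded) follows the text.

```python
def depth_test_and_update(depth_buffer, x, y, depth):
    """Return True and update the buffer if the pixel at (x, y) is visible."""
    if depth >= depth_buffer[y][x]:
        return False  # occluded: discard the pixel, leave the buffer unchanged
    depth_buffer[y][x] = depth  # visible: record the new nearest depth
    return True


# A 2x2 depth buffer cleared to the farthest depth (1.0 is an assumed value).
buffer = [[1.0, 1.0], [1.0, 1.0]]
print(depth_test_and_update(buffer, 0, 0, 0.5))  # nearer than 1.0
print(depth_test_and_update(buffer, 0, 0, 0.7))  # behind the stored 0.5
print(buffer)
```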
  • a draw-call command is configured with a corresponding depth test mode: for example, early depth test (Early-Z test), late depth test (Late-Z test) or conservative depth test (Conservative-Z test).
  • the early depth test (Early-Z test) and the later depth test (Late-Z test) are completed in the Early-Z test unit and the Late-Z test unit respectively.
  • when the corresponding depth test is performed, the reference value is read from the depth buffer, and if the test passes, the depth value of the passed pixel is written into the depth buffer to update the depth buffer.
  • conservative depth testing involves both early depth testing and late depth testing, which are completed in the Early-Z test unit and the Late-Z test unit shown in FIG. 3, respectively.
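  • The stage order implied by each depth test mode can be summarized in a small lookup table. The enum, the stage labels, and the function name are hypothetical; the orderings follow the descriptions of the Early-Z, Late-Z, and conservative modes in the text.

```python
from enum import Enum


class DepthMode(Enum):
    EARLY_Z = "early"
    CONSERVATIVE_Z = "conservative"
    LATE_Z = "late"


# Pixel-processing stages after rasterization, in execution order.
STAGES = {
    DepthMode.EARLY_Z: ["early_z_test", "pixel_shader"],
    DepthMode.CONSERVATIVE_Z: ["early_z_test", "pixel_shader", "late_z_test"],
    DepthMode.LATE_Z: ["pixel_shader", "late_z_test"],
}


def stages_for(mode: DepthMode) -> list:
    """Return a copy of the stage sequence for the given depth test mode."""
    return list(STAGES[mode])


for mode in DepthMode:
    print(mode.name, "->", " -> ".join(stages_for(mode)))
```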
  • the pixel processing method of the embodiments of the present application aims to solve the problem of incoherence of draw-calls in the prior art through an improved method, thereby improving the performance of a computer graphics processing system.
  • the pixel processing method proposed by the embodiments of the present application can be applied to, for example, a rendering process of a computer graphics processing system.
  • the pixel processing methods of the embodiments of the present application can be applied, for example, in the pixel shading and depth testing stages after rasterization in the rendering pipeline, to alleviate the incoherence of draw-calls in the pixel processing stages (e.g., pixel shading and depth testing).
  • FIG. 4 is a schematic diagram of a rendering pipeline in the GPU 103 implementing the pixel processing method according to the embodiment of the present application.
  • a control unit and a marker buffer are provided. The control unit can be a separate hardware unit in the GPU 103, or a module implemented by executing a computer program in a processing core of the GPU 103.
  • the marker buffer may be stored separately on graphics memory 204 of GPU 103.
  • in step S10, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the flag PipeNeedDrain corresponding to the pixel area associated with the current draw-call is 0. If so, pixel shading and depth testing for the current draw-call can be performed, i.e., the current draw-call can proceed to the next stage of the rendering pipeline.
  • the depth test mode here includes, but is not limited to, Early-Z test, conservative depth test or Late-Z test.
  • in step S11, the control unit sets the value of the PipeNeedDrain marker of the associated pixel area according to the depth test mode of the current draw-call, to control whether subsequent draw-calls after the current draw-call can perform pixel shading and depth testing for that pixel area. For example, the control unit writes the marker set for the pixel area associated with the current draw-call into the marker buffer.
  • if the depth test mode of the current draw-call is the Early-Z test mode, the marker PipeNeedDrain of its associated pixel area is kept at 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue to be performed for this pixel area.
  • if the depth test mode of the current draw-call is the conservative depth test or Late-Z test mode, the marker PipeNeedDrain of its associated pixel area is set to 1, indicating that, for this pixel area, subsequent draw-calls after the current draw-call cannot perform pixel shading and depth testing until the marker PipeNeedDrain for that pixel area is cleared.
  • in step S12, pixel shading and depth testing of the current draw-call are performed for the pixel area.
  • in step S13, after the pixel shading and depth testing of the current draw-call are completed, the control unit clears the PipeNeedDrain marker corresponding to the pixel area, so that pixel shading and depth testing of subsequent draw-calls can continue to be performed for this pixel area.
  • if the depth test mode of the subsequent draw-call is the Late-Z test, the subsequent draw-call can continue to perform pixel shading and depth testing without waiting for the marker PipeNeedDrain corresponding to the pixel area to be cleared.
  • the specific flows of the pixel processing method of FIG. 5 when the depth test mode of the current draw-call is the Early-Z test, the conservative depth test, and the Late-Z test are described with reference to FIGS. 6-8, respectively.
  • Figure 6 is directed to the case where the current draw-call depth test mode is the Early-Z test.
  • in step S110, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is 0. If it is, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can proceed to the next stage of the rendering pipeline.
  • the control unit resets the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call according to the depth test mode of the current draw-call.
  • because the depth test mode of the current draw-call is the Early-Z test, as shown in step S111, the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is kept at 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue to be performed for this pixel area.
  • in step S112, the depth test of the current draw-call, that is, the Early-Z test, is performed.
  • if the Early-Z test fails, the primitive of the current draw-call is culled. If the Early-Z test of the current draw-call passes and the corresponding depth buffer is updated, the pixel shading of the current draw-call is performed, as shown in step S113.
  • in step S114, the control unit clears the value of the marker PipeNeedDrain of the pixel area to zero.
  • Figure 7 is for the case where the current draw-call depth test mode is conservative depth test.
  • in step S120, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is 0. If it is, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can proceed to the next stage of the rendering pipeline.
  • the control unit resets the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call according to the depth test mode of the current draw-call, to control whether subsequent draw-calls after the current draw-call can perform pixel shading and depth testing for the associated pixel area.
  • in step S121, because the depth test mode of the current draw-call is the conservative depth test mode, the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is reset to 1, indicating that, for this pixel area, pixel shading and depth testing of subsequent draw-calls after the current draw-call cannot be performed; only pixel shading and depth testing of the current draw-call can be performed.
  • in step S122, the Early-Z test of the current draw-call is performed.
  • in step S123, the pixel shading of the current draw-call is performed.
  • because the depth test mode of the current draw-call is the conservative depth test, a Late-Z test will be performed later; the depth buffer is therefore not updated immediately after the Early-Z test of the current draw-call passes, but only after the later Late-Z test passes.
  • after the pixel shading of the current draw-call, the Late-Z test of the current draw-call is performed. After the Late-Z test passes and the depth buffer is updated, as shown in step S124, the marker PipeNeedDrain of the associated pixel area is cleared to zero, so that pixel shading and depth testing of subsequent draw-calls can continue to be performed for this pixel area.
  • the subsequent draw-call can directly continue to perform pixel shading and Late-Z testing, without waiting for the marker PipeNeedDrain corresponding to the pixel area to be cleared.
  • Figure 8 is for the case where the current draw-call depth test mode is the Late-Z test.
  • in step S130, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is 0. If it is, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can proceed to the next stage of the rendering pipeline.
  • the control unit resets the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call according to the depth test mode of the current draw-call, to control whether subsequent draw-calls after the current draw-call can perform pixel shading and depth testing for the associated pixel area.
  • in step S131, because the depth test mode of the current draw-call is the Late-Z mode, the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is reset to 1, indicating that, for this pixel area, pixel shading and depth testing of subsequent draw-calls after the current draw-call cannot be performed.
  • in step S132, the pixel shading of the current draw-call is performed; after the pixel shading, its Late-Z test is performed. After the Late-Z test passes and the corresponding depth buffer is updated, as shown in step S133, the control unit clears the marker PipeNeedDrain of the corresponding pixel area to zero.
  • the screen is divided into 16 pixel areas t0-t15.
  • six draw-call commands need to be executed, which are d1 to d6 in the rendering order.
  • the depth test mode of d3 and d4 is the Late-Z mode, and that of the remaining draw-calls d1-d2 and d5-d6 is the Early-Z mode.
  • the division of the pixel area is exemplary, and the screen may be divided into more pixel areas or fewer pixel areas.
  • a marker PipeNeedDrain is set for each pixel area with, for example, an initial value of 0, meaning that pixel shading and depth testing of a draw-call can be performed for each pixel area.
  • the values of the markers PipeNeedDrain corresponding to all pixel areas are the initial value 0.
  • the value of the marker PipeNeedDrain corresponding to each pixel area is stored in, for example, a marker buffer; for example, a storage area in a memory or a cache may be allocated to the marker buffer.
  • the depth test of draw-call mentioned here includes, but is not limited to, Early-Z test, conservative depth test or Late-Z test. It depends on the depth test mode of the specific draw-call.
  • the control unit reads the marker buffer and determines that the marker PipeNeedDrain of the corresponding pixel area t9 is 0, so pixel shading and depth testing for d3 can be performed normally.
  • the PipeNeedDrain flag of the pixel area t9 corresponding to d3 is set to 1.
  • the value 1 of the flag PipeNeedDrain indicates that, for this pixel region t9, at least pixel shading and depth testing of subsequent draw-calls following draw-call d3 are disabled.
  • the pixel shading and depth testing of the subsequent draw-call for the pixel region t9 can be performed after the PipeNeedDrain flag of the pixel region t9 is cleared to zero.
  • the object of draw-call d5 covers the pixel areas t6, t7, t12 and t13.
  • the marker PipeNeedDrain of each of the corresponding pixel areas t6, t7, t12 and t13 is checked. For example, if the markers of the t6 and t7 areas are 0, the pixel shading and depth testing of d5 for the t6 and t7 areas can be performed directly.
  • the object of the current draw-call d6 covers t8, t10, and t11, and the corresponding markers PipeNeedDrain are all 0, so the current draw-call d6 can be executed directly.
  • by setting a marker for the affected pixel area, the disabling of pixel shading and depth testing is constrained to the affected pixel area, while for the remaining pixel areas that are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing.
  • the coherence and parallelism of draw-call command execution are thereby improved, and the impact on the parallel computing capability of the GPU is minimized.
  • the methods in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, the methods can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer program codes or computer program instructions, which may be stored on a memory.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer program code or computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program code or computer program instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium, such as a floppy disk, a hard disk, and a magnetic tape; an optical medium, such as a DVD; or a semiconductor medium, such as a solid state disk (Solid State Disk, SSD), and the like.
  • the methods of the present application may include various other operations and/or variations of the operations shown. Likewise, the sequence of operations of the flowcharts may be modified. It should be understood that not all operations in the flowcharts need to be performed. In various embodiments, one or more operations of the method may be controlled or managed by software, firmware, hardware, or any combination thereof, without limitation.
  • a method may include the processes of an embodiment of the present disclosure, which may be controlled or managed by a processor and/or electronic components under the control of computer or computing device readable and executable instructions (or code).
  • the graphics processing system 120 implementing the pixel processing method according to any of the above embodiments of the present application will be described in detail below with reference to FIG. 10 .
  • the graphics processing system 120 includes: a determination unit 121 , a pixel processing unit 122 and a setting unit 123 .
  • the setting unit 123 is configured to: after the rasterization of the current draw-call, set, according to the depth test mode of the current draw-call, the value of the flag of the pixel area associated with the current draw-call, indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area; the pixel processing unit 122 is configured to perform the pixel shading and depth testing of the current draw-call.
  • if the depth test mode of the current draw-call is the early depth test (Early-Z test), the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area; the pixel processing unit 122 is configured to: perform the early depth test of the current draw-call, and after the early depth test passes and the corresponding depth buffer is updated, perform pixel shading of the current draw-call.
  • if the depth test mode of the current draw-call is the conservative depth test, the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform the early depth test of the current draw-call, perform the pixel shading of the current draw-call after the early depth test passes, and perform the late depth test (Late-Z test) of the current draw-call after the pixel shading of the current draw-call.
  • the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform pixel shading and the late depth test of the current draw-call.
  • if the current draw-call is associated with a plurality of pixel areas, then, when executing the pixel shading and depth testing of the current draw-call, the value of the flag is set for each pixel area separately according to the depth test mode of the current draw-call, indicating whether pixel shading and depth testing of subsequent draw-calls are disabled for that pixel area; and after the pixel shading and depth testing of the current draw-call are completed for each pixel area, respectively, the value of the flag of that pixel area is cleared.
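The per-area handling for a draw-call that covers several pixel areas might be sketched as follows. This is an illustration only: the helper names `split_regions`, `mark_regions` and `clear_region`, the list-based flag storage, and the scenario in which areas 12 and 13 are still held by an earlier Late-Z draw-call are assumptions, not part of the application.

```python
def split_regions(pipe_need_drain, covered):
    """Partition the pixel areas covered by a draw-call into (ready, waiting)."""
    ready = [r for r in covered if pipe_need_drain[r] == 0]
    waiting = [r for r in covered if pipe_need_drain[r] != 0]
    return ready, waiting


def mark_regions(pipe_need_drain, regions, late_or_conservative):
    # The flag is set separately for every pixel area, depending on the mode.
    for r in regions:
        pipe_need_drain[r] = 1 if late_or_conservative else 0


def clear_region(pipe_need_drain, region):
    # Each pixel area is released on its own, as soon as its work is complete.
    pipe_need_drain[region] = 0


flags = [0] * 16                      # 16 pixel areas, all initially open
mark_regions(flags, [12, 13], late_or_conservative=True)  # assumed earlier Late-Z draw-call
ready, waiting = split_regions(flags, [6, 7, 12, 13])     # a draw-call covering four areas
clear_region(flags, 12)               # area 12 finishes before area 13
ready_after, waiting_after = split_regions(flags, [6, 7, 12, 13])
```

Under these assumptions the later draw-call can start on areas 6 and 7 at once and picks up area 12 as soon as that area alone is cleared, without waiting for area 13.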

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

Provided is a pixel processing method, comprising: after the current draw-call command is rasterized, according to a depth test mode of the current draw-call, setting the value of a flag of a pixel area associated with the current draw-call, so as to indicate whether pixel shading and depth tests of a subsequent draw-call following the current draw-call are disabled for the associated pixel area; executing pixel shading and depth tests of the current draw-call; and after the pixel shading and depth tests of the current draw-call are completed, clearing the value of the flag of the associated pixel area to zero. On the basis of the method, when the current draw-call in a specific depth test mode affects a depth test of a subsequent draw-call, by means of providing a flag for a specific pixel area, the disabling of pixel shading and depth tests is limited to an affected pixel area, and for the remaining pixel areas on which disabling is not performed, the subsequent draw-call can continue to execute pixel shading and depth tests, such that the consistency of draw-call command execution is improved, and influence on the parallel operation capability of a GPU is reduced to the greatest extent.

Description

Pixel processing method and graphics processor
Technical Field
The present application relates to the technical field of graphics processing units (GPUs), and in particular to a pixel processing method executed in a graphics processor.
Background
In current computer graphics systems, virtual 3D objects are rendered on a 2D display screen by executing a graphics rendering pipeline in a graphics processing unit (GPU). Specifically, the GPU receives commands (e.g., draw-call commands) and/or data (including rendering state, such as the materials, textures and shaders of the drawn objects) from, for example, a CPU, receives vertex data from an external system memory (not shown), renders primitives according to the commands, such as draw-call commands, and finally generates an output image on the display screen.
When the GPU invokes draw-call commands for rendering, objects rendered later may, because of the rendering order, be occluded by objects rendered earlier. In the graphics rendering pipeline, after the rasterizer, a depth test is performed on the pixel data to cull occluded primitives or pixels that no longer need to be rendered. In general, a draw-call command is configured with a corresponding depth test mode: for example, the early depth test (Early-Z test), the late depth test (Late-Z test) or the conservative depth test (Conservative-Z test). In most scenarios the depth test can be performed before pixel shading (Pixel-Shader), and only the pixels that are not discarded enter pixel shading for rendering; this is the early depth test (Early-Z test). In some scenarios, for example when the depth value is affected by pixel shading, the depth test has to be performed after pixel shading; this is the late depth test (Late-Z test). In still other cases, although the depth value is affected by pixel shading, it is known in advance that the effect always moves in one direction, for example the depth value after pixel shading is always larger than the original value; in that case an early depth test (Early-Z test) can be performed before pixel shading and a late depth test (Late-Z test) after pixel shading; this is called the conservative depth test (Conservative-Z test).
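The ordering of the three depth test modes described above can be sketched for a single fragment as follows. This is an illustrative sketch only: the names `DepthMode` and `process_fragment`, the "less than" depth comparison and the `shader_depth` callback are assumptions made for the example and do not come from the application.

```python
from enum import Enum


class DepthMode(Enum):
    EARLY_Z = "early-z"
    LATE_Z = "late-z"
    CONSERVATIVE_Z = "conservative-z"


def process_fragment(mode, z_in, stored_z, shader_depth=lambda z: z):
    """Process one fragment and return (new_stored_z, shader_ran).

    Assumes a 'less than' depth comparison (a smaller depth is closer).
    shader_depth maps the incoming depth to the depth produced by pixel
    shading; it is only consulted in the Late-Z and Conservative-Z modes.
    """
    if mode is DepthMode.EARLY_Z:
        # Test before shading; shading does not change the depth.
        if z_in < stored_z:
            return z_in, True
        return stored_z, False

    if mode is DepthMode.CONSERVATIVE_Z and not z_in < stored_z:
        # Early rejection is safe only because shading can only increase depth.
        return stored_z, False

    # Late-Z stage: shade first, then test the depth produced by the shader.
    z_out = shader_depth(z_in)
    if z_out < stored_z:
        return z_out, True
    return stored_z, True
```

In the Conservative-Z case the early rejection is sound because, under the assumed comparison, a fragment that already fails with its original depth would also fail with any larger shaded depth.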
In the prior art, draw-calls in certain depth test modes, such as Late-Z or Conservative-Z draw-calls, affect the depth testing of later draw-calls. For example, while a Late-Z or Conservative-Z draw-call has not yet completed its depth test and depth buffer update (i.e., its reads and writes of the depth buffer), later draw-calls (e.g., Early-Z draw-calls) must suspend their depth testing; otherwise the depth buffer would become inconsistent. This clearly degrades the coherence of draw-call execution and severely reduces GPU performance.
Summary of the Invention
Embodiments of the present application provide a pixel processing method that addresses at least one of the above shortcomings of the prior art.
According to a first aspect of the present application, a pixel processing method is provided, comprising: after rasterization of a current draw-call command (draw-call), setting, according to the depth test mode of the current draw-call, the value of a flag of a pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; performing pixel shading and depth testing of the current draw-call; and after the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel area to zero.
Based on the above method, when a current draw-call in a specific depth test mode would affect the depth testing of subsequent draw-calls, a flag (PipeNeedDrain) is set for the pixel area associated with the current draw-call, so that the disabling of pixel shading and depth testing of subsequent draw-calls is confined to the affected pixel area; for the remaining pixel areas, which are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing. This improves the coherence of draw-call command execution and reduces the impact on the parallel computing capability of the GPU as far as possible.
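A minimal sketch of the per-area flag described above, assuming a hypothetical `MarkerBuffer` class that holds one PipeNeedDrain value per pixel area. The class, its method names, the mode constants and the choice of 16 areas are illustrative and are not defined by the application.

```python
EARLY_Z, CONSERVATIVE_Z, LATE_Z = "early-z", "conservative-z", "late-z"


class MarkerBuffer:
    """Holds one PipeNeedDrain flag per pixel area (hypothetical layout)."""

    def __init__(self, num_regions):
        self.pipe_need_drain = [0] * num_regions

    def can_start(self, region):
        # A draw-call may enter pixel shading / depth testing only if the flag is 0.
        return self.pipe_need_drain[region] == 0

    def begin(self, region, mode):
        # Set the flag according to the depth test mode of the current draw-call.
        if not self.can_start(region):
            raise RuntimeError(f"region {region} is still draining")
        self.pipe_need_drain[region] = 0 if mode == EARLY_Z else 1

    def finish(self, region):
        # Clear the flag once pixel shading and depth testing have completed.
        self.pipe_need_drain[region] = 0


markers = MarkerBuffer(16)
markers.begin(9, LATE_Z)              # a Late-Z draw-call enters area 9
blocked = not markers.can_start(9)    # later draw-calls must wait on area 9 ...
others_free = markers.can_start(8)    # ... but remain free to run on area 8
markers.finish(9)                     # area 9 is released again
```

Keeping the flag per area rather than per pipeline is what confines the stall: only draw-calls that touch area 9 are held back in this example.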
Optionally, when the depth test mode is the early depth test, the method further comprises: after the rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area; wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call and, after the early depth test passes and the corresponding depth buffer is updated, performing the pixel shading of the current draw-call.
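The early depth test variant can be sketched as below, under stated assumptions: the function name `run_early_z_draw`, the dictionary-based depth buffer, the "less than" comparison and the representation of fragments as `(pixel, depth)` pairs are all hypothetical choices made for illustration.

```python
def run_early_z_draw(pipe_need_drain, depth_buffer, region, fragments):
    """Run an Early-Z draw-call on one pixel area; return the shaded pixels.

    Returns None if the area is currently disabled by an earlier draw-call.
    """
    if pipe_need_drain[region] != 0:
        return None                    # the area is draining, so the draw-call waits
    pipe_need_drain[region] = 0        # Early-Z leaves the area open to later draw-calls
    shaded = []
    for pixel, z in fragments:
        if z < depth_buffer[pixel]:    # early depth test (assumed 'less than')
            depth_buffer[pixel] = z    # depth buffer update before shading
            shaded.append(pixel)       # only surviving pixels are shaded
    pipe_need_drain[region] = 0        # clear the flag on completion
    return shaded


flags = [0]
depth = {"a": 0.5, "b": 0.25}
shaded = run_early_z_draw(flags, depth, 0, [("a", 0.375), ("b", 0.5)])
```

Because the flag never leaves 0 in this mode, a later draw-call that checks the same area is not held back by an Early-Z draw-call.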
Optionally, when the depth test mode is the conservative depth test, the method further comprises: after the rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call, performing the pixel shading of the current draw-call after the early depth test passes, and performing the late depth test of the current draw-call after the pixel shading of the current draw-call.
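The conservative variant differs in that the flag is raised for the whole duration of the draw-call and the depth buffer is written only after the late test. A sketch under assumptions: `run_conservative_z_draw`, the `shade_depth` callback (standing in for a pixel shader that only moves depth away from the viewer) and the "less than" comparison are hypothetical.

```python
def run_conservative_z_draw(pipe_need_drain, depth_buffer, region, fragments, shade_depth):
    """Run a Conservative-Z draw-call on one pixel area; return the written pixels."""
    if pipe_need_drain[region] != 0:
        return None                          # area disabled by an earlier draw-call
    pipe_need_drain[region] = 1              # disable later draw-calls for this area
    # Early depth test: reject obviously hidden fragments, but do not write depth yet.
    survivors = [(p, z) for p, z in fragments if z < depth_buffer[p]]
    written = []
    for pixel, z in survivors:
        z_out = shade_depth(z)               # pixel shading may change the depth
        if z_out < depth_buffer[pixel]:      # late depth test on the shaded depth
            depth_buffer[pixel] = z_out      # depth buffer update happens only here
            written.append(pixel)
    pipe_need_drain[region] = 0              # release the area
    return written


flags = [0]
depth = {"a": 0.5, "b": 0.5, "c": 0.5}
flag_seen_while_shading = []


def shade(z):
    flag_seen_while_shading.append(flags[0])  # observe the flag during shading
    return z + 0.125


written = run_conservative_z_draw(
    flags, depth, 0, [("a", 0.25), ("b", 0.75), ("c", 0.375)], shade
)
```

In this example fragment `b` is rejected by the early test without being shaded, `c` passes the early test but fails the late test, and only `a` updates the depth buffer.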
Optionally, when the depth test mode is the late depth test, the method further comprises: after the rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the pixel shading of the current draw-call, and performing the late depth test of the current draw-call.
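To make the window during which the area is disabled visible, the late depth test variant can be sketched as a generator that pauses after raising the flag. The generator form, the name `late_z_draw`, the `shade_depth` callback and the "less than" comparison are assumptions chosen for illustration, not the application's implementation.

```python
def late_z_draw(pipe_need_drain, depth_buffer, region, fragments, shade_depth):
    """Late-Z draw-call as a generator: yields 'marked', then 'cleared'."""
    if pipe_need_drain[region] != 0:
        raise RuntimeError("pixel area is still disabled by an earlier draw-call")
    pipe_need_drain[region] = 1          # disable later draw-calls for this area
    yield "marked"                       # other work may be interleaved here
    for pixel, z in fragments:
        z_out = shade_depth(z)           # pixel shading first ...
        if z_out < depth_buffer[pixel]:  # ... then the late depth test
            depth_buffer[pixel] = z_out
    pipe_need_drain[region] = 0          # clear the flag after the depth update
    yield "cleared"


flags = [0, 0]
depth = {"p": 1.0}
draw = late_z_draw(flags, depth, 0, [("p", 0.5)], lambda z: z / 2)
phase1 = next(draw)
flags_during = list(flags)               # area 0 disabled, area 1 untouched
phase2 = next(draw)
flags_after = list(flags)
```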
According to a second aspect of the present application, a graphics processor is provided, comprising a memory for storing instructions and a processing unit configured to perform any of the above methods when executing the instructions.
According to a third aspect of the present application, a computer-readable storage medium is provided, in which program code is stored; when the program code is executed by a computer or a processor, any of the above methods is implemented.
According to a fourth aspect of the present application, a computer program product is provided; when the program code contained in the computer program product is executed by a computer or a processor, any of the above methods is implemented.
According to a fifth aspect of the present application, a graphics processing system is provided, comprising: a setting unit configured to, after rasterization of a current draw-call command (draw-call), set, according to the depth test mode of the current draw-call, the value of a flag of a pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area to zero; and a pixel processing unit configured to perform the pixel shading and depth testing of the current draw-call.
Optionally, when the depth test mode is the early depth test, the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area; the pixel processing unit is configured to: perform the early depth test of the current draw-call and, after the early depth test passes and the corresponding depth buffer is updated, perform the pixel shading of the current draw-call.
Optionally, when the depth test mode is the conservative depth test, the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform the early depth test of the current draw-call, perform the pixel shading of the current draw-call after the early depth test passes, and perform the late depth test of the current draw-call after the pixel shading of the current draw-call.
Optionally, when the depth test mode is the late depth test, the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; the pixel processing unit is further configured to: perform the pixel shading and the late depth test of the current draw-call.
According to a sixth aspect of the present application, a graphics processing system is provided, comprising: a control unit configured to: after rasterization of a current draw-call command (draw-call), set, according to the depth test mode of the current draw-call, the value of a flag of a pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing for the pixel area are completed, clear the value of the flag of the pixel area to zero; a pixel shader configured to perform the pixel shading of the current draw-call; a depth test unit configured to perform the depth testing of the current draw-call; and a marker buffer configured to store the value of the flag of the associated pixel area.
Optionally, when the depth test mode is the early depth test, the control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel region; the depth test unit comprises an early depth test unit configured to perform the early depth test of the current draw-call; and the pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes and the corresponding depth buffer is updated.
Optionally, when the depth test mode is the conservative depth test, the control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region; the depth test unit comprises an early depth test unit and a late depth test unit; the early depth test unit is configured to perform the early depth test of the current draw-call; the pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes; and the late depth test unit is configured to perform the late depth test of the current draw-call after pixel shading of the current draw-call.
Optionally, when the depth test mode is the late depth test, the control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region; the depth test unit comprises a late depth test unit; the pixel shader is configured to perform pixel shading of the current draw-call; and the late depth test unit is configured to perform the late depth test.
Based on the method proposed in the embodiments of the present application, when a current draw-call in a particular depth test mode affects the depth testing of subsequent draw-calls, a flag is set for the affected pixel region so that the disabling of pixel shading and depth testing for subsequent draw-calls is confined to that region; for the remaining pixel regions, which are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing. In the prior art, by contrast, pixel shading and depth testing of subsequent draw-calls cannot begin until the pipeline has been drained of the draw-call in that depth test mode, or until the pixel shading and depth testing of that draw-call have completed. The proposed method therefore improves the continuity of draw-call execution and minimizes the impact on the parallel computing capability of the GPU.
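The benefit described above can be illustrated with a minimal sketch. It assumes that pixel regions are identified by tile coordinates and that the set of flagged regions is already known; the function name `split_regions` and the tile layout are hypothetical and are not taken from the application.

```python
def split_regions(touched_regions, flagged_regions):
    """Partition the pixel regions touched by a subsequent draw-call.

    Regions whose flag is set must wait; all other regions may proceed
    with pixel shading and depth testing immediately.
    """
    runnable = [r for r in touched_regions if r not in flagged_regions]
    stalled = [r for r in touched_regions if r in flagged_regions]
    return runnable, stalled


# Hypothetical tiles: the current draw-call flagged only tile (0, 1).
runnable, stalled = split_regions([(0, 0), (0, 1), (1, 1)], {(0, 1)})
print(runnable)  # tiles that proceed without waiting
print(stalled)   # tiles that wait until the flag is cleared
```

With a full pipeline drain, all three tiles in this example would wait; with per-region flags only the flagged tile does.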
Description of Drawings
FIG. 1 is a schematic diagram of a computing device implementing an embodiment of the present application.
FIG. 2 is a schematic diagram of a GPU implementing an embodiment of the present application.
FIG. 3 is an example of a rendering pipeline implemented in the GPU of FIG. 2.
FIG. 4 is an example of a rendering pipeline implemented in a GPU that carries out the pixel processing method of an embodiment of the present application.
FIG. 5 is a flowchart of the pixel processing method provided by an embodiment of the present application.
FIG. 6 is a flowchart of the pixel processing method provided by an embodiment of the present application in Early-Z mode.
FIG. 7 is a flowchart of the pixel processing method provided by an embodiment of the present application in conservative depth test mode.
FIG. 8 is a flowchart of the pixel processing method provided by an embodiment of the present application in Late-Z mode.
FIG. 9 is a schematic diagram of an implementation scenario of the pixel processing method provided by an embodiment of the present application.
FIG. 10 is a schematic diagram of a graphics processing system implementing the pixel processing method of an embodiment of the present application.
Detailed Description
The technical solutions provided by the present application are described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the system structures and application scenarios given in the embodiments are intended mainly to explain some possible implementations of the technical solutions and should not be read as the sole limitation on them. A person of ordinary skill in the art will appreciate that the technical solutions remain applicable as systems evolve and new application scenarios emerge.
The terms "first", "second", "third", and the like in the embodiments and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. In addition, the terms "comprising" and "having", and any variations thereof, are intended to denote non-exclusive inclusion, for example the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be understood that, in the present application, the sequence numbers of the steps do not imply an order of execution. The order in which the steps are executed is determined by their functions and internal logic, and the numbering does not limit the implementation of the embodiments in any way.
The method provided by the embodiments of the present application is implemented, for example, in a computing device. The overall architecture of the computing device is shown in FIG. 1.
Referring to FIG. 1, a computing device 100 configured to implement one or more aspects of the embodiments of the present application is shown. The computing device 100 may be, without limitation, a personal computer such as a laptop computer, desktop computer, or tablet computing device; it may also be a wireless device, a mobile phone (including a smartphone), a personal digital assistant (PDA), a video game console (including a video display, a mobile video game device, or a mobile video conferencing unit), a television set-top box, an in-vehicle intelligent system, a smart wearable device, an e-book reader, a fixed or mobile media player, and the like.
In the embodiment of FIG. 1, the computing device 100 may include a central processing unit (CPU) 102 and a system memory 101 that communicate via, for example, a memory bridge 104. The memory bridge 104 may be, for example, a northbridge chip, and is connected to an I/O (input/output) bridge 105 via a bus or other communication path 112 (for example, a HyperTransport link). The I/O bridge 105 may be, for example, a southbridge chip; it receives user input from one or more input devices 107 (for example, a keyboard, mouse, trackball, touch screen of a display device, or other type of input device) and forwards the user input to the CPU 102 via the communication path 112 and the memory bridge 104. A graphics processing unit (GPU) 103 is coupled to the memory bridge 104 via a bus or other communication path 112 (for example, PCI Express, Accelerated Graphics Port, or a HyperTransport link) and thereby communicates with the CPU 102 and the system memory 101. In one embodiment, the GPU 103 may perform graphics processing operations to generate pixel data and deliver the pixel data to a display device 110.
A system disk 106 is also connected to the I/O bridge 105. The computing device 100 may further include other components (not explicitly shown), such as USB or other port connections, CD drives, DVD drives, and similar components, which may also be connected to the I/O bridge 105. The communication paths interconnecting the components in FIG. 1 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols known in the art.
The configuration of the computing device 100 shown in FIG. 1 is only an example; a person skilled in the art will understand that the computing device 100 may have other configurations and that variations and modifications are possible. The connection topology, for example the number and arrangement of bridges, the number of CPUs, and the number of GPUs, may be modified as needed. In other embodiments, the computing device 100 may include two or more CPUs 102 and two or more GPUs 103.
In one embodiment, the GPU 103 contains circuitry optimized for graphics and video processing, including, for example, video output circuitry. The GPU 103 may be integrated with one or more other components, such as the memory bridge 104, the CPU 102, and the I/O bridge 105, to form a system on chip (SoC).
FIG. 2 shows a schematic block diagram of the GPU 103 in the computing device 100 of FIG. 1, in which the method of the embodiments of the present application can be implemented. In one embodiment, the GPU 103 includes circuitry for graphics processing and video processing.
The GPU 103 may include a processing core array 203, which may include a plurality of processing cores 2031-2036. FIG. 2 shows only six processing cores as an example; a person skilled in the art will understand that the number of processing cores may vary. The processing cores shown may be, for example, general-purpose processing cores or fixed-function processing cores. With the multiple general-purpose processing cores in the processing core array 203, the GPU 103 can execute a large number of program tasks or computing tasks concurrently. Each general-purpose processing core may be programmed to perform processing tasks associated with various programs, including but not limited to graphics rendering operations. A fixed-function processing core may contain hardware that is hardwired to perform certain specific functions.
In the embodiments of the present application, the graphics memory 204 may be part of the GPU 103. The GPU 103 may read data from the graphics memory 204 or write data to it. That is, the GPU 103 may store data in local storage rather than in external memory. In some cases, the GPU 103 may also read and write data in the system memory 101 via a bus, for example the communication path 112. The graphics memory 204 may include one or more volatile or non-volatile memories or storage devices, such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data storage devices, or optical storage devices.
The GPU 103 may be configured to perform the following operations: receiving graphics data from the CPU 102 and/or the system memory 101 via the memory bridge 104 and a bus, for example the communication path 112; processing the graphics data to generate pixel data; interacting with the local graphics memory 204 to store and update pixel data; delivering pixel data to the display device 110; and so on.
In operation, the CPU 102 is the main processor of the computing device 100 and controls and coordinates the operation of the other components. Specifically, the CPU 102 issues commands that control the operation of the GPU 103. In some embodiments, the CPU 102 writes a command stream for controlling the GPU 103 to, for example, the system memory 101, the graphics memory 204, or another storage location accessible to both the CPU 102 and the GPU 103. The GPU 103 reads the command stream and may execute the commands asynchronously with respect to the operation of the CPU 102.
As shown in FIG. 2, the GPU 103 includes an I/O (input/output) unit 202, which communicates with the other components of the computing device 100 via the communication path 112 connected to the memory bridge 104. The way the GPU 103 is connected to the other components of the computing device 100 may also vary. In some embodiments, the GPU 103 may be implemented as an add-in card that can, for example, be inserted into an expansion slot of the computing device 100.
In one embodiment, the communication path 112 connecting the GPU 103 to the memory bridge 104 may be a PCI Express link. As is known in the art, dedicated lanes of a PCI Express link are allocated to the GPU 103. The I/O unit 202 receives all incoming data packets (or other signals) from the communication path 112 and directs them to the appropriate components of the GPU 103, or transmits data packets (or other signals) via the communication path 112 to components outside the GPU 103. For example, the I/O unit 202 may direct commands related to processing tasks to the scheduler 201, and commands related to memory operations (for example, reads from or writes to the graphics memory 204) to the graphics memory 204.
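The routing performed by the I/O unit 202 can be sketched as a simple dispatch function. This is only an illustration: the packet representation (a dictionary with a `kind` field) and the kind names are assumptions, since the application does not specify a packet format.

```python
def route_packet(packet):
    """Return the name of the GPU component that should receive the packet."""
    kind = packet["kind"]  # hypothetical field; the real packet layout is not specified
    if kind == "task":
        # Commands related to processing tasks go to the scheduler.
        return "scheduler_201"
    if kind in ("mem_read", "mem_write"):
        # Commands related to memory operations go to the graphics memory.
        return "graphics_memory_204"
    raise ValueError(f"unknown packet kind: {kind}")


print(route_packet({"kind": "task"}))
print(route_packet({"kind": "mem_write"}))
```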
The processing core array 203 may receive processing tasks to be executed from the scheduler 201. The scheduler 201 may independently schedule tasks for execution by the resources of the GPU 103 (for example, one or more processing cores of the processing core array 203). In one embodiment, the scheduler 201 may be a hardware processor. In the embodiment shown in FIG. 2, the scheduler 201 may be included in the GPU 103. In other embodiments, the scheduler 201 may be a unit separate from the CPU 102 and the GPU 103. The scheduler 201 may also be any processor configured to receive a stream of commands and/or operations.
In operation, the CPU 102 may, by means of the GPU driver contained in the system memory 101 of FIG. 1, send to the scheduler 201 a command stream containing a series of operations to be performed by the GPU 103. The scheduler 201 may receive the command stream through the I/O unit 202, process its operations sequentially in the order in which they appear in the command stream, and schedule them for execution by one or more processing cores in the processing core array 203.
In practice, each general-purpose processing core may be programmed to perform processing tasks associated with various programs, including but not limited to the various operations of the graphics rendering pipeline (for example, vertex shader and/or pixel shader programs).
FIG. 3 shows an example of a graphics rendering pipeline implemented by the GPU 103.
It should be noted that the graphics rendering pipeline is a logical function formed by cascading the processing cores (for example, general-purpose processing cores and/or fixed-function processing cores) of the processing core array. The scheduler 201, the graphics memory 204, the I/O unit 202, and the other elements of the GPU 103 are peripheral circuits or devices that support the logical function of the rendering pipeline. For example, a graphics rendering pipeline typically contains programmable modules and fixed-function modules; the programmable modules are executed by general-purpose processing cores, and the fixed-function modules are implemented by corresponding fixed-function processing cores.
As shown in FIG. 3, the rendering pipeline of the GPU 103 includes, for example, an input assembler (IA), a vertex shader (VS), a primitive assembler (PA), a rasterizer, an early depth test unit (Early-Z test unit), a pixel shader, a late depth test unit (Late-Z test unit), and an output unit.
The rendering pipeline above is only an example and is not limited to the foregoing description. The rendering pipeline may contain other units or modules. The logical order of the units or modules in the rendering pipeline is likewise not limited to the example in FIG. 3 and may be changed as needed.
Each of the units or modules above may be implemented in a separately designed fixed-function processor in the GPU 103, or by executing a specific program on a processing core of the GPU 103. For example, the vertex shader may be implemented in a separately designed fixed-function processor in the GPU 103, or by executing a shader program on a processing core of the GPU 103. Similarly, the input assembler may be implemented in a separately designed fixed-function processor in the GPU 103, or by executing a specific program on a processing core of the GPU 103.
In addition, as shown in FIG. 3, there is a vertex buffer (VB), which receives vertex data from the system memory 101 and passes the vertex data to the input assembler. In general, the vertex buffer is stored in the graphics memory 204 on the GPU 103. In the GPU 103, a cache (not shown) may also be provided between the graphics memory 204 and the processing core array 203. The vertex buffer may also be stored in the cache or in another storage area accessible to the processing cores (2031-2036). There is also a depth buffer (DB), which may be stored in, for example, the graphics memory 204 shown in FIG. 2, or in the cache (not shown) or another storage area accessible to the processing cores.
When the rendering pipeline executes, the GPU 103 receives commands (for example, draw-call commands) and/or data (including rendering state, such as the materials, textures, and shader programs of the objects to be drawn) from, for example, the CPU 102, receives vertex data from the external system memory 101, renders primitives according to the commands (for example, the draw-call commands), and finally generates the output image on the display screen.
Specifically, the input assembler (IA) receives vertex data (vertex coordinates and indices) from the vertex buffer and assembles it into geometric primitives (for example, triangles or lines).
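As an illustration of what the input assembler does, the following minimal sketch groups indexed vertices into triangles. It assumes a triangle-list topology in which every three consecutive indices form one triangle; the function name and data layout are hypothetical and are not specified by the application.

```python
def assemble_triangles(vertices, indices):
    """Group indexed vertices into triangles (triangle-list topology assumed)."""
    if len(indices) % 3 != 0:
        raise ValueError("a triangle list needs a multiple of three indices")
    return [
        tuple(vertices[i] for i in indices[k:k + 3])
        for k in range(0, len(indices), 3)
    ]


# Four vertices of a unit quad, drawn as two triangles that share an edge.
quad_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
quad_indices = [0, 1, 2, 2, 1, 3]
triangles = assemble_triangles(quad_vertices, quad_indices)
print(len(triangles))
```

Using indices lets the two triangles reuse the shared vertices instead of storing them twice.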
Next, the vertex shader determines the attributes of the vertices (lighting, color, and so on) and provides the shaded vertex data to the primitive assembler. The primitive assembler generates primitives through operations such as clipping, perspective division, and viewport transformation. The rasterizer then converts the primitives generated by the primitive assembler (PA) into the on-screen pixels that represent those primitives. The pixel shader determines the color of each pixel by executing pixel shader instructions.
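The perspective division and viewport transformation mentioned above can be written out as a short worked example. The sketch assumes normalized device coordinates in the range [-1, 1] and a screen origin at the top-left corner with y pointing down; these conventions vary between graphics APIs and are not stated in the application.

```python
def to_screen(clip_position, width, height):
    """Map a clip-space position (x, y, z, w) to screen coordinates and depth."""
    x, y, z, w = clip_position
    # Perspective division: clip space to normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transformation: [-1, 1] to pixel coordinates (y axis flipped).
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y, ndc_z


# A point on the view axis lands in the center of an 800 x 600 viewport.
print(to_screen((0.0, 0.0, 0.5, 1.0), 800, 600))
```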
When the GPU 103 executes draw-call commands for rendering, the order in which objects are rendered may cause a later-rendered object to be occluded by an earlier-rendered one. In the graphics rendering pipeline, a depth test is therefore performed on the pixel data after the rasterizer in order to cull occluded primitives or pixels that no longer need to be rendered.
The depth buffer stores a depth value for each pixel. During the depth test, the depth value of a pixel is compared with the depth value currently stored in the depth buffer. If the pixel's depth value is greater than or equal to the stored value, the pixel is considered occluded and is discarded; otherwise, the pixel's depth value is written into the depth buffer, replacing the stored value.
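The comparison described above can be sketched as follows. The depth buffer is modeled as a list of rows and smaller depth values are treated as closer to the viewer, as the paragraph implies; the function name is hypothetical.

```python
def depth_test(depth_buffer, x, y, z):
    """Return True and update the buffer if the pixel at (x, y) is visible."""
    if z >= depth_buffer[y][x]:
        return False  # occluded: the pixel is discarded
    depth_buffer[y][x] = z  # visible: the stored depth is replaced
    return True


depth_buffer = [[1.0, 1.0], [1.0, 1.0]]  # cleared to the far plane
print(depth_test(depth_buffer, 0, 0, 0.5))  # closer than the stored value
print(depth_test(depth_buffer, 0, 0, 0.7))  # behind the pixel written above
```

Note that a pixel with a depth equal to the stored value is also discarded, matching the "greater than or equal to" rule.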
In general, a draw-call command is configured with a corresponding depth test mode: for example, the early depth test (Early-Z test), the late depth test (Late-Z test), or the conservative depth test (Conservative-Z test). The early depth test and the late depth test are performed in the Early-Z test unit and the Late-Z test unit, respectively. When a depth test is performed, the reference value is read from the depth buffer; if the test passes, the depth value of the passing pixel is written into the depth buffer to update it. The conservative depth test involves both an early depth test and a late depth test, which are performed in the Early-Z test unit and the Late-Z test unit shown in FIG. 3, respectively.
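The relationship between the three depth test modes and the pipeline units of FIG. 3 can be summarized in a small lookup. This is a sketch only: the enum and the stage names are illustrative labels, and the ordering follows the pipeline order given for FIG. 3 (Early-Z test unit, pixel shader, Late-Z test unit).

```python
from enum import Enum


class DepthMode(Enum):
    EARLY_Z = "early"
    LATE_Z = "late"
    CONSERVATIVE_Z = "conservative"


def pixel_stage_order(mode):
    """Return the per-pixel stages a draw-call passes through after rasterization."""
    if mode is DepthMode.EARLY_Z:
        return ["early_z_test", "pixel_shader"]
    if mode is DepthMode.LATE_Z:
        return ["pixel_shader", "late_z_test"]
    if mode is DepthMode.CONSERVATIVE_Z:
        # The conservative mode uses both depth test units.
        return ["early_z_test", "pixel_shader", "late_z_test"]
    raise ValueError(f"unsupported depth mode: {mode}")


for mode in DepthMode:
    print(mode.name, pixel_stage_order(mode))
```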
The foregoing description of the functions of the units or modules in the rendering pipeline is only exemplary and not limiting.
The pixel processing method of the embodiments of the present application aims to solve the prior-art problem of discontinuous draw-call execution and thereby to improve the performance of a computer graphics processing system.
The pixel processing method proposed in the embodiments of the present application may be applied, for example, in the rendering process of a computer graphics processing system. In particular, it may be applied in the pixel shading and depth testing stages that follow rasterization in the rendering pipeline, to reduce the discontinuity of draw-calls in the pixel processing stage (which includes, for example, pixel shading and depth testing).
The pixel processing method provided by the embodiments of the present application is described in detail below with reference to FIG. 4 and FIG. 5.
FIG. 4 is a schematic diagram of a rendering pipeline in the GPU 103 that carries out the pixel processing method of an embodiment of the present application. It differs from the rendering pipeline of FIG. 3 in that a control unit and a flag buffer are provided. The control unit may be a separate hardware unit in the GPU 103, or a module implemented by executing a computer program on a processing core of the GPU 103. The flag buffer may be stored separately in the graphics memory 204 of the GPU 103.
As shown in FIG. 5, in step S10, after rasterization of the current draw-call, the control unit reads the flag buffer to determine whether the value of the flag PipeNeedDrain corresponding to the pixel region associated with the current draw-call is 0. If it is, pixel shading and depth testing of the current draw-call can be performed, that is, the current draw-call can enter the next stage of the rendering pipeline. The depth test mode here includes, but is not limited to, the Early-Z test, the conservative depth test, or the Late-Z test.
Next, in step S11, the control unit sets the value of the flag PipeNeedDrain of the associated pixel region according to the depth test mode of the current draw-call, in order to control whether subsequent draw-calls following the current draw-call can perform pixel shading and depth testing for that pixel region. For example, the control unit writes the flag it has set for the pixel region associated with the current draw-call into the flag buffer, where it is used to control whether subsequent draw-calls following the current draw-call can perform pixel shading and depth testing for that pixel region.
If the depth test mode of the current draw-call is the Early-Z test mode, the flag PipeNeedDrain of the associated pixel region is kept at its previous value of 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue for that pixel region.
If the depth test mode of the current draw-call is the conservative depth test or the Late-Z test mode, the flag PipeNeedDrain of the associated pixel region is set to 1, indicating that, for that pixel region, subsequent draw-calls following the current draw-call cannot perform pixel shading and depth testing until the flag PipeNeedDrain of that pixel region is cleared.
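The rule applied in step S11 can be sketched as a single function. The mode names are illustrative string labels chosen for this sketch; only the mapping from mode to flag value comes from the description.

```python
def flag_after_rasterization(mode):
    """Value written to PipeNeedDrain for the associated pixel region in step S11."""
    if mode == "early_z":
        return 0  # subsequent draw-calls may continue on this region
    if mode in ("conservative_z", "late_z"):
        return 1  # subsequent draw-calls must wait on this region
    raise ValueError(f"unsupported depth test mode: {mode}")


for mode in ("early_z", "conservative_z", "late_z"):
    print(mode, flag_after_rasterization(mode))
```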
Next, in step S12, pixel shading and depth testing of the current draw-call are performed for the pixel region.
Next, in step S13, after pixel shading and depth testing of the current draw-call are complete, the control unit clears the flag PipeNeedDrain corresponding to the pixel region, so that pixel shading and depth testing of subsequent draw-calls can continue for that pixel region.
Preferably, if the depth test mode of a subsequent draw-call is the Late-Z test, then for the pixel region associated with the current draw-call, the subsequent draw-call may proceed with pixel shading and depth testing even if the flag PipeNeedDrain is 1, without waiting for the flag corresponding to that pixel region to be cleared.
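Steps S10 to S13, together with the Late-Z exemption, can be sketched as a small controller around a flag buffer. This is a simplified model under stated assumptions: the class and method names are hypothetical, regions are identified by arbitrary keys, and the sketch keeps a single bit per region without modeling how hardware would track several draw-calls in flight on the same region.

```python
class FlagController:
    """Toy model of the control unit and its PipeNeedDrain flag buffer."""

    def __init__(self):
        self.flags = {}  # region -> 0 or 1; a missing entry means 0

    def may_enter(self, region, mode):
        # S10: a draw-call may proceed if the flag is 0. A Late-Z
        # draw-call may proceed even when the flag is 1.
        return self.flags.get(region, 0) == 0 or mode == "late_z"

    def begin(self, region, mode):
        # S11: Early-Z keeps the flag at 0; the other modes set it to 1.
        self.flags[region] = 0 if mode == "early_z" else 1

    def finish(self, region):
        # S13: clear the flag once shading and depth testing are done.
        self.flags[region] = 0


controller = FlagController()
controller.begin("tile0", "conservative_z")
print(controller.may_enter("tile0", "early_z"))  # blocked on the flagged region
print(controller.may_enter("tile1", "early_z"))  # other regions are unaffected
controller.finish("tile0")
print(controller.may_enter("tile0", "early_z"))  # allowed again after S13
```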
The specific flows of the pixel processing method of FIG. 5 when the depth test mode of the current draw-call is the Early-Z test, the conservative depth test, and the Late-Z test are described below with reference to FIGS. 6-8, respectively.
FIG. 6 addresses the case where the depth test mode of the current draw-call is the Early-Z test.
For example, as shown in step S110, after rasterization of the current draw-call, the control unit reads the flag buffer to determine whether the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is 0. If so, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can enter the next stage of the rendering pipeline.
Next, the control unit resets the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call according to the depth test mode of the current draw-call. Because the depth test mode of the current draw-call is the Early-Z test, as shown in step S111, the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is kept at 0, indicating that pixel shading and depth testing of subsequent draw-calls can proceed for this pixel region.
Next, as shown in step S112, the depth test of the current draw-call, namely the Early-Z test, is performed.
If the Early-Z test fails, the primitives of the current draw-call are culled. If the Early-Z test of the current draw-call passes, then after the corresponding depth buffer is updated, pixel shading of the current draw-call is performed, as shown in step S113.
After the pixel shading of the current draw-call is completed, as shown in step S114, the control unit clears the value of the flag PipeNeedDrain of the pixel region to zero.
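Steps S110 to S114 can be traced with a minimal sketch for a single pixel region. Several simplifications are assumptions made for illustration only: one depth value per region, a "less than" depth comparison, and the hypothetical names `early_z_draw` and `shade`.

```python
def early_z_draw(flags, depth_buffer, region, fragment_depth, shade):
    """Trace the Early-Z flow of FIG. 6 for one region (illustrative only)."""
    # S110: the draw-call may proceed only when the region's flag is 0.
    if flags[region] != 0:
        return "stalled"
    # S111: Early-Z does not block later draw-calls, so the flag stays 0.
    flags[region] = 0
    # S112: Early-Z test (assumed here to be a "less than" comparison).
    if not fragment_depth < depth_buffer[region]:
        return "culled"
    # The depth buffer is updated as soon as the Early-Z test passes.
    depth_buffer[region] = fragment_depth
    # S113: pixel shading runs only for fragments that survived the test.
    shade(region)
    # S114: clear the flag (it is already 0 in this mode).
    flags[region] = 0
    return "shaded"
```

Because the flag never leaves 0 in this mode, a following draw-call on the same region is not held back by this one.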
FIG. 7 addresses the case where the depth test mode of the current draw-call is the conservative depth test.
In step S120, after rasterization of the current draw-call, the control unit reads the flag buffer to determine whether the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is 0. If so, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can enter the next stage of the rendering pipeline.
Next, the control unit resets the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call according to the depth test mode of the current draw-call, in order to control whether subsequent draw-calls following the current draw-call can perform pixel shading and depth testing on the pixel region associated with the current draw-call.
Specifically, as shown in step S121, because the depth test mode of the current draw-call is the conservative-Z mode, the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is reset to 1, indicating that, for this pixel region, pixel shading and depth testing of subsequent draw-calls following the current draw-call cannot be performed, and only the pixel shading and depth testing of the current draw-call can be performed.
Then, as shown in step S122, the Early-Z test of the current draw-call is performed.
After the Early-Z test of the current draw-call passes, pixel shading of the current draw-call is performed, as shown in step S123. Because the depth test mode of the current draw-call is the conservative depth test, a Late-Z test will be performed later; therefore the depth buffer is not updated when the Early-Z test passes, and is instead updated after the later Late-Z test passes.
After the pixel shading of the current draw-call, the Late-Z test of the current draw-call is performed. After the Late-Z test passes and the depth buffer is updated, as shown in step S124, the flag PipeNeedDrain of the associated pixel region is cleared, so that pixel shading and depth testing of subsequent draw-calls can proceed for this pixel region.
Likewise, preferably, if the depth test mode of a subsequent draw-call is the Late-Z test, then for the pixel region associated with the current draw-call, the subsequent draw-call can proceed directly with pixel shading and the Late-Z test, without waiting for the flag PipeNeedDrain corresponding to the pixel region to be cleared.
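The conservative flow of steps S120 to S124 can be sketched in the same style. The text does not say what happens to the flag when the Early-Z or Late-Z test fails; clearing it in those cases is an assumption made here so that the region does not stall indefinitely. The "less than" comparison, the single depth value per region, and the names `conservative_draw` and `shade` are likewise hypothetical.

```python
def conservative_draw(flags, depth_buffer, region, raster_depth, shade):
    """Trace the conservative depth test flow of FIG. 7 for one region."""
    # S120: proceed only when the region's flag is 0.
    if flags[region] != 0:
        return "stalled"
    # S121: block subsequent draw-calls on this region.
    flags[region] = 1
    # S122: Early-Z test, with NO depth buffer update on pass.
    if not raster_depth < depth_buffer[region]:
        flags[region] = 0  # assumption: release the region on failure
        return "culled"
    # S123: pixel shading; the shader may output a modified depth.
    final_depth = shade(region, raster_depth)
    # Late-Z test on the shader's depth; the depth buffer is updated only here.
    passed = final_depth < depth_buffer[region]
    if passed:
        depth_buffer[region] = final_depth
    # S124: clear the flag so that subsequent draw-calls may proceed.
    flags[region] = 0  # assumption: also cleared when Late-Z fails
    return "written" if passed else "discarded"
```

The point of interest is that the flag is 1 for the whole time the shader runs, which is exactly the window in which a later Early-Z draw-call could otherwise read a stale depth value.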
FIG. 8 addresses the case where the depth test mode of the current draw-call is the Late-Z test.
In step S130, after rasterization of the current draw-call, the control unit reads the flag buffer to determine whether the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is 0. If so, pixel shading and depth testing of the current draw-call can be performed, i.e., the current draw-call can enter the next stage of the rendering pipeline.
Next, the control unit resets the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call according to the depth test mode of the current draw-call, in order to control whether subsequent draw-calls following the current draw-call can perform pixel shading and depth testing on the pixel region associated with the current draw-call.
Specifically, as shown in step S131, because the depth test mode of the current draw-call is the Late-Z mode, the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is reset to 1, indicating that, for this pixel region, pixel shading and depth testing of subsequent draw-calls following the current draw-call cannot be performed.
Next, as shown in step S132, pixel shading of the current draw-call is performed; after the pixel shading, its Late-Z test is performed. After the Late-Z test passes and the corresponding depth buffer update is performed, as shown in step S133, the control unit clears the flag PipeNeedDrain of the corresponding pixel region to zero.
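Across FIGS. 6-8, the flag handling differs only in the value written after rasterization (steps S111, S121 and S131). A compact way to summarize this, using hypothetical string keys and a hypothetical helper name, is a lookup table:

```python
# Value written to PipeNeedDrain after rasterization, per depth test mode.
FLAG_AFTER_RASTERIZATION = {
    "early-z": 0,       # S111: subsequent draw-calls are not blocked
    "conservative": 1,  # S121: subsequent draw-calls are blocked
    "late-z": 1,        # S131: subsequent draw-calls are blocked
}


def flag_after_rasterization(mode: str) -> int:
    """Return the flag value the control unit writes for the given mode."""
    return FLAG_AFTER_RASTERIZATION[mode]
```

In all three modes the flag is cleared to 0 once the current draw-call's pixel shading and depth testing have finished (S114, S124, S133).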
A specific application scenario of an embodiment of the present invention is described in detail below with reference to FIG. 9.
As shown in FIG. 9, the screen is divided into 16 pixel regions t0~t15. In the current rendering scene, six draw-call commands are to be executed, denoted d1~d6 in rendering order. The depth test mode of d3 and d4 is the Late-Z mode, and that of the remaining draw-calls d1, d2, d5 and d6 is the Early-Z mode.
The division into pixel regions here is exemplary; the screen may be divided into more or fewer pixel regions.
Initially, in the flag buffer, a flag such as PipeNeedDrain is set for each pixel region with an initial value of 0, meaning that pixel shading and depth testing of a draw-call can be performed for every pixel region. For example, as shown in schematic diagrams (0-a) and (0-b), the values of the flag PipeNeedDrain corresponding to all pixel regions (t0~t15) are initially 0. The value of the flag PipeNeedDrain corresponding to each pixel region is stored in, for example, a flag buffer, for which a storage area may be allocated in, for example, a memory or a cache.
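One possible layout for such a flag buffer is one bit per pixel region packed into a single integer. The text only says the buffer may live in a memory or a cache, so the bit packing and the class name `FlagBuffer` below are assumptions for illustration.

```python
class FlagBuffer:
    """One PipeNeedDrain bit per pixel region, packed into an integer."""

    def __init__(self, num_regions: int = 16) -> None:
        self.num_regions = num_regions
        self.bits = 0  # every flag starts at the initial value 0

    def get(self, region: int) -> int:
        """Return the flag (0 or 1) of the given region."""
        return (self.bits >> region) & 1

    def set(self, region: int) -> None:
        """Set the region's flag to 1 (subsequent draw-calls must wait)."""
        self.bits |= 1 << region

    def clear(self, region: int) -> None:
        """Clear the region's flag to 0 (region is drained)."""
        self.bits &= ~(1 << region)


buf = FlagBuffer()
initial = [buf.get(t) for t in range(16)]  # flags of t0~t15 after reset
```

For 16 regions the whole buffer fits in two bytes, which is why a small on-chip store is plausible; the actual storage choice is left open by the description.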
The depth test of a draw-call mentioned here includes, but is not limited to, the Early-Z test, the conservative depth test, or the Late-Z test, depending on the depth test mode of the particular draw-call.
After rasterization of the current draw-call, the value of the flag PipeNeedDrain of the pixel region associated with the current draw-call is determined. If the value is 0, pixel shading and depth testing of the current draw-call can be performed; if the value is 1, its pixel shading and depth testing are disabled. The pixel shading and depth testing of the current draw-call for that pixel region cannot be performed until the value of the flag PipeNeedDrain corresponding to the pixel region is 0.
As shown in schematic diagrams (1-a) and (1-b), after rasterization of the current draw-call (for example d1), the flag PipeNeedDrain of the pixel region t0 associated with d1 is determined to hold the initial value 0, so the pixel shading and depth testing of d1 can be performed normally. Because the depth test mode of d1 is the Early-Z test, which does not affect the depth tests of subsequent draw-calls, the control unit keeps the flag PipeNeedDrain of the corresponding pixel region t0 at 0, indicating that pixel shading and depth testing of subsequent draw-calls can proceed for the pixel region t0. The next draw-call, d2, is also in the Early-Z mode, so the same applies and the description is not repeated.
As shown in schematic diagrams (2-a) and (2-b), after rasterization of draw-call d3, the control unit reads the flag buffer and determines that the flag PipeNeedDrain of its corresponding pixel region t9 is 0, so the pixel shading and depth testing of d3 can be performed normally. Because the Late-Z test mode of d3 will affect the depth tests of subsequent draw-calls, the flag PipeNeedDrain of the pixel region t9 corresponding to d3 is set to 1. The value 1 of the flag PipeNeedDrain indicates that, for the pixel region t9, at least the pixel shading and depth testing of subsequent draw-calls following d3 are disabled. Only after the pixel shading and Late-Z test of d3 are completed and the flag PipeNeedDrain of the pixel region t9 is cleared can the pixel shading and depth testing of subsequent draw-calls be performed for the pixel region t9.
Similarly, after rasterization of draw-call d4, the flags PipeNeedDrain of its associated pixel regions t12~t15 are determined. Because the flags PipeNeedDrain of the pixel regions t12~t15 are all 0, the pixel shading and depth testing of d4 can be performed normally, i.e., d4 enters the next stage of the rendering pipeline. Because its Late-Z test mode will affect the depth tests of subsequent draw-calls, the flags PipeNeedDrain of the pixel regions t12~t15 associated with d4 are set to 1, indicating that, for the pixel regions t12~t15, at least the pixel shading and depth testing of subsequent draw-calls following d4 are disabled, and only the pixel shading and depth testing of the current draw-call d4 can be performed. After the pixel shading of d4 is completed and the Late-Z operation has been performed, the flags of t12~t15 are cleared.
Next, as shown in schematic diagrams (3-a) and (3-b), the object of draw-call d5 covers the pixel regions t6, t7, t12 and t13. After rasterization of d5, the flag PipeNeedDrain of each of its corresponding pixel regions t6, t7, t12 and t13 is determined. For example, the flags of regions t6 and t7 are 0, so the pixel shading and depth testing of d5 can be performed for t6 and t7 directly. For regions t12 and t13, because the flags PipeNeedDrain of the associated pixel regions t12~t15 were set to 1 when d4 was executed, the Early-Z test and pixel shading of d5 for t12 and t13 must wait until the flags PipeNeedDrain of these two regions are cleared. For example, after the pixel shading and Late-Z operation of d4 are completed and the flags of t12~t15 are cleared to 0, the Early-Z test and pixel shading of d5 can be performed for t12 and t13.
As shown in schematic diagrams (4-a) and (4-b), the object of the current draw-call d6 covers t8, t10 and t11, whose flags PipeNeedDrain are all 0, so the Early-Z test and pixel shading of d6 can start directly.
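The FIG. 9 walk-through can be replayed with a short simulation. The helper names `rasterized` and `completed` are hypothetical, the flag buffer is a plain list, and d2 is left out because the text does not state which regions it covers.

```python
flags = [0] * 16  # PipeNeedDrain for t0~t15, all initially 0


def rasterized(regions, mode):
    """Check flags after rasterization; return (ready, stalled) region lists.

    For a Late-Z draw-call, the regions it can enter are flagged so that
    subsequent draw-calls wait on them.
    """
    ready = [r for r in regions if flags[r] == 0]
    stalled = [r for r in regions if flags[r] != 0]
    if mode == "late-z":
        for r in ready:
            flags[r] = 1
    return ready, stalled


def completed(regions):
    """Clear the flags once shading and depth testing have finished."""
    for r in regions:
        flags[r] = 0


d1 = rasterized([0], "early-z")               # t0
d3 = rasterized([9], "late-z")                # t9, still in flight
d4 = rasterized([12, 13, 14, 15], "late-z")   # t12~t15, still in flight
d5 = rasterized([6, 7, 12, 13], "early-z")    # overlaps d4 on t12, t13
d6 = rasterized([8, 10, 11], "early-z")       # no overlap with d3 or d4
```

In this replay d5 can start on t6 and t7 at once and waits only on t12 and t13, while d6 is not delayed at all; calling `completed([12, 13, 14, 15])` releases the remaining two regions of d5.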
In the prior art, in the scenario of FIG. 4, because the depth test mode of d3 and d4 is the Late-Z mode, the depth tests of all subsequent draw-calls after d3 and d4 in the current frame cannot start until the pixel shading and depth testing of d3 and d4 have fully completed. With the technical solution of the above embodiments of the present application, flags are set only for the pixel regions affected by d3 and d4, indicating that pixel shading and depth testing of subsequent draw-calls are disabled until the flags of those pixel regions are cleared, while pixel shading and depth testing of subsequent draw-calls can proceed for the remaining pixel regions not affected by d3 and d4. This further improves the continuity and performance of draw-call execution.
With the method proposed in the embodiments of the present application, when a current draw-call in a particular depth test mode affects the depth tests of subsequent draw-calls, flags are set for the affected pixel regions so that the disabling of pixel shading and depth testing is confined to those regions, while subsequent draw-calls can proceed with pixel shading and depth testing for the remaining pixel regions that are not disabled. Compared with the prior art, in which the pixel shading and depth testing of subsequent draw-calls cannot be performed until the pixel shading and depth testing of the current draw-call in that particular depth test mode have completed, this improves the continuity and parallelism of draw-call command execution and minimizes the impact on the parallel computing capability of the GPU.
The methods in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more pieces of computer program code or computer program instructions, which may be stored in a memory. When the computer program instructions are loaded and executed on a computer or processor, the flows or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
The computer program code or computer program instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer program code or computer program instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (for example coaxial cable or optical fiber) or wireless means (for example infrared, radio or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, for example a floppy disk, a hard disk or a magnetic tape; an optical medium, for example a DVD; or a semiconductor medium, for example a solid state disk (SSD).
The methods of the present application may include various other operations and/or variations of the operations shown. Likewise, the order of operations in the flowcharts may be modified. It should be understood that not all operations in the flowcharts need to be performed. In various embodiments, one or more operations of a method may be controlled or managed by software, firmware, hardware, or any combination thereof, without being limited thereto. A method may include processes of embodiments of the present disclosure, which may be controlled or managed by a processor and/or electronic components under the control of instructions (or code) readable and executable by a computer or computing device.
A graphics processing system 120 that implements the pixel processing method of any of the above embodiments of the present application is described in detail below with reference to FIG. 10.
As shown in FIG. 10, the graphics processing system 120 includes a determination unit 121, a pixel processing unit 122, and a setting unit 123.
Specifically, when implementing the method of one embodiment of the present application, the setting unit 123 is configured to: after rasterization of the current draw-call command (draw-call), set, according to the depth test mode of the current draw-call, the value of the flag of the pixel region associated with the current draw-call, the value indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region; and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel region. The pixel processing unit 122 is configured to perform the pixel shading and depth testing of the current draw-call.
Further, when implementing the method of this embodiment, if the depth test mode of the current draw-call is the early depth test, the setting unit 123 is configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel region. The pixel processing unit 122 is configured to: perform the early depth test of the current draw-call, and, after the early depth test passes and the corresponding depth buffer is updated, perform the pixel shading of the current draw-call.
Further, when implementing the method of this embodiment, if the depth test mode of the current draw-call is the conservative depth test, the setting unit 123 is configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region. The pixel processing unit is further configured to: perform the early depth test of the current draw-call; after the early depth test passes, perform the pixel shading of the current draw-call; and, after the pixel shading of the current draw-call, perform the late depth test of the current draw-call.
Further, when implementing the method of this embodiment, if the depth test of the current draw-call is the late depth test, the setting unit 123 is configured to: after rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region. The pixel processing unit is further configured to: perform the pixel shading and late depth test of the current draw-call.
Further, when implementing the method of this embodiment, where the current draw-call is associated with a plurality of pixel regions, when the pixel shading and depth testing of the current draw-call are performed, the value of the flag of each pixel region is set separately according to the depth test mode of the current draw-call, indicating whether pixel shading and depth testing of subsequent draw-calls are disabled for that pixel region; and, separately for each pixel region, after the pixel shading and depth testing of the current draw-call for that pixel region are completed, the value of the flag of that pixel region is cleared.
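The per-region set and clear behavior for a draw-call that spans several pixel regions can be sketched as follows. The class `DrawCallTracker` and its method names are hypothetical, and the assumption that regions finish independently and in any order is made here for illustration.

```python
class DrawCallTracker:
    """Tracks one blocking draw-call across the pixel regions it covers."""

    def __init__(self, flags, regions):
        self.flags = flags
        self.pending = set(regions)
        # Set the flag of every covered region separately.
        for r in regions:
            self.flags[r] = 1

    def finish_region(self, region):
        """Clear a region's flag as soon as shading and depth testing
        of this draw-call have finished for that region."""
        self.pending.discard(region)
        self.flags[region] = 0

    def done(self):
        """True once every covered region has been released."""
        return not self.pending


flags = [0] * 16
d4 = DrawCallTracker(flags, [12, 13, 14, 15])
d4.finish_region(12)  # t12 is released while t13~t15 are still blocked
```

Releasing each region on its own means a later draw-call waiting on t12 does not also have to wait for t13~t15.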
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A pixel processing method, comprising:
    after rasterization of a current draw-call command (draw-call), setting, according to a depth test mode of the current draw-call, a value of a flag of a pixel region associated with the current draw-call, the value indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region;
    performing pixel shading and depth testing of the current draw-call; and
    after the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel region.
  2. The pixel processing method of claim 1, wherein the depth test mode is an early depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel region;
    wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call, and, after the early depth test passes and the corresponding depth buffer is updated, performing the pixel shading of the current draw-call.
  3. The pixel processing method of claim 1, wherein the depth test mode is a conservative depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region;
    wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call; after the early depth test passes, performing the pixel shading of the current draw-call; and, after the pixel shading of the current draw-call, performing the late depth test of the current draw-call.
  4. The pixel processing method of claim 1, wherein the depth test is a late depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region;
    wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the pixel shading and late depth test of the current draw-call.
  5. A graphics processor, comprising a memory for storing instructions, and a processing unit configured to perform the method of any one of claims 1-4 when executing the instructions.
  6. A computer-readable storage medium storing program code which, when executed by a computer or processor, implements the method of any one of claims 1-4.
  7. A computer program product comprising program code which, when executed by a computer or processor, implements the method of any one of claims 1-4.
  8. 一种图形处理系统,包括:A graphics processing system including:
    设置单元,配置成用于在当前draw-call的光栅化之后,根据所述当前draw-call的深度测试模式,设置与所述当前draw-call关联的像素区域的标记的值,指示对于所述关联的像 素区域是否禁用所述当前draw-call之后的后续draw-call的像素着色和深度测试,以及在所述当前draw-call的像素着色和深度测试完成之后,将所述关联的像素区域的标记的值清零;a setting unit configured to, after rasterization of the current draw-call, according to the depth test mode of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call, indicating that for the current draw-call Whether the associated pixel area disables pixel shading and depth testing of subsequent draw-calls after the current draw-call, and after the completion of the current draw-call pixel shading and depth testing, the associated pixel area The marked value is cleared;
    像素处理单元,配置成用于执行所述当前draw-call的像素着色和深度测试。A pixel processing unit configured to perform pixel shading and depth testing of the current draw-call.
  9. 如权利要求8所述的图形处理系统,其中,深度测试模式是提前深度测试,所述设置单元进一步配置成用于:在所述当前draw-call的光栅化之后,设置与所述当前draw-call关联的像素区域的标记的值,指示对于所述关联的像素区域未禁用所述当前draw-call之后的后续draw-call的像素着色和深度测试;所述像素处理单元配置成用于:执行所述当前draw-call的提前深度测试,在所述提前深度测试通过且相应的深度缓冲更新之后,执行所述当前draw-call的像素着色。9. The graphics processing system of claim 8, wherein the depth test mode is an advance depth test, and the setting unit is further configured to: after rasterization of the current draw-call, set the same as the current draw-call A value of a flag of a call associated pixel region indicating that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel region; the pixel processing unit is configured to: execute For the advance depth test of the current draw-call, after the advance depth test is passed and the corresponding depth buffer is updated, pixel shading of the current draw-call is performed.
  10. The graphics processing system of claim 8, wherein the depth test mode is a conservative depth test; the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region; and the pixel processing unit is further configured to: perform an early depth test of the current draw-call; after the early depth test passes, perform the pixel shading of the current draw-call; and, after the pixel shading of the current draw-call, perform a late depth test of the current draw-call.
  11. The graphics processing system of claim 8, wherein the depth test is a late depth test; the setting unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for that associated pixel region; and the pixel processing unit is further configured to: perform the pixel shading and the late depth test of the current draw-call.
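The per-mode behaviour recited in claims 8 to 11 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `DepthMode`, `flag_after_rasterization`, `stage_order` and `process_region` are hypothetical, and the encoding of the flag (1 = subsequent draw-calls disabled, 0 = not disabled) is an assumption suggested only by the claims' wording that the flag is cleared to zero on completion. "Early" and "late" correspond to the depth tests rendered elsewhere as "advance" and "post".

```python
from enum import Enum


class DepthMode(Enum):
    EARLY = "early"                # depth test before pixel shading
    CONSERVATIVE = "conservative"  # early test, shading, then late test
    LATE = "late"                  # depth test after pixel shading


def flag_after_rasterization(mode: DepthMode) -> int:
    """Flag value set for the pixel region once rasterization is done.

    Assumed encoding: 1 disables subsequent draw-calls for the region,
    0 leaves them enabled.
    """
    return 0 if mode is DepthMode.EARLY else 1


def stage_order(mode: DepthMode) -> list:
    """Order of per-pixel stages for the current draw-call in each mode."""
    if mode is DepthMode.EARLY:
        return ["early_depth_test", "depth_buffer_update", "pixel_shading"]
    if mode is DepthMode.CONSERVATIVE:
        return ["early_depth_test", "pixel_shading", "late_depth_test"]
    return ["pixel_shading", "late_depth_test"]


def process_region(mode: DepthMode) -> list:
    """Trace of what happens to one pixel region for the current draw-call."""
    trace = [("set_flag", flag_after_rasterization(mode))]
    trace += [(stage, None) for stage in stage_order(mode)]
    # After shading and depth testing complete, the flag is cleared to zero.
    trace.append(("clear_flag", 0))
    return trace
```

In this sketch only the conservative and late modes raise the flag, which matches the claims: in both of them the final depth result is known only after pixel shading.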
  12. A graphics processing system, comprising:
    a control unit configured to: after rasterization of a current draw-call command (draw-call), set, according to a depth test mode of the current draw-call, a value of a flag of a pixel region associated with the current draw-call, the value indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region; and, after the pixel shading and depth testing for the pixel region are completed, clear the value of the flag of the pixel region to zero;
    a pixel shader configured to: perform pixel shading of the current draw-call;
    a depth test unit configured to: perform depth testing of the current draw-call; and
    a flag buffer configured to: store the value of the flag of the associated pixel region.
  13. The graphics processing system of claim 12, wherein
    the depth test mode is an early depth test;
    the control unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel region;
    the depth test unit comprises an early depth test unit configured to: perform the early depth test of the current draw-call; and
    the pixel shader is configured to: perform the pixel shading of the current draw-call after the early depth test passes and the corresponding depth buffer is updated.
  14. The graphics processing system of claim 12, wherein
    the depth test mode is a conservative depth test;
    the control unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel region;
    the depth test unit comprises an early depth test unit and a late depth test unit, the early depth test unit being configured to: perform an early depth test of the current draw-call;
    the pixel shader is configured to: perform the pixel shading of the current draw-call after the early depth test passes; and
    the late depth test unit is configured to: perform a late depth test of the current draw-call after the pixel shading of the current draw-call.
  15. The graphics processing system of claim 12, wherein
    the depth test is a late depth test;
    the control unit is further configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel region associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for that associated pixel region;
    the depth test unit comprises a late depth test unit;
    the pixel shader is configured to: perform the pixel shading of the current draw-call; and
    the late depth test unit is configured to: perform the late depth test.
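The division of labour in claims 12 to 15, with a control unit that writes per-region flag values into a dedicated buffer, can be sketched as below. This is an illustrative model under stated assumptions, not the claimed hardware: `FlagBuffer`, `ControlUnit` and their methods are hypothetical names, the depth test modes are passed as plain strings, and the encoding 1 = disabled, 0 = not disabled is assumed. The pixel shader and depth test units are omitted because the sketch only shows how the flag gates subsequent draw-calls.

```python
class FlagBuffer:
    """Stores one flag value per pixel region (assumed: 0 = not disabled)."""

    def __init__(self, num_regions: int) -> None:
        self._values = [0] * num_regions

    def write(self, region: int, value: int) -> None:
        self._values[region] = value

    def read(self, region: int) -> int:
        return self._values[region]


class ControlUnit:
    """Sets and clears region flags around the current draw-call."""

    MODES = {"early", "conservative", "late"}
    BLOCKING_MODES = {"conservative", "late"}

    def __init__(self, flags: FlagBuffer) -> None:
        self.flags = flags

    def on_rasterized(self, region: int, depth_mode: str) -> None:
        # After rasterization, the flag value depends on the depth test mode.
        if depth_mode not in self.MODES:
            raise ValueError(f"unknown depth test mode: {depth_mode}")
        self.flags.write(region, 1 if depth_mode in self.BLOCKING_MODES else 0)

    def may_process_subsequent(self, region: int) -> bool:
        # Subsequent draw-calls may shade and depth-test only if the flag is 0.
        return self.flags.read(region) == 0

    def on_region_complete(self, region: int) -> None:
        # Pixel shading and depth testing for the region are done: clear to 0.
        self.flags.write(region, 0)


flags = FlagBuffer(num_regions=4)
control = ControlUnit(flags)

control.on_rasterized(region=2, depth_mode="late")
blocked = not control.may_process_subsequent(2)   # flag is set, region is held

control.on_region_complete(2)
released = control.may_process_subsequent(2)      # flag cleared, region is free
```

Keeping the flag in its own buffer, indexed by pixel region, means a subsequent draw-call is held back only in the regions the current draw-call actually touches.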
PCT/CN2020/132537 2020-11-28 2020-11-28 Pixel processing method and graphics processing unit WO2022110084A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080107534.8A CN116529771A (en) 2020-11-28 2020-11-28 Pixel processing method and graphics processor
PCT/CN2020/132537 WO2022110084A1 (en) 2020-11-28 2020-11-28 Pixel processing method and graphics processing unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/132537 WO2022110084A1 (en) 2020-11-28 2020-11-28 Pixel processing method and graphics processing unit

Publications (1)

Publication Number Publication Date
WO2022110084A1 (en) 2022-06-02

Family

ID=81755174

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132537 WO2022110084A1 (en) 2020-11-28 2020-11-28 Pixel processing method and graphics processing unit

Country Status (2)

Country Link
CN (1) CN116529771A (en)
WO (1) WO2022110084A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107077833A (en) * 2014-11-20 2017-08-18 Intel Corporation Apparatus and method for efficient frame-to-frame coherency exploitation for sort-last architectures
US20180284872A1 (en) * 2017-04-01 2018-10-04 Intel Corporation Adaptive multi-resolution for graphics
US20180350036A1 (en) * 2017-06-01 2018-12-06 Qualcomm Incorporated Storage for foveated rendering
CN111986279A (en) * 2019-05-24 2020-11-24 NVIDIA Corporation Techniques for efficiently accessing memory and avoiding unnecessary computations

Also Published As

Publication number Publication date
CN116529771A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US10282813B2 (en) Flex rendering based on a render target in graphics processing
JP6073533B1 (en) Optimized multi-pass rendering on tile-based architecture
KR101697910B1 (en) Fault-tolerant preemption mechanism at arbitrary control points for graphics processing
JP6042584B2 (en) Conditional execution of rendering commands based on visible information per bin along with added inline behavior
EP3353746B1 (en) Dynamically switching between late depth testing and conservative depth testing
US20170083997A1 (en) Storing bandwidth-compressed graphics data
US9852539B2 (en) Single pass surface splatting
WO2022089592A1 (en) Graphics rendering method and related device thereof
EP3427229B1 (en) Visibility information modification
CN111080761B (en) Scheduling method and device for rendering tasks and computer storage medium
US8736624B1 (en) Conditional execution flag in graphics applications
US10346943B2 (en) Prefetching for a graphics shader
WO2022110084A1 (en) Pixel processing method and graphics processing unit
US20190220411A1 (en) Efficient partitioning for binning layouts
US11481967B2 (en) Shader core instruction to invoke depth culling
WO2022067499A1 (en) Coarse-grained depth testing method and graphics processor
CN111179151B (en) Method and device for improving graphic rendering efficiency and computer storage medium
US7685371B1 (en) Hierarchical flush barrier mechanism with deadlock avoidance
CN110892383A (en) Delayed batch processing of incremental constant loads

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20962974; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase
    Ref document number: 202080107534.8; Country of ref document: CN
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: PCT application non-entry in European phase
    Ref document number: 20962974; Country of ref document: EP; Kind code of ref document: A1