WO2022110084A1

WO2022110084A1 - Pixel processing method and graphics processing unit

Info

Publication number: WO2022110084A1
Application number: PCT/CN2020/132537
Authority: WO
Inventors: 殷亚云 (Yin Yayun)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2020-11-28
Filing date: 2020-11-28
Publication date: 2022-06-02
Also published as: CN116529771A

Abstract

Provided is a pixel processing method, comprising: after a current draw-call command is rasterized, setting, according to the depth test mode of the current draw-call, the value of a flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; executing pixel shading and depth testing of the current draw-call; and, after the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel area to zero. With this method, when a current draw-call in a specific depth test mode affects the depth testing of subsequent draw-calls, a flag provided for specific pixel areas confines the disabling of pixel shading and depth testing to the affected pixel areas. In the remaining pixel areas, which are not disabled, subsequent draw-calls continue to execute pixel shading and depth testing. This improves the coherence of draw-call command execution and minimizes the impact on the parallel processing capability of the GPU.

Description

Pixel processing method and graphics processor

Technical Field

The present application relates to the technical field of graphics processing units (GPUs), and in particular, to a pixel processing method executed in a graphics processor.

Background

In current computer graphics systems, virtual 3D objects are rendered onto a 2D display screen by executing a graphics rendering pipeline in a graphics processor (GPU). Specifically, the GPU receives commands (e.g., draw-call commands) and/or data (including rendering state, such as the materials, textures, and shaders of the objects being drawn) from, for example, the CPU; receives vertex data from external system memory (not shown); renders primitives according to the commands, such as draw-call commands; and finally generates an output image on the display screen.

When the GPU executes draw-call commands for rendering, objects rendered later may be occluded by objects rendered earlier, owing to the rendering order of the objects. In the graphics rendering pipeline, a depth test is therefore performed on the pixel data after the rasterizer to cull occluded primitives or pixels that do not need to be rendered. Generally, each draw-call command is configured with a depth test mode, for example, the early depth test (Early-Z test), the late depth test (Late-Z test), or the conservative depth test (Conservative-Z test). In most scenes, the depth test can be performed before the pixel shader, and only pixels that are not discarded enter the pixel shader for shading; this is the early depth test (Early-Z test). In other scenes, for example when the depth value is modified by pixel shading, the depth test must be performed after pixel shading; this is the late depth test (Late-Z test). In some cases, although pixel shading modifies the depth value, it is known in advance that the modification always goes in one direction; for example, the depth value after pixel shading is always greater than the original value. In that case, an early depth test (Early-Z test) is performed before pixel shading and a late depth test (Late-Z test) is performed after pixel shading; this is called the conservative depth test (Conservative-Z test).
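As a rough illustration that the three modes differ mainly in where the depth test sits relative to pixel shading, the following Python sketch lists the post-rasterization stages for each mode. The names `DepthTestMode` and `stage_order` are hypothetical and do not describe any real GPU interface.

```python
from enum import Enum
from typing import List


class DepthTestMode(Enum):
    EARLY_Z = "Early-Z"
    LATE_Z = "Late-Z"
    CONSERVATIVE_Z = "Conservative-Z"


def stage_order(mode: DepthTestMode) -> List[str]:
    """Return the post-rasterization stages a draw-call passes through."""
    if mode is DepthTestMode.EARLY_Z:
        # Depth is final before shading: test first, shade only survivors.
        return ["early_depth_test", "pixel_shader"]
    if mode is DepthTestMode.LATE_Z:
        # The shader may rewrite depth, so testing waits for shading.
        return ["pixel_shader", "late_depth_test"]
    # Conservative-Z: depth only moves in a known direction, so an early
    # test can discard pixels up front and a late test decides the rest.
    return ["early_depth_test", "pixel_shader", "late_depth_test"]


for m in DepthTestMode:
    print(m.value, "->", " -> ".join(stage_order(m)))
```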

In the prior art, draw-calls in certain depth test modes, such as draw-calls using the Late-Z test or the Conservative-Z test, may affect the depth testing of subsequent draw-calls. For example, while such a draw-call has not yet completed its depth test and the corresponding depth buffer update (i.e., its reads from and writes to the depth buffer), subsequent draw-calls (such as an Early-Z draw-call) must suspend their depth tests; otherwise, the depth buffer may become inconsistent. This degrades the coherence of draw-call execution and seriously reduces GPU performance.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a pixel processing method to solve at least one of the above-mentioned shortcomings of the prior art.

According to a first aspect of the present application, a pixel processing method is provided, comprising: after rasterization of a current draw-call command (draw-call), setting, according to the depth test mode of the current draw-call, the value of a flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; performing pixel shading and depth testing of the current draw-call; and, after the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel area to zero.
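A minimal sketch of the three steps of the first aspect (set the flag after rasterization, execute shading and depth testing, clear the flag), assuming pixel areas are identified by integer region ids and the flag values are kept in a Python dict. The class and method names are hypothetical. Following the optional embodiments that follow, the flag is set to 0 for Early-Z draw-calls and to 1 for Late-Z and Conservative-Z draw-calls.

```python
from enum import Enum
from typing import Dict, Iterable


class DepthTestMode(Enum):
    EARLY_Z = "Early-Z"
    LATE_Z = "Late-Z"
    CONSERVATIVE_Z = "Conservative-Z"


class PipeNeedDrainFlags:
    """Hypothetical per-region flag store (region id -> flag value)."""

    def __init__(self) -> None:
        self.flags: Dict[int, int] = {}

    def on_rasterized(self, regions: Iterable[int], mode: DepthTestMode) -> None:
        # Step 1: after rasterization, set the flag according to the mode.
        value = 0 if mode is DepthTestMode.EARLY_Z else 1
        for r in regions:
            self.flags[r] = value

    def on_completed(self, regions: Iterable[int]) -> None:
        # Step 3: once shading and depth testing finish, clear to zero.
        for r in regions:
            self.flags[r] = 0

    def disabled_for_later_draws(self, region: int) -> bool:
        # A non-zero flag holds back later draw-calls in this region only.
        return self.flags.get(region, 0) != 0


flags = PipeNeedDrainFlags()
flags.on_rasterized({0, 1}, DepthTestMode.LATE_Z)  # step 1
print(flags.disabled_for_later_draws(0))  # True: region 0 is held
print(flags.disabled_for_later_draws(2))  # False: region 2 is untouched
flags.on_completed({0, 1})                # step 3, after step 2 finishes
print(flags.disabled_for_later_draws(0))  # False: region 0 is released
```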

With the above method, when a current draw-call in a specific depth test mode will affect the depth testing of subsequent draw-calls, the pixel area associated with the current draw-call is marked with a flag (PipeNeedDrain), so that the disabling of pixel shading and depth testing of subsequent draw-calls is confined to the affected pixel area. In the remaining pixel areas, which are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing. This improves the coherence of draw-call command execution and minimizes the impact on the parallel computing capability of the GPU.
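To make the benefit concrete, the sketch below computes which pixel regions of a later draw-call are held back, first under a global stall (any unfinished Late-Z or Conservative-Z draw-call blocks the whole later draw-call, as in the prior art described above) and then under per-region PipeNeedDrain flags. Regions are modeled as integer ids, and the function names are hypothetical.

```python
from typing import List, Set


def blocked_with_global_stall(in_flight: List[Set[int]], next_draw: Set[int]) -> Set[int]:
    # Prior art: any unfinished Late-Z/Conservative-Z draw-call stalls
    # the later draw-call in every region it touches.
    return set(next_draw) if in_flight else set()


def blocked_with_region_flags(in_flight: List[Set[int]], next_draw: Set[int]) -> Set[int]:
    # Regions whose PipeNeedDrain flag is 1: union of in-flight regions.
    flagged = set().union(*in_flight)
    # Only the overlap is held back; other regions proceed immediately.
    return next_draw & flagged


in_flight = [{0, 1}]   # one Late-Z draw-call still covering regions 0 and 1
next_draw = {1, 2, 3}  # a later Early-Z draw-call
print(sorted(blocked_with_global_stall(in_flight, next_draw)))  # [1, 2, 3]
print(sorted(blocked_with_region_flags(in_flight, next_draw)))  # [1]
```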

Optionally, if the depth test mode is the early depth test, the method further includes: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area. In this case, performing pixel shading and depth testing of the current draw-call includes: performing the early depth test of the current draw-call, and, after the early depth test passes and the corresponding depth buffer is updated, performing pixel shading of the current draw-call.
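For a single pixel, the Early-Z path can be sketched as follows. The sketch assumes the depth buffer is a dict keyed by pixel coordinates, that a pixel passes when its depth is less than the stored value (as in the depth test described later in this document), and that `shade` is a hypothetical callback standing in for the pixel shader.

```python
from typing import Callable, Dict, Optional, Tuple

Pixel = Tuple[int, int]


def early_z_pixel(depth_buffer: Dict[Pixel, float], xy: Pixel, z: float,
                  shade: Callable[[Pixel], str]) -> Optional[str]:
    # 1. Early depth test: occluded pixels never reach the shader.
    if z >= depth_buffer[xy]:
        return None
    # 2. The test passed, so update the depth buffer before shading.
    depth_buffer[xy] = z
    # 3. Pixel shading runs only for surviving pixels.
    return shade(xy)


db = {(0, 0): 1.0}
print(early_z_pixel(db, (0, 0), 0.5, lambda p: "red"))   # red
print(early_z_pixel(db, (0, 0), 0.7, lambda p: "blue"))  # None (0.7 >= 0.5)
```

Because the depth buffer is already up to date when shading starts, this ordering is presumably why the flag can stay at 0 for Early-Z draw-calls.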

Optionally, if the depth test mode is the conservative depth test, the method further includes: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area. In this case, performing pixel shading and depth testing of the current draw-call includes: performing the early depth test of the current draw-call; after the early depth test passes, performing pixel shading of the current draw-call; and, after the pixel shading, performing the late depth test of the current draw-call.
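A per-pixel sketch of the conservative path, under the same assumptions (dict depth buffer, pass if less than the stored value) plus the one-direction property from the background section: here the shader may only increase depth. Under that property, a pixel that fails the early test would also fail the late test, so the early test can discard it safely, while only the late test, which sees the shaded depth, writes the buffer. Integer depths keep the arithmetic exact; `shade` is a hypothetical callback returning a color and the shaded depth.

```python
from typing import Callable, Dict, Optional, Tuple

Pixel = Tuple[int, int]
Shader = Callable[[Pixel, int], Tuple[str, int]]


def conservative_z_pixel(depth_buffer: Dict[Pixel, int], xy: Pixel,
                         z_in: int, shade: Shader) -> Optional[str]:
    # Early test is reject-only: no depth write yet, because the
    # final depth is not known until the shader has run.
    if z_in >= depth_buffer[xy]:
        return None
    color, z_out = shade(xy, z_in)
    if z_out < z_in:
        raise ValueError("shader decreased depth; conservative assumption broken")
    # Late test on the depth the shader actually produced.
    if z_out >= depth_buffer[xy]:
        return None
    depth_buffer[xy] = z_out
    return color


def push_back(xy: Pixel, z: int) -> Tuple[str, int]:
    return "blue", z + 10  # shader moves depth away from the viewer


db = {(0, 0): 80}
print(conservative_z_pixel(db, (0, 0), 50, push_back), db)  # blue {(0, 0): 60}
print(conservative_z_pixel(db, (0, 0), 55, push_back))      # None: 65 >= 60 at late test
print(conservative_z_pixel(db, (0, 0), 90, push_back))      # None: rejected early
```

The window between the early and late tests is when another draw-call could change the stored depth for this pixel; per the method, the region's flag stays at 1 during that window.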

Optionally, if the depth test mode is the late depth test, the method further includes: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area. In this case, performing pixel shading and depth testing of the current draw-call includes: performing pixel shading of the current draw-call, and then performing the late depth test of the current draw-call.
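The Late-Z path, sketched per pixel under the same assumptions (dict depth buffer, integer depths, pass if less than the stored value, hypothetical `shade` callback). The example shader pulls depth toward the viewer, so an early test on the input depth (60 against a stored 50) would have wrongly discarded a pixel that actually ends up visible.

```python
from typing import Callable, Dict, Optional, Tuple

Pixel = Tuple[int, int]
Shader = Callable[[Pixel, int], Tuple[str, int]]


def late_z_pixel(depth_buffer: Dict[Pixel, int], xy: Pixel,
                 z_in: int, shade: Shader) -> Optional[str]:
    # No test before shading: the shader may move depth either way.
    color, z_out = shade(xy, z_in)
    # Late depth test on the shaded depth; write only on pass.
    if z_out >= depth_buffer[xy]:
        return None
    depth_buffer[xy] = z_out
    return color


def pull_forward(xy: Pixel, z: int) -> Tuple[str, int]:
    return "green", z - 20  # shader moves depth toward the viewer


db = {(0, 0): 50}
print(late_z_pixel(db, (0, 0), 60, pull_forward), db)  # green {(0, 0): 40}
```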

According to a second aspect of the present application, there is provided a graphics processor including a memory for storing instructions, and a processing unit configured to perform any of the above methods when executing the instructions.

According to a third aspect of the present application, a computer-readable storage medium is provided. The computer-readable storage medium stores program code, and when the program code is executed by a computer or a processor, any one of the above methods is implemented.

According to a fourth aspect of the present application, a computer program product is provided. When the program code included in the computer program product is executed by a computer or a processor, any of the above methods can be implemented.

According to a fifth aspect of the present application, a graphics processing system is provided, comprising: a setting unit configured to, after rasterization of a current draw-call (draw-call), set, according to the depth test mode of the current draw-call, the value of the flag of the pixel area associated with the current draw-call, the flag indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area, and to clear the value of the flag of the associated pixel area to zero after the pixel shading and depth testing of the current draw-call are completed; and a pixel processing unit configured to perform the pixel shading and depth testing of the current draw-call.

Optionally, if the depth test mode is the early depth test, the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area; and the pixel processing unit is configured to: perform the early depth test of the current draw-call, and, after the early depth test passes and the corresponding depth buffer is updated, perform pixel shading of the current draw-call.

Optionally, if the depth test mode is the conservative depth test, the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; and the pixel processing unit is further configured to: perform the early depth test of the current draw-call; after the early depth test passes, perform pixel shading of the current draw-call; and, after the pixel shading, perform the late depth test of the current draw-call.

Optionally, if the depth test mode is the late depth test, the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; and the pixel processing unit is further configured to perform pixel shading of the current draw-call and then its late depth test.

According to a sixth aspect of the present application, a graphics processing system is provided, comprising: a control unit configured to: after rasterization of a current draw-call (draw-call), set, according to the depth test mode of the current draw-call, the value of the marker of the pixel area associated with the current draw-call, the marker indicating whether pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area, and, after the pixel shading and depth testing of the current draw-call are completed, clear the value of the marker of the associated pixel area to zero; a pixel shader configured to perform pixel shading of the current draw-call; a depth test unit configured to perform depth testing of the current draw-call; and a marker buffer configured to store the value of the marker of the associated pixel area.

Optionally, if the depth test mode is the early depth test, the control unit is further configured to: after rasterization of the current draw-call, set the value of the marker of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are not disabled for the associated pixel area; the depth test unit includes an early depth test unit configured to perform the early depth test of the current draw-call; and the pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes and the corresponding depth buffer is updated.

Optionally, if the depth test mode is the conservative depth test, the control unit is further configured to: after rasterization of the current draw-call, set the value of the marker of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; the depth test unit includes an early depth test unit and a late depth test unit, the early depth test unit being configured to perform the early depth test of the current draw-call; the pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes; and the late depth test unit is configured to perform the late depth test of the current draw-call after the pixel shading of the current draw-call.

Optionally, if the depth test mode is the late depth test, the control unit is further configured to: after rasterization of the current draw-call (draw-call), set the value of the marker of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls following the current draw-call are disabled for the associated pixel area; the depth test unit includes a late depth test unit; the pixel shader is configured to perform pixel shading of the current draw-call; and the late depth test unit is configured to perform the late depth test of the current draw-call.

With the method proposed in the embodiments of the present application, when a current draw-call in a specific depth test mode affects the depth testing of subsequent draw-calls, a flag is set for the affected pixel area so that the disabling of pixel shading and depth testing of subsequent draw-calls is confined to that area, while in the remaining pixel areas, which are not disabled, subsequent draw-calls can continue to perform pixel shading and depth testing. In the prior art, in the same situation, pixel shading and depth testing of subsequent draw-calls cannot be executed until the pipeline of the draw-call in the specific depth test mode has been drained, or until that draw-call has completed its pixel shading and depth testing. Compared with the prior art, the method improves the coherence of draw-call command execution and minimizes the impact on the parallel computing capability of the GPU.

Description of Drawings

FIG. 1 is a schematic diagram of a computing device implementing an embodiment of the present application.

FIG. 2 is a schematic diagram of a GPU implementing an embodiment of the present application.

FIG. 3 is an example of a rendering pipeline implemented in the GPU of FIG. 2 .

FIG. 4 is an example of a rendering pipeline implemented in a GPU implementing the pixel processing method according to the embodiment of the present application.

FIG. 5 is a flowchart of a pixel processing method provided by an embodiment of the present application.

FIG. 6 is a flowchart of a pixel processing method provided in an embodiment of the present application in an Early-Z mode.

FIG. 7 is a flowchart of a pixel processing method provided by an embodiment of the present application in a conservative depth test mode.

FIG. 8 is a flowchart of a pixel processing method provided in an embodiment of the present application in a Late-Z mode.

FIG. 9 is a schematic diagram of an implementation scenario for implementing the pixel processing method provided by the embodiment of the present application.

FIG. 10 is a schematic diagram of a graphics processing system implementing the pixel processing method according to an embodiment of the present application.

Detailed Description of Embodiments

The technical solutions provided by the present application are described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the system structures and application scenarios provided in the embodiments of the present application mainly illustrate some possible implementations of the technical solutions of the present application and should not be construed as the only possible implementations. A person of ordinary skill in the art will appreciate that, as systems evolve and new application scenarios emerge, the technical solutions provided in this application remain applicable.

The terms "first", "second", and "third" in the embodiments of the present application and the drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.

It should be understood that, in this application, the sequence numbers of the steps do not imply an order of execution. The execution order of each step should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.

The methods provided by the embodiments of the present application are implemented, for example, in a computing device. The overall architecture of the computing device is shown in FIG. 1 .

Referring to FIG. 1, a computing device 100 configured to implement one or more aspects of the embodiments of the present application is shown. The computing device 100 may include, but is not limited to: personal computers (such as laptop computers, desktop computers, and tablet computing devices), wireless devices, mobile phones (including smartphones), personal digital assistants (PDAs), video game consoles (including video monitors, mobile video game devices, and mobile video conferencing units), TV set-top boxes, in-vehicle intelligent systems, smart wearable devices, e-book readers, and fixed or mobile media players.

In the embodiment of FIG. 1, computing device 100 may include a central processing unit (CPU) 102 and system memory 101 that communicate via, for example, a memory bridge 104. The memory bridge 104 may be, for example, a Northbridge chip connected to an I/O (input/output) bridge 105 via a bus or other communication path 112 (e.g., a HyperTransport link). The I/O bridge 105 may be, for example, a Southbridge chip that receives user input from one or more input devices 107 (e.g., a keyboard, mouse, trackball, touch screen of a display device, or other type of input device) and forwards the user input to CPU 102 via communication path 112 and memory bridge 104. A graphics processing unit (GPU) 103 is coupled to memory bridge 104 via a bus or other communication path 112 (e.g., PCI Express, Accelerated Graphics Port, or a HyperTransport link) to communicate with CPU 102 and system memory 101. In one embodiment, GPU 103 may perform graphics processing operations to generate pixel data and transmit it to display device 110.

The system disk 106 is also connected to the I/O bridge 105. Computing device 100 may also include other components (not explicitly shown), such as USB or other port connections, CD drives, and DVD drives, which may also be connected to I/O bridge 105. The communication paths interconnecting the components in FIG. 1 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI Express, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols known in the art.

The configuration of the computing device 100 shown in FIG. 1 is only an example; those skilled in the art will understand that the computing device 100 may have other configurations and that variations and modifications are possible. The connection topology, e.g., the number and arrangement of bridges, the number of CPUs, and the number of GPUs, can be modified as needed. In other embodiments, the computing device 100 may include two or more CPUs 102 and two or more GPUs 103.

In one embodiment, GPU 103 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. GPU 103 may be integrated with one or more other components, such as memory bridge 104, CPU 102, and I/O bridge 105, to form a system-on-chip (SOC).

FIG. 2 shows a schematic block diagram of the GPU 103 in the computing device 100 of FIG. 1 that can implement the methods of the embodiments of the present application. In one embodiment, GPU 103 includes circuitry for graphics processing and video processing.

GPU 103 may include a processing core array 203, which may include a plurality of processing cores 2031-2036. FIG. 2 shows only six processing cores as an example; those skilled in the art will understand that the number of processing cores may vary. The processing cores shown may be, for example, general-purpose processing cores or fixed-function processing cores. With the multiple general-purpose processing cores in processing core array 203, GPU 103 can execute a large number of program or computing tasks concurrently. Each general-purpose processing core may be programmed to perform various program-related processing tasks, including but not limited to graphics rendering operations. A fixed-function processing core may include hardware that is hardwired to perform certain specific functions.

In this embodiment of the present application, the graphics memory 204 may be part of GPU 103, and GPU 103 may read data from and write data to graphics memory 204. That is, GPU 103 may use local storage instead of external memory to store data. In some cases, GPU 103 may also read and write data in system memory 101 via a bus, such as communication path 112. Graphics memory 204 may include one or more volatile or non-volatile memories or storage devices, such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data storage devices, or optical storage devices.

GPU 103 may be configured to perform various operations: receiving graphics data from CPU 102 and/or system memory 101 via memory bridge 104 and a bus such as communication path 112; processing the graphics data to generate pixel data; interacting with local graphics memory 204 to store and update pixel data; transmitting pixel data to display device 110; and so on.

In operation, CPU 102 is the main processor of computing device 100, which controls and coordinates the operation of other components. Specifically, the CPU 102 issues commands to control the operation of the GPU 103. In some embodiments, CPU 102 writes a stream of commands for controlling GPU 103 to, for example, system memory 101, graphics memory 204, or other storage locations accessible to both CPU 102 and GPU 103. GPU 103 reads the command stream and can execute commands asynchronously relative to the operation of CPU 102.

As shown in FIG. 2, GPU 103 includes an I/O (input/output) unit 202 that communicates with other components of computing device 100 via communication path 112, which is connected to memory bridge 104. The connections between GPU 103 and other components of computing device 100 may also vary. In some embodiments, GPU 103 may be implemented as an add-in card that can be inserted into an expansion slot of computing device 100.

In one embodiment, the communication path 112 connecting GPU 103 to memory bridge 104 may be a PCI Express link in which, as is known in the art, dedicated lanes are allocated to GPU 103. I/O unit 202 receives all incoming data packets (or other signals) from communication path 112 and directs them to the appropriate components of GPU 103, and transmits outgoing data packets (or other signals) via communication path 112 to components external to GPU 103. For example, I/O unit 202 may direct commands related to processing tasks to scheduler 201, and commands related to memory operations (e.g., reads from or writes to graphics memory 204) to graphics memory 204.

The processing core array 203 may receive processing tasks to be executed from the scheduler 201 . Scheduler 201 may independently schedule tasks for execution by resources of GPU 103 (eg, one or more processing cores of processing core array 203). In one embodiment, scheduler 201 may be a hardware processor. In the embodiment shown in FIG. 2, the scheduler 201 may be included in the GPU 103. In other embodiments, scheduler 201 may also be a separate unit from CPU 102 and GPU 103. The scheduler 201 may also be configured as any processor that receives a stream of commands and/or operations.

In operation, the CPU 102, via the GPU driver contained in the system memory 101 of FIG. 1, may send to the scheduler 201 a command stream containing a series of operations to be performed by the GPU 103. The scheduler 201 receives the command stream through the I/O unit 202, can process the operations of the command stream sequentially in the order in which they appear, and can schedule these operations for execution by one or more processing cores of the processing core array 203.

In practice, each general-purpose processing core can be programmed to perform processing tasks associated with various programs, including but not limited to various operations in the graphics rendering pipeline (e.g., vertex shader and/or pixel shader programs).

FIG. 3 shows an example of a graphics rendering pipeline implemented by the GPU 103.

It should be noted here that the graphics rendering pipeline is a logical function formed by cascading processing cores (eg, general-purpose processing cores and/or fixed-function processing cores) included in the processing core array. The scheduler 201, the graphics memory 204, the I/O unit 202, etc. included in the GPU 103 are peripheral circuits or devices that implement the logical functions of the rendering pipeline. For example, a graphics rendering pipeline usually includes a programmable module and a fixed function module, the programmable module is executed by a general-purpose processing core, and the fixed function module is implemented by a corresponding fixed function processing core.

As shown in FIG. 3, for example, the rendering pipeline of the GPU 103 includes an input assembler (IA), a vertex shader (VS), a primitive assembler (PA), a rasterizer, an early depth test unit (Early-Z test unit), a pixel shader, a late depth test unit (Late-Z test unit), and an output unit.

The above rendering pipeline is only an example and is not limited to the above description. The rendering pipeline may also contain other units or modules, and the logical order of the units or modules in the rendering pipeline is not limited to the example in FIG. 3 but may be changed as required.

Each of the above-mentioned units or modules may be implemented in a separately designed fixed function processor in the GPU 103, or may be implemented by executing a specific program in the processing core of the GPU 103. For example, a vertex shader may be implemented in a separately designed fixed-function processor in GPU 103, or may be implemented by executing shader programs in processing cores in GPU 103. Similarly, the input assembler can also be implemented in a separately designed fixed-function processor in the GPU 103, or by executing specific programs in the processing cores of the GPU 103.

In addition, as shown in FIG. 3, there is a vertex buffer (VB) that receives vertex data from the system memory 101 and then passes the vertex data to the input assembler. Generally, the vertex buffer is stored in graphics memory 204 on GPU 103. In the GPU 103, a cache memory (not shown) may also be provided between the graphics memory 204 and the processing core array 203, and the vertex buffer may also be stored in the cache or in another storage area accessible by the processing cores (2031-2036). There is also a depth buffer (DB), which may be stored in, for example, graphics memory 204 shown in FIG. 2, or in a cache (not shown) or another storage area accessible by the processing cores.

During execution of the rendering pipeline, GPU 103 receives commands (e.g., draw-call commands) and/or data (including rendering state, such as materials, textures, and shader programs) from, for example, CPU 102; receives vertex data from external system memory 101; renders primitives according to the commands (such as draw-call commands); and finally generates an output image on the display screen.

Specifically, the Input Assembler (IA) receives vertex data (vertex coordinates and indices) from the vertex buffer for assembly into geometric primitives (eg, triangles, lines, etc.).

Next, the vertex shader determines the vertex attributes (lighting, color, etc.) and provides the shaded vertex data to the primitive assembler. The primitive assembler generates primitives through operations such as clipping, perspective division, and viewport transformation. The rasterizer then converts the primitives generated by the primitive assembler (PA) into on-screen pixels representing those primitives. The pixel shader determines the color of each pixel by executing pixel shader instructions.
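One textbook way for a rasterizer to decide which pixels a triangle covers is to evaluate three edge functions at each pixel center. The sketch below uses that approach purely for illustration; it is not necessarily how the rasterizer in GPU 103 works, and it omits the tie-breaking (top-left) fill rule that real rasterizers apply to pixel centers lying exactly on a shared edge. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    # Twice the signed area of triangle (a, b, p): its sign tells
    # which side of the directed edge a->b the point p lies on.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Point, v1: Point, v2: Point,
              width: int, height: int) -> List[Tuple[int, int]]:
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside if all three agree in sign (accepts either winding order).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered


pixels = rasterize((0, 0), (4, 0), (0, 4), 4, 4)
print(len(pixels))  # 10: the pixels (x, y) with x + y <= 3
```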

When GPU 103 executes draw-call commands for rendering, objects rendered later may be occluded by objects rendered earlier, owing to the rendering order of the objects. In the graphics rendering pipeline, a depth test is therefore performed on the pixel data after the rasterizer to cull occluded primitives or pixels that do not need to be rendered.

In the depth buffer, each pixel stores a corresponding depth value. During the depth test, the depth value of the incoming pixel is compared with the depth value currently stored in the depth buffer. If it is greater than or equal to the stored value, the pixel is considered occluded and is discarded; otherwise, the pixel passes, and its depth value is written into the depth buffer to update the stored value.
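The compare-and-update rule above can be sketched as follows, assuming a depth buffer cleared to 1.0 (farthest) and fragments given as hypothetical `(x, y, depth, color)` tuples in submission order. Whatever the submission order, the nearest fragment ends up in each pixel; a fragment whose depth equals the stored value is discarded, as the rule states.

```python
from typing import Iterable, List, Optional, Tuple

Fragment = Tuple[int, int, float, str]


def depth_resolve(fragments: Iterable[Fragment], width: int, height: int,
                  clear_depth: float = 1.0):
    depth: List[List[float]] = [[clear_depth] * width for _ in range(height)]
    color: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z >= depth[y][x]:
            continue  # occluded: greater than or equal to the stored depth
        depth[y][x] = z  # passed: update the depth buffer
        color[y][x] = c
    return depth, color


frags = [(0, 0, 0.5, "far"), (0, 0, 0.2, "near"), (0, 0, 0.7, "behind")]
depth, color = depth_resolve(frags, 1, 1)
print(color[0][0], depth[0][0])  # near 0.2
```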

Generally, each draw-call command is configured with a depth test mode, for example, the early depth test (Early-Z test), the late depth test (Late-Z test), or the conservative depth test (Conservative-Z test). The early depth test and the late depth test are performed in the Early-Z test unit and the Late-Z test unit, respectively. When a depth test is performed, the reference value is read from the depth buffer, and if the test passes, the depth value of the passing pixel is written into the depth buffer to update it. The conservative depth test involves both an early depth test and a late depth test, which are performed in the Early-Z test unit and the Late-Z test unit shown in FIG. 3, respectively.

The above description of the functions of each unit or module in the rendering pipeline is only exemplary, and not restrictive.

The pixel processing method of the embodiments of the present application uses an improved approach to solve the problem of incoherent draw-call execution in the prior art, thereby improving the performance of the computer graphics processing system.

The pixel processing method proposed by the embodiments of the present application can be applied, for example, to the rendering process of a computer graphics processing system. In particular, it can be applied in the pixel shading and depth testing stages that follow rasterization in the rendering pipeline, to alleviate incoherent draw-call execution in these pixel processing stages.

The pixel processing method provided by the embodiments of the present application will be described in detail below with reference to FIG. 4 and FIG. 5 .

FIG. 4 is a schematic diagram of a rendering pipeline in the GPU 103 that implements the pixel processing method according to the embodiments of the present application. Unlike the rendering pipeline of FIG. 3, the pipeline of FIG. 4 additionally includes a control unit and a marker buffer. The control unit may be a separate hardware unit in the GPU 103, or a module implemented by a processing core of the GPU 103 executing a computer program. The marker buffer may be stored separately in the graphics memory 204 of the GPU 103.

As shown in FIG. 5, in step S10, after the current draw-call is rasterized, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain corresponding to the pixel area associated with the current draw-call is 0. If it is, pixel shading and depth testing of the current draw-call can be performed, that is, the current draw-call can proceed to the next stage of the rendering pipeline. The depth test here includes, but is not limited to, the Early-Z test, the conservative depth test, or the Late-Z test.

Next, in step S11, the control unit sets the value of the PipeNeedDrain marker of the associated pixel area according to the depth test mode of the current draw-call, to control whether subsequent draw-calls after the current draw-call can perform pixel shading and depth testing on that pixel area. For example, the control unit writes the marker set for the pixel area associated with the current draw-call into the marker buffer.

If the depth test mode of the current draw-call is the Early-Z test, the marker PipeNeedDrain of its associated pixel area is kept at 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue to be performed on that pixel area.

If the depth test mode of the current draw-call is the conservative depth test or the Late-Z test, the marker PipeNeedDrain of its associated pixel area is set to 1, indicating that, for this pixel area, subsequent draw-calls after the current draw-call cannot perform pixel shading and depth testing until the marker PipeNeedDrain of the pixel area is cleared.
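The flag-setting rule of step S11 amounts to a mapping from depth test mode to marker value. In the sketch below, `DepthTestMode` and `flag_after_rasterization` are hypothetical names chosen for illustration, and it assumes step S10 has already confirmed that the marker was 0.

```python
from enum import Enum


class DepthTestMode(Enum):
    EARLY_Z = "early_z"
    CONSERVATIVE_Z = "conservative_z"
    LATE_Z = "late_z"


def flag_after_rasterization(mode: DepthTestMode) -> int:
    """PipeNeedDrain value written for the associated pixel area in step S11."""
    if mode is DepthTestMode.EARLY_Z:
        # Early-Z: per the description, the current draw-call does not affect
        # the depth tests of subsequent draw-calls, so the marker stays 0.
        return 0
    # Conservative-Z and Late-Z: the current draw-call affects later depth
    # tests, so the area is blocked (1) until step S13 clears it.
    return 1


for m in DepthTestMode:
    print(m.name, flag_after_rasterization(m))
# EARLY_Z 0
# CONSERVATIVE_Z 1
# LATE_Z 1
```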

Next, in step S12, for the pixel area, pixel shading and depth testing of the current draw-call are performed.

Next, in step S13, after pixel shading and depth testing of the current draw-call are completed, the control unit clears the PipeNeedDrain marker of the pixel area, so that pixel shading and depth testing of subsequent draw-calls can continue to be performed on this pixel area.

Preferably, if the depth test mode of a subsequent draw-call is the Late-Z test, that subsequent draw-call can continue to perform pixel shading and depth testing on the pixel area associated with the current draw-call even if the marker PipeNeedDrain is 1, without waiting for the marker of that pixel area to be cleared.
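The admission check a subsequent draw-call faces on one pixel area, including the optional Late-Z exception described above, can be sketched as follows. The function name `may_proceed` and the string labels for the modes are hypothetical; the sketch encodes only the rule as stated, not any hardware detail.

```python
def may_proceed(flag: int, subsequent_mode: str) -> bool:
    """Whether a subsequent draw-call may start pixel shading and depth
    testing on a pixel area whose PipeNeedDrain marker has value `flag`.

    subsequent_mode: "early_z", "conservative_z" or "late_z" (hypothetical labels).
    """
    if flag == 0:
        return True  # area not blocked
    # Optional refinement: a Late-Z draw-call may proceed even while the
    # marker is 1; draw-calls in other modes wait until it is cleared.
    return subsequent_mode == "late_z"


print(may_proceed(1, "early_z"))  # False
print(may_proceed(1, "late_z"))   # True
```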

FIGS. 6-8 show the specific flow of the pixel processing method of FIG. 5 when the depth test mode of the current draw-call is the Early-Z test, the conservative depth test, and the Late-Z test, respectively.

Figure 6 is directed to the case where the current draw-call depth test mode is the Early-Z test.

For example, as shown in step S110, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel region associated with the current draw-call is 0. If it is, it indicates that pixel shading and depth testing of the current draw-call can be performed, i.e. the current draw-call can go to the next stage of the rendering pipeline.

Next, the control unit resets the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call according to the depth test mode of the current draw-call. Since the depth test mode of the current draw-call is the Early-Z test, as shown in step S111, the marker PipeNeedDrain of the associated pixel area is kept at 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue to be performed on this pixel area.

Next, as shown in step S112, the depth test of the current draw-call, namely the Early-Z test, is performed.

If the Early-Z test fails, the primitive of the current draw-call is culled. If the Early-Z test passes and the corresponding depth buffer is updated, pixel shading of the current draw-call is performed, as shown in step S113.

After pixel shading of the current draw-call is completed, as shown in step S114, the control unit clears the value of the marker PipeNeedDrain of the pixel area to zero.
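The Early-Z flow of FIG. 6 (steps S110-S114) can be sketched for a single pixel area. This is a simplified model, not the hardware: it assumes one depth value per pixel area and a less-than comparison, and the function `run_early_z_draw` and its `trace` list are hypothetical names used only to make the ordering visible.

```python
from typing import Dict, List


def run_early_z_draw(area: int, flags: Dict[int, int], depth: Dict[int, float],
                     z: float, trace: List[str]) -> str:
    """Steps S110-S114 for one pixel area of an Early-Z draw-call.

    Simplification: one depth value per pixel area and a less-than comparison.
    """
    if flags[area] != 0:        # S110: area still blocked by an earlier draw-call
        return "waiting"
    # S111: Early-Z mode leaves the marker at 0, so later draw-calls are not blocked.
    if z >= depth[area]:        # S112: Early-Z test fails, the pixels are culled
        return "culled"
    depth[area] = z             # S112: test passed, depth buffer updated right away
    trace.append(f"early_z_pass(flag={flags[area]})")
    trace.append("shade")       # S113: pixel shading
    flags[area] = 0             # S114: clear the marker (already 0 in this mode)
    return "done"


flags, depth, trace = {0: 0}, {0: 1.0}, []
print(run_early_z_draw(0, flags, depth, 0.4, trace))  # done
print(run_early_z_draw(0, flags, depth, 0.6, trace))  # culled (behind 0.4)
print(trace)  # ['early_z_pass(flag=0)', 'shade']
```

Because the marker never leaves 0 in this mode, other draw-calls are never held back by an Early-Z draw-call.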

Figure 7 is for the case where the current draw-call depth test mode is conservative depth test.

In step S120, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel area associated with the current draw-call is 0. If it is, it indicates that pixel shading and depth testing of the current draw-call can be performed, i.e. the current draw-call can go to the next stage of the rendering pipeline.

Next, the control unit resets the value of PipeNeedDrain, the marker of the pixel area associated with the current draw-call, according to the depth test mode of the current draw-call, to control whether subsequent draw-calls after the current draw-call can perform pixel shading and depth testing on the associated pixel area.

Specifically, as shown in step S121, since the depth test mode of the current draw-call is the conservative depth test, the marker PipeNeedDrain of the pixel area associated with the current draw-call is set to 1, indicating that for this pixel area only pixel shading and depth testing of the current draw-call can be performed, and those of subsequent draw-calls cannot.

Then, as shown in step S122, the Early-Z test of the current draw-call is performed.

After the current draw-call passes the Early-Z test, pixel shading of the current draw-call is performed, as shown in step S123. Because the depth test mode of the current draw-call is the conservative depth test, a Late-Z test follows; therefore, the depth buffer is not updated when the Early-Z test passes, but only after the subsequent Late-Z test passes.

After pixel shading of the current draw-call, the Late-Z test of the current draw-call is performed. After the Late-Z test passes and the depth buffer is updated, as shown in step S124, the marker PipeNeedDrain of the associated pixel area is cleared to zero, so that pixel shading and depth testing of subsequent draw-calls can continue for this pixel area.

Likewise, preferably, if the depth test mode of a subsequent draw-call is the Late-Z test, that draw-call can directly continue to perform pixel shading and the Late-Z test on the pixel area associated with the current draw-call, without waiting for the PipeNeedDrain marker of the pixel area to be cleared.

Figure 8 is for the case where the current draw-call depth test mode is the Late-Z test.

In step S130, after the rasterization of the current draw-call, the control unit reads the marker buffer to determine whether the value of the marker PipeNeedDrain of the pixel region associated with the current draw-call is 0. If it is, it indicates that pixel shading and depth testing of the current draw-call can be performed, i.e. the current draw-call can go to the next stage of the rendering pipeline.

Specifically, as shown in step S131, since the depth test mode of the current draw-call is the Late-Z test, the marker PipeNeedDrain of the pixel area associated with the current draw-call is set to 1, indicating that for this pixel area, pixel shading and depth testing of subsequent draw-calls after the current draw-call cannot be performed.

Next, as shown in step S132, pixel shading of the current draw-call is performed, followed by its Late-Z test. After the Late-Z test passes and the corresponding depth buffer is updated, as shown in step S133, the control unit clears the marker PipeNeedDrain of the corresponding pixel area to zero.
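The flows of FIG. 7 (conservative depth test) and FIG. 8 (Late-Z test) differ only in whether an Early-Z test precedes pixel shading, so one sketch can cover both. As before, this assumes one depth value per pixel area and a less-than comparison; `run_blocking_draw` and its parameters are hypothetical, with `z_late` standing for the depth after pixel shading. The description does not say what happens to the marker when a pixel is culled, so releasing the area in that case is an assumption of the sketch.

```python
from typing import Dict, List


def run_blocking_draw(mode: str, area: int, flags: Dict[int, int],
                      depth: Dict[int, float], z_early: float, z_late: float,
                      trace: List[str]) -> str:
    """Steps S120-S124 (mode="conservative_z") or S130-S133 (mode="late_z")
    for one pixel area. z_late stands for the depth after pixel shading."""
    if flags[area] != 0:                    # S120 / S130: area still blocked
        return "waiting"
    flags[area] = 1                         # S121 / S131: block later draw-calls
    if mode == "conservative_z":
        # S122: Early-Z test reads the depth buffer but does not write it.
        if z_early >= depth[area]:
            flags[area] = 0                 # assumption: release a culled area
            return "culled"
        trace.append("early_z_pass(no write)")
    trace.append(f"shade(flag={flags[area]})")   # S123 / S132: pixel shading
    # Late-Z test; only here is the depth buffer written.
    if z_late >= depth[area]:
        flags[area] = 0                     # assumption: release a culled area
        return "culled"
    depth[area] = z_late
    trace.append("late_z_pass(write)")
    flags[area] = 0                         # S124 / S133: unblock later draw-calls
    return "done"


flags, depth, trace = {0: 0}, {0: 1.0}, []
print(run_blocking_draw("conservative_z", 0, flags, depth, 0.4, 0.45, trace))  # done
print(trace)  # ['early_z_pass(no write)', 'shade(flag=1)', 'late_z_pass(write)']
print(depth[0], flags[0])  # 0.45 0
```

The trace shows the marker held at 1 for the whole span from shading to the Late-Z write, which is the window during which other draw-calls on this area must wait.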

A specific application scenario of the embodiment of the present invention is described in detail below with reference to FIG. 9 .

As shown in FIG. 9, the screen is divided into 16 pixel areas t0-t15. In the current rendering scene, six draw-call commands, d1 to d6 in rendering order, need to be executed. The depth test mode of d3 and d4 is the Late-Z test, and that of d1, d2, d5, and d6 is the Early-Z test.

Here, the division of the pixel area is exemplary, and the screen may be divided into more pixel areas or less pixel areas.

Initially, a marker is set in the marker buffer for each pixel area, with an initial PipeNeedDrain value of 0, meaning that pixel shading and depth testing of draw-calls can be performed in every pixel area. As shown in schematic diagrams (0-a) and (0-b), the markers PipeNeedDrain of all pixel areas (t0-t15) initially have the value 0. The PipeNeedDrain value of each pixel area is stored in the marker buffer; for example, a storage area in a memory or a cache may be allocated to the marker buffer.
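Since each marker holds only 0 or 1, one compact way to lay out the marker buffer for the 16 pixel areas is a single bit mask. This packing is an assumption for illustration (the description only says a storage area in a memory or cache may be allocated), and the class `MarkerBuffer` is hypothetical.

```python
from typing import Iterable


class MarkerBuffer:
    """Hypothetical marker buffer packing one PipeNeedDrain bit per pixel area."""

    def __init__(self, num_areas: int = 16) -> None:
        self.num_areas = num_areas
        self.bits = 0  # every marker starts at 0 (area not blocked)

    def _check(self, area: int) -> None:
        if not 0 <= area < self.num_areas:
            raise ValueError(f"pixel area {area} out of range")

    def is_set(self, area: int) -> bool:
        self._check(area)
        return (self.bits >> area) & 1 == 1

    def set(self, areas: Iterable[int]) -> None:
        for a in areas:
            self._check(a)
            self.bits |= 1 << a

    def clear(self, areas: Iterable[int]) -> None:
        for a in areas:
            self._check(a)
            self.bits &= ~(1 << a)


mb = MarkerBuffer()
mb.set([12, 13, 14, 15])   # e.g. a Late-Z draw-call covering t12-t15
print(hex(mb.bits))        # 0xf000
mb.clear([12, 13])
print(mb.is_set(12), mb.is_set(14))  # False True
```

Each area's bit can be set and cleared independently, so clearing one area never disturbs the markers of the others.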

The depth test of a draw-call mentioned here includes, but is not limited to, the Early-Z test, the conservative depth test, or the Late-Z test, depending on the depth test mode of that draw-call.

After rasterization of the current draw-call, determine the value of the marker PipeNeedDrain for the pixel region associated with this current draw-call. If the value is 0, it indicates that pixel shading and depth testing for the current draw-call can be performed; if the value is 1, it indicates that its pixel shading and depth testing are disabled. The pixel shading and depth test of the current draw-call for this pixel area can only be executed when the value of the PipeNeedDrain corresponding to the pixel area is 0.

As shown in schematic diagrams (1-a) and (1-b), after the rasterization of the current draw-call d1, the marker PipeNeedDrain of the pixel area t0 associated with d1 is found to have the initial value 0, so pixel shading and depth testing of d1 can be performed normally. Since the depth test mode of d1 is the Early-Z test, it has no effect on the depth tests of subsequent draw-calls. Therefore, the control unit keeps PipeNeedDrain of pixel area t0 at 0, indicating that pixel shading and depth testing of subsequent draw-calls can continue to be performed on t0. The next draw-call d2 also uses the Early-Z mode and is handled in the same way, so it is not described again.

As shown in schematic diagrams (2-a) and (2-b), after the rasterization of draw-call d3, the control unit reads the marker buffer and determines that the marker PipeNeedDrain of its pixel area t9 is 0, so pixel shading and depth testing of d3 can be performed normally. Because the Late-Z mode of d3 affects the depth tests of subsequent draw-calls, the PipeNeedDrain marker of pixel area t9 is set to 1. This value indicates that, for pixel area t9, pixel shading and depth testing of at least the subsequent draw-calls after d3 are disabled. Once pixel shading and the Late-Z test of d3 are completed and the PipeNeedDrain marker of t9 is cleared to zero, pixel shading and depth testing of subsequent draw-calls can be performed on t9.

Similarly, after the rasterization of draw-call d4, the PipeNeedDrain markers of its associated pixel areas t12-t15 are checked. Since they are all 0, pixel shading and depth testing of d4 can be performed normally, that is, d4 enters the next stage of the rendering pipeline. Because its Late-Z mode affects the depth tests of subsequent draw-calls, the PipeNeedDrain markers of pixel areas t12-t15 are set to 1, indicating that for these pixel areas, pixel shading and depth testing of at least the subsequent draw-calls after d4 are disabled, and only those of d4 itself can be performed. After pixel shading and the Late-Z test of d4 are completed, the markers of t12-t15 are cleared.

Next, as shown in schematic diagrams (3-a) and (3-b), draw-call d5 covers pixel areas t6, t7, t12, and t13. After the rasterization of d5, the PipeNeedDrain markers of these areas are checked. For example, the markers of t6 and t7 are 0, so the Early-Z test and pixel shading of d5 can be performed directly for t6 and t7. For t12 and t13, however, the markers were set to 1 when d4 was executed, so the Early-Z test and pixel shading of d5 in these two areas can only be executed after their PipeNeedDrain markers are cleared, that is, after pixel shading and the Late-Z test of d4 are completed and the markers of t12-t15 are cleared to 0.

As shown in schematic diagrams (4-a) and (4-b), the current draw-call d6 covers t8, t10, and t11, whose PipeNeedDrain markers are all 0, so the Early-Z test and pixel shading of d6 can be executed directly.
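The FIG. 9 walk-through can be replayed as a small simulation of the marker buffer. This is a sketch under stated assumptions: completion of a draw-call is modelled by an explicit `complete` call, d2 is omitted because its pixel areas are not given, and the optional Late-Z exception is left out. The names `rasterize_and_admit`, `complete`, and `BLOCKING_MODES` are hypothetical.

```python
from typing import Dict, List

# Modes whose draw-calls block subsequent draw-calls on their pixel areas.
BLOCKING_MODES = {"conservative_z", "late_z"}


def rasterize_and_admit(flags: List[int], areas: List[int], mode: str) -> Dict[str, List[int]]:
    """Steps S10-S11: split a draw-call's pixel areas into those it may process
    now and those that must wait, then set the markers of the admitted areas."""
    ready = [a for a in areas if flags[a] == 0]
    blocked = [a for a in areas if flags[a] != 0]
    if mode in BLOCKING_MODES:
        for a in ready:
            flags[a] = 1
    return {"ready": ready, "blocked": blocked}


def complete(flags: List[int], areas: List[int]) -> None:
    """Step S13: clear the markers once shading and depth testing are done."""
    for a in areas:
        flags[a] = 0


flags = [0] * 16                                             # t0-t15, all 0
d1 = rasterize_and_admit(flags, [0], "early_z")              # t0 stays 0
complete(flags, d1["ready"])
d3 = rasterize_and_admit(flags, [9], "late_z")               # t9 -> 1
d4 = rasterize_and_admit(flags, [12, 13, 14, 15], "late_z")  # t12-t15 -> 1
d5 = rasterize_and_admit(flags, [6, 7, 12, 13], "early_z")
print(d5)  # {'ready': [6, 7], 'blocked': [12, 13]}
d6 = rasterize_and_admit(flags, [8, 10, 11], "early_z")
print(d6)  # {'ready': [8, 10, 11], 'blocked': []}
complete(flags, d4["ready"])                                 # d4 finishes its Late-Z test
print(rasterize_and_admit(flags, d5["blocked"], "early_z"))  # {'ready': [12, 13], 'blocked': []}
```

Only t12 and t13 of d5 are held back; t6, t7 and all of d6 proceed while d3 and d4 are still in flight.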

In the prior art, in the scene shown in FIG. 9, because the depth test mode of d3 and d4 is the Late-Z test, the depth tests of all subsequent draw-calls after d3 and d4 in the current frame can only start after pixel shading and depth testing of d3 and d4 have been fully completed. With the technical solutions of the above embodiments of the present application, markers are set only for the pixel areas affected by d3 and d4, disabling pixel shading and depth testing of subsequent draw-calls in those areas until the markers are cleared, while the remaining, unaffected pixel areas can continue to perform pixel shading and depth testing of subsequent draw-calls. This further improves the coherence and performance of draw-call execution.

Based on the method proposed in the embodiments of the present application, when the current draw-call in a specific depth test mode affects the depth tests of subsequent draw-calls, the affected pixel areas are marked, so that the disabling of pixel shading and depth testing is confined to those areas, while in the remaining pixel areas subsequent draw-calls can continue to perform pixel shading and depth testing. Compared with the prior art, in which pixel shading and depth testing of subsequent draw-calls can only be performed after pixel shading and depth testing of the current draw-call in the specific depth test mode are completed, this improves the coherence and parallelism of draw-call command execution and minimizes the impact on the parallel computing capability of the GPU.

The methods in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer program codes or computer program instructions, which may be stored on a memory. When the computer program instructions are loaded and executed on a computer or processor, all or part of the processes or functions described in the embodiments of the present application are generated. The computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.

The computer program code or computer program instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable or optical fiber) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a DVD; or a semiconductor medium, such as a solid state disk (SSD).

The methods of the present application may include various other operations and/or variations of the operations shown. Likewise, the sequence of operations of the flowchart may be modified. It should be understood that not all operations in the flowcharts may be performed. In various embodiments, one or more operations of the method may be controlled or managed by software, firmware, hardware, or any combination thereof, without limitation. A method may include the processes of an embodiment of the present disclosure, which may be controlled or managed by a processor and/or electronic components under the control of computer or computing device readable and executable instructions (or code).

The graphics processing system 120 implementing the pixel processing method according to any of the above embodiments of the present application will be described in detail below with reference to FIG. 10 .

As shown in FIG. 10 , the graphics processing system 120 includes: a determination unit 121 , a pixel processing unit 122 and a setting unit 123 .

Specifically, when implementing the method of an embodiment of the present application, the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call according to the depth test mode of the current draw-call, to indicate whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; and, after pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area. The pixel processing unit 122 is configured to perform pixel shading and depth testing of the current draw-call.

Further, when the method of this embodiment is implemented and the depth test mode of the current draw-call is the early depth test, the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area. The pixel processing unit 122 is configured to: perform the early depth test of the current draw-call, and after the early depth test passes and the corresponding depth buffer is updated, perform pixel shading of the current draw-call.

Further, when the method of this embodiment is implemented and the depth test mode of the current draw-call is the conservative depth test, the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area. The pixel processing unit is further configured to: perform the early depth test of the current draw-call; after the early depth test passes, perform pixel shading of the current draw-call; and after the pixel shading, perform the late depth test of the current draw-call.

Further, when the method of this embodiment is implemented and the depth test mode of the current draw-call is the late depth test, the setting unit 123 is configured to: after the rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area. The pixel processing unit is further configured to perform pixel shading and the late depth test of the current draw-call.

Further, when the method of this embodiment is implemented and the current draw-call is associated with a plurality of pixel areas, the value of the flag is set separately for each pixel area according to the depth test mode of the current draw-call when pixel shading and depth testing of the current draw-call are executed, indicating whether pixel shading and depth testing of subsequent draw-calls are disabled for that pixel area; and once pixel shading and depth testing of the current draw-call are completed for a given pixel area, the value of the flag of that pixel area is cleared.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A pixel processing method, comprising:

After rasterization of a current draw-call, setting, according to a depth test mode of the current draw-call, a value of a flag of a pixel area associated with the current draw-call, to indicate whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area;

performing pixel shading and depth testing of the current draw-call; and

After the pixel shading and depth testing of the current draw-call are completed, clearing the value of the flag of the associated pixel area.
2. The pixel processing method of claim 1, wherein the depth test mode is an early depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area;

Wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call, and after the early depth test passes and the corresponding depth buffer is updated, performing pixel shading of the current draw-call.
3. The pixel processing method of claim 1, wherein the depth test mode is a conservative depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area;

Wherein performing the pixel shading and depth testing of the current draw-call comprises: performing the early depth test of the current draw-call; after the early depth test passes, performing pixel shading of the current draw-call; and after the pixel shading of the current draw-call, performing a late depth test of the current draw-call.
4. The pixel processing method of claim 1, wherein the depth test mode is a late depth test, and the method further comprises: after rasterization of the current draw-call, setting the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area;

Wherein performing the pixel shading and depth testing of the current draw-call comprises: performing pixel shading and the late depth test of the current draw-call.
5. A graphics processor, comprising a memory for storing instructions and a processing unit configured to perform the method according to any one of claims 1-4 when executing the instructions.
6. A computer-readable storage medium storing program code, wherein when the program code is executed by a computer or a processor, the method according to any one of claims 1-4 is implemented.
7. A computer program product comprising program code that, when executed by a computer or a processor, implements the method according to any one of claims 1-4.
8. A graphics processing system, comprising:

a setting unit configured to: after rasterization of a current draw-call, set, according to a depth test mode of the current draw-call, a value of a flag of a pixel area associated with the current draw-call, to indicate whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; and, after pixel shading and depth testing of the current draw-call are completed, clear the value of the flag of the associated pixel area; and

A pixel processing unit configured to perform pixel shading and depth testing of the current draw-call.
9. The graphics processing system of claim 8, wherein the depth test mode is an early depth test, and the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area; and the pixel processing unit is configured to: perform the early depth test of the current draw-call, and after the early depth test passes and the corresponding depth buffer is updated, perform pixel shading of the current draw-call.
10. The graphics processing system of claim 8, wherein the depth test mode is a conservative depth test, and the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; and the pixel processing unit is further configured to: perform the early depth test of the current draw-call; after the early depth test passes, perform pixel shading of the current draw-call; and after the pixel shading of the current draw-call, perform a late depth test of the current draw-call.
11. The graphics processing system of claim 8, wherein the depth test mode is a late depth test, and the setting unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; and the pixel processing unit is further configured to perform pixel shading and the late depth test of the current draw-call.
12. A graphics processing system, comprising:

A control unit configured to: after rasterization of a current draw-call, set, according to a depth test mode of the current draw-call, a value of a flag of a pixel area associated with the current draw-call, to indicate whether pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; and after pixel shading and depth testing of the current draw-call are completed for the pixel area, clear the value of the flag of the pixel area;

A pixel shader configured to: perform pixel shading for the current draw-call;

a depth test unit configured to: execute a depth test of the current draw-call; and

A tag buffer configured to: store a tag value of the associated pixel region.
13. The graphics processing system of claim 12, wherein,

The depth test mode is an early depth test,

The control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are not disabled for the associated pixel area;

The depth test unit comprises an early depth test unit configured to perform the early depth test of the current draw-call;

The pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes and a corresponding depth buffer is updated.
14. The graphics processing system of claim 12, wherein,

The depth test mode is a conservative depth test,

The control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area;

The depth test unit comprises an early depth test unit and a late depth test unit, and the early depth test unit is configured to perform the early depth test of the current draw-call;

The pixel shader is configured to perform pixel shading of the current draw-call after the early depth test passes;

The late depth test unit is configured to perform a late depth test of the current draw-call after pixel shading of the current draw-call.
15. The graphics processing system of claim 12, wherein,

The depth test mode is a late depth test; the control unit is further configured to: after rasterization of the current draw-call, set the value of the flag of the pixel area associated with the current draw-call to indicate that pixel shading and depth testing of subsequent draw-calls after the current draw-call are disabled for the associated pixel area; the depth test unit comprises a late depth test unit; the pixel shader is configured to perform pixel shading of the current draw-call; and the late depth test unit is configured to perform the late depth test.