CN115775295A - Apparatus and method for tile-based deferred rendering - Google Patents

Apparatus and method for tile-based deferred rendering

Info

Publication number
CN115775295A
Authority
CN
China
Prior art keywords: tile, rendering, visibility information, scheduling, processor
Prior art date
Legal status
Pending
Application number
CN202310032126.6A
Other languages
Chinese (zh)
Inventor
Name withheld upon the inventor's request
Current Assignee
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202310032126.6A
Publication of CN115775295A

Landscapes

  • Image Generation (AREA)

Abstract

The present disclosure provides an apparatus for tile-based deferred rendering, comprising: a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage. With this method and apparatus, processing cycles, memory space, and power of the GPU system are saved, and better GPU utilization is achieved through load-balanced scheduling.

Description

Apparatus and method for tile-based deferred rendering
Technical Field
The present disclosure relates to apparatus, methods, devices, and computer-readable media for tile-based deferred rendering (TBDR), and more particularly to a multi-stage rendering architecture with dead primitive removal and load balancing for the TBDR architecture.
Background
TBDR is a popular modern Graphics Processing Unit (GPU) architecture due to its advantages in power and efficiency. The TBDR mode divides the screen into multiple tiles. When a geometric object is rendered, each primitive is binned into the screen tile(s) it covers, forming a primitive list for each tile. Later, in the pixel shading phase, the rasterization unit fetches the primitive list to generate pixels. After an optional Hidden Surface Removal (HSR) unit, only visible pixels are delivered to the shading unit for texturing and shading. Thus, the texturing and shading process is deferred until primitive visibility is known, ensuring lower bandwidth usage and fewer processing cycles per frame compared to non-deferred tile-based rendering.
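For illustration only, the tiling (binning) step described above can be sketched as follows in C++. The structure and function names are hypothetical and not part of this disclosure; the sketch simply bins each primitive into every tile its screen-space bounding box overlaps, producing one primitive list per tile.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Primitive { float minX, minY, maxX, maxY; uint32_t id; };  // screen-space bounds

// Bin each primitive into every tile its bounding box overlaps,
// producing one primitive list per screen tile.
std::vector<std::vector<uint32_t>> binPrimitives(
    const std::vector<Primitive>& prims,
    int screenW, int screenH, int tileSize) {
  const int tilesX = (screenW + tileSize - 1) / tileSize;
  const int tilesY = (screenH + tileSize - 1) / tileSize;
  std::vector<std::vector<uint32_t>> tileLists(tilesX * tilesY);
  for (const Primitive& p : prims) {
    int tx0 = std::max(0, static_cast<int>(p.minX) / tileSize);
    int ty0 = std::max(0, static_cast<int>(p.minY) / tileSize);
    int tx1 = std::min(tilesX - 1, static_cast<int>(p.maxX) / tileSize);
    int ty1 = std::min(tilesY - 1, static_cast<int>(p.maxY) / tileSize);
    for (int ty = ty0; ty <= ty1; ++ty)
      for (int tx = tx0; tx <= tx1; ++tx)
        tileLists[ty * tilesX + tx].push_back(p.id);  // per-tile primitive list
  }
  return tileLists;
}
```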
However, in the TBDR mode, the geometry pipeline processes all of the original primitives and writes them to the corresponding primitive lists according to tile position. The whole geometry process therefore has to handle the dead primitives that will eventually be rejected by the HSR unit. These dead primitives waste a significant amount of computational resources/cycles on geometric operations such as vertex transformation, attribute interpolation, clipping, and primitive assembly. Furthermore, dead primitives occupy device memory space in the primitive lists, which may trigger memory starvation that stalls or restarts the geometry process.
On the other hand, when multiple processors are enabled, load balancing is a very important research topic for achieving optimal rendering latency. Typically, a central scheduling unit is employed to pre-process vertex buffers and assign primitives to each processor for load balancing. In most cases, the central scheduling unit eventually becomes a bottleneck for the entire pipeline. Furthermore, if geometry amplification such as tessellation is enabled, the scheduling unit cannot predict how many sub-primitives will be generated from the geometry data of an original primitive, which makes load balancing a difficult problem to solve.
Disclosure of Invention
The present disclosure provides a new architecture for rendering in the TBDR mode, which avoids wasting geometry processing resources on dead primitives and, at the same time, saves the memory that dead primitives would otherwise occupy. Furthermore, load balancing may be implemented for a multi-processor GPU system.
According to a first aspect of the present disclosure, there is provided an apparatus for tile-based deferred rendering, the apparatus comprising: a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
According to a second aspect of the present disclosure, there is provided a method for tile-based deferred rendering, the method comprising: generating primitive visibility information for each tile in a position-only stage; and performing scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
According to a third aspect of the present disclosure, there is provided an apparatus for tile-based deferred rendering, the apparatus comprising: a processor; and a memory communicatively connected to the processor and adapted to store instructions that, when executed by the processor, cause the apparatus to perform the operations of the method according to the second aspect described above.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon instructions that, when executed, cause a processor of an apparatus for tile-based deferred rendering to perform the method according to the above second aspect.
With the method and apparatus of the present disclosure, the waste of computational resources on dead primitives during the geometry processing stage is eliminated, thereby saving processing cycles, memory space, and power of the GPU system; in addition, better GPU utilization is achieved through load-balanced scheduling, and both the geometry workload and the pixel shading workload can be evenly distributed across the multiprocessor system.
Drawings
Exemplary embodiments of the present disclosure will now be described with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The terminology used in the detailed description of the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the disclosure. In the drawings, like numerals refer to like parts throughout.
FIG. 1 shows a schematic block diagram of dead primitive removal in TBDR mode, in accordance with an embodiment of the present disclosure.
Fig. 2 shows a schematic block diagram of load balancing with live primitive information in a TBDR mode according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of an apparatus for TBDR according to an embodiment of the present disclosure.
Fig. 4 shows a flow chart of a method for TBDR in accordance with an embodiment of the disclosure.
Fig. 5 shows a block diagram of an apparatus for TBDR according to an embodiment of the present disclosure.
Detailed Description
The apparatus, methods, and devices for TBDR are described below. In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the various embodiments of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and is intended to be illustrative. The scope of the disclosure is defined by the appended claims and equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terms used herein should be construed to have meanings consistent with their meanings in the context of this specification and the relevant art, unless specifically defined herein.
The present disclosure is described below with reference to block diagrams and/or flowchart illustrations of methods, apparatus, and/or computer program products according to embodiments of the disclosure. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computing device, special purpose computing device, and/or other programmable data processing apparatus, such that the instructions, which execute via the processor of the computing device and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the present disclosure may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present disclosure may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
An electronic device uses machine-readable media (also referred to as computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, read-only memory (ROM), flash memory, phase change memory, etc.) and machine-readable transmission media (also referred to as carriers) (e.g., electrical, optical, radio frequency, acoustic, or other forms of propagated signals, such as carrier waves, infrared signals, etc.), to store and transmit code (including software instructions, which may be referred to as computer program code or a computer program) and/or data (internally and/or over a network with other electronic devices). Accordingly, an electronic device (e.g., a computer) includes hardware and software, such as one or more processors coupled to one or more machine-readable storage media to store code for execution by the one or more processors and/or to store data. For example, an electronic device may include non-volatile memory that maintains code/data when the electronic device is turned off, and the portions of the code to be executed by a processor are typically copied from the slower non-volatile memory to volatile memory (e.g., Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), etc.) of the electronic device when the electronic device is turned on. Typically, the electronic device also includes a set of physical network interfaces for networking with other electronic devices (to transmit/receive code and/or data using propagated signals). One or more portions of the present disclosure may be implemented using different combinations of software, firmware, and/or hardware.
Fig. 1 schematically shows a block diagram of dead (non-visible) primitive removal in the TBDR mode according to an embodiment of the present disclosure.
As shown in Fig. 1, a position-only pre-processing stage is used to collect primitive visibility information for each TBDR tile, while also accumulating the number of live primitives for each tile. In the following, live primitives are referred to as visible primitives and, correspondingly, dead primitives are referred to as invisible primitives.
The position-only stage comprises the geometry shading, clipping, projection, culling, and rasterization stages, where geometry shading includes vertex shader processing, tessellation, geometry shader processing, and so on. After tiling (i.e., binning a primitive to a screen tile), the rasterization and depth test unit is activated, because the positions of the rasterized pixels can be used to determine whether the primitive falls into the tile and whether the primitive is completely occluded by other primitives and can therefore be discarded in the later rendering stage. Finally, primitive visibility information for each tile is generated and output by the visibility engine. The primitive visibility information may include the number of visible primitives per tile.
During this entire stage, only vertex positions are fetched and used, which saves bandwidth and computational resources; the per-tile primitive visibility information is the output of the stage, pixel shading is skipped altogether, and no other information is generated at the end of the stage.
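A minimal sketch of the bookkeeping performed by the visibility engine in this position-only stage is given below. The data layout and function names are illustrative assumptions rather than the actual hardware interface: for every tile, a visibility flag is recorded per primitive and the count of visible (live) primitives is accumulated.

```cpp
#include <cstdint>
#include <vector>

// Per-tile output of the position-only pass: one visibility flag per
// primitive plus the accumulated number of visible (live) primitives.
struct TileVisibility {
  std::vector<bool> primitiveVisible;
  uint32_t visibleCount = 0;
};

// depthTestPasses stands in for the rasterization + depth test result that
// decides whether any pixel of primitive `prim` survives in this tile.
void accumulateVisibility(TileVisibility& tile, uint32_t prim, bool depthTestPasses) {
  if (prim >= tile.primitiveVisible.size())
    tile.primitiveVisible.resize(prim + 1, false);
  if (depthTestPasses && !tile.primitiveVisible[prim]) {
    tile.primitiveVisible[prim] = true;  // mark the primitive as live exactly once
    ++tile.visibleCount;                 // per-tile count later used by the scheduler
  }
}
```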
In the subsequent rendering stage, the primitive visibility information generated in the position-only stage is fetched and aligned with the geometry data. As can be seen in Fig. 1, only the visible primitives are passed down the pipeline for clipping and projection processing. The culling operation is skipped in this stage because invisible primitives have already been marked and discarded as described above.
The position-only stage described above generates primitive visibility information for each tile at low cost, and the rendering stage skips the invisible primitives. In this pre-processing stage, per-tile primitive information, such as the number of visible primitives, is accumulated. By utilizing this process, unnecessary resource consumption by invisible primitives, such as the assembly, projection, clipping, and memory space of the invisible primitives, can be saved, so that the geometry overhead and memory footprint of processing invisible primitives are avoided.
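The way the rendering stage consumes this information can be pictured as the following hypothetical filter (names and container types are assumptions for illustration): it walks the tile's primitive stream in submission order and forwards only the primitives marked visible, so clipping and projection run on a reduced stream and the culling step is no longer needed.

```cpp
#include <cstdint>
#include <vector>

// Walk the tile's primitive stream in submission order and forward only the
// primitives marked visible by the position-only pass; clipping and
// projection then operate on this reduced stream, and culling is skipped.
std::vector<uint32_t> filterVisiblePrimitives(
    const std::vector<uint32_t>& tilePrimitiveStream,
    const std::vector<bool>& primitiveVisible) {
  std::vector<uint32_t> survivors;
  survivors.reserve(tilePrimitiveStream.size());
  for (uint32_t prim : tilePrimitiveStream) {
    if (prim < primitiveVisible.size() && primitiveVisible[prim])
      survivors.push_back(prim);  // only live primitives reach clip/project
  }
  return survivors;
}
```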
Fig. 2 schematically shows a block diagram of load balancing with primitive visibility information in the TBDR mode according to an embodiment of the present disclosure.
In the case of a GPU system having multiple processors, the central scheduling unit may read the visible primitive information generated by the position-only stage. This information may contain the number of visible primitives per screen tile, which the GPU scheduler may process through a dynamic scheduling algorithm to output the best tile grouping for scheduling.
As shown in Fig. 2, the operations of GPU instance 0, GPU instance 1, ..., and GPU instance N each correspond to the operations of the rendering stage shown in Fig. 1, and the GPU scheduler may receive the number of visible primitives in each tile and then schedule the tiles according to these numbers. For example, if the system has 2 processor cores and the screen is divided into 4 tiles (tile 0 through tile 3) whose numbers of visible primitives are 100, 200, 300, and 400, respectively, the GPU scheduler may assign tile 0 and tile 3 to processor core 0 and tile 1 and tile 2 to processor core 1. Each processor core then processes 500 primitives.
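One way to realize such a dynamic scheduling decision is a simple greedy assignment; this is an illustrative algorithm only, as the disclosure does not mandate a particular scheduling policy. Tiles are taken in decreasing order of their visible-primitive count and each tile is given to the currently least-loaded core. For the example above (counts 100, 200, 300, and 400 on 2 cores), this yields two groups of 500 primitives each, matching the assignment described in the preceding paragraph.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Greedy load balancing: assign each tile (largest visible-primitive count
// first) to the processor core with the smallest accumulated load so far.
// Returns, for each core, the list of tile indices it should render.
std::vector<std::vector<size_t>> scheduleTiles(
    const std::vector<uint32_t>& visibleCountPerTile, size_t numCores) {
  std::vector<size_t> order(visibleCountPerTile.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
    return visibleCountPerTile[a] > visibleCountPerTile[b];
  });

  std::vector<std::vector<size_t>> assignment(numCores);
  std::vector<uint64_t> load(numCores, 0);
  for (size_t tile : order) {
    size_t core = std::min_element(load.begin(), load.end()) - load.begin();
    assignment[core].push_back(tile);
    load[core] += visibleCountPerTile[tile];
  }
  return assignment;
}
// Example: scheduleTiles({100, 200, 300, 400}, 2) assigns tiles {3, 0} to
// core 0 and tiles {2, 1} to core 1, so each core processes 500 primitives.
```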
Scheduling is performed at a tile-based granularity, i.e., the GPU scheduler assigns work to each GPU instance in units of tiles (e.g., one or more tiles), which is much more efficient than complex vertex buffer pre-processing.
The GPU scheduler can efficiently use the per-tile visible primitive information to schedule geometry workloads across multiple processors for load balancing. Thanks to the position-only stage, the GPU scheduler has the final sub-primitive visibility even when geometry amplification (tessellation or geometry shading) is enabled, achieving better load-balancing scheduling while avoiding vertex buffer scanning.
Thus, in a multi-processor GPU system, the process shown in FIG. 2 better utilizes GPU computing resources and achieves higher geometry processing performance.
Fig. 3 schematically shows a block diagram of an apparatus 300 for TBDR according to an embodiment of the present disclosure.
Referring to Fig. 3, an apparatus 300 for TBDR may include at least a visibility engine 301 and a scheduler 302. In one example, the visibility engine 301 may be the visibility engine shown in Fig. 1, configured to generate primitive visibility information for each tile in the position-only stage. In one example, the scheduler 302 may be the GPU scheduler shown in Fig. 2, configured to perform scheduling for multiple processor cores based on the primitive visibility information from the visibility engine 301 prior to the rendering stage.
As an example, the visibility engine 301 may be further configured to accumulate a number of visible primitives per tile and include the number in primitive visibility information. The scheduling by the scheduler 302 may be performed based on the number included in the primitive visibility information.
As a further example, scheduler 302 may be further configured to evenly distribute the total number of visible primitives across the plurality of processor cores.
As an example, scheduling by scheduler 302 may be performed at a tile-based granularity.
As an example, the scheduler 302 may extract primitive visibility information for individual tiles to align it with geometry data so that only visible primitives enter the rendering stage.
As an example, the rendering stage may include clipping and projection operations, but not culling operations, since the invisible primitives have already been discarded.
Some components are illustrated in fig. 3 as separate units. However, this merely indicates that the functions are separated. These units may be provided as separate elements. However, other arrangements are possible, for example, some of them may be combined into one unit. Any combination of elements may be implemented in any combination of software, hardware, and/or firmware in any suitable location. For example, there may be more controllers configured separately, or only one controller for all components.
The components shown in fig. 3 may constitute machine-executable instructions embodied in, for example, a machine-readable medium, which when executed by a machine, will cause the machine to perform the operations described. Further, any of these units may be implemented as hardware, such as an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 4 schematically illustrates a flow chart of a method 400 for TBDR in accordance with an embodiment of the present disclosure.
In one example, at block 401, primitive visibility information for each tile is generated in a position-only stage. At block 402, scheduling for a plurality of processor cores is performed based on the primitive visibility information prior to a rendering stage.
As an example, the generating of primitive visibility information may further include accumulating the number of visible primitives per tile and including the number in the primitive visibility information. The scheduling may be performed based on the number.
As a further example, scheduling may be performed by evenly distributing the total number of visible primitives across the plurality of processor cores.
As an example, scheduling may be performed at a tile-based granularity.
As an example, primitive visibility information may be extracted to align with the geometry data such that only visible primitives enter the rendering stage.
As an example, the rendering stage may include clipping and projection operations, but no culling operations.
Fig. 5 schematically shows a block diagram of an apparatus 500 for TBDR according to an embodiment of the present disclosure.
Referring to fig. 5, an apparatus 500 for TBDR may include at least a processor 501, memory 502, an interface 503, and a communication medium 504. The processor 501, memory 502, and interface 503 may be communicatively coupled to each other via a communication medium 504.
Processor 501 may include one or more processing units. The processing unit may be a physical device or an article of manufacture that includes one or more integrated circuits that read data and instructions from a computer-readable medium, such as the memory 502, and selectively execute the instructions. In various embodiments, the processor 501 may be implemented in various ways. As an example, processor 501 may be implemented as one or more processing cores. As another example, processor 501 may include one or more separate microprocessors. In yet another example, processor 501 may comprise an Application Specific Integrated Circuit (ASIC) that provides specific functionality. In yet another example, the processor 501 may provide specific functionality through the use of an ASIC and/or through the execution of computer-executable instructions.
Memory 502 may include one or more computer-usable or computer-readable storage media capable of storing data and/or computer-executable instructions. It should be understood that the storage medium may preferably be a non-transitory storage medium.
Interface 503 may be a device or article of manufacture that enables device 500 for TBDR to send data to and receive data from an external device.
A communication medium 504 may facilitate communication between the processor 501, the memory 502, and the interface 503. The communication medium 504 may be implemented in various ways. For example, the communication medium 504 may include a Peripheral Component Interconnect (PCI) bus, a PCI Express bus, an Accelerated Graphics Port (AGP) bus, a serial Advanced Technology Attachment (ATA) interconnect, a parallel ATA interconnect, a Fibre Channel interconnect, a USB bus, a Small Computer System Interface (SCSI) interface, or another type of communication medium.
In the example of fig. 5, the instructions stored in memory 502 may include instructions that when executed by processor 501 cause apparatus 500 for TBDR to implement the method described with respect to fig. 4.
Embodiments of the present disclosure may be an article of manufacture in which a non-transitory machine-readable medium, such as a microelectronic memory, has stored thereon instructions (e.g., computer code) that program one or more signal processing components, referred to herein generally as "processors," to perform the operations described above. In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks and state machines). Alternatively, these operations may be performed by any combination of programmed signal processing components and fixed, hardwired circuit components.
It is appreciated that certain features of the disclosure which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure which are, for brevity, described in the context of a single embodiment, may also be provided separately, in any suitable sub-combination, or as suitable in any other embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments unless the embodiment is inoperative without those elements.
In the foregoing detailed description, embodiments of the disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made to the embodiments of the disclosure without departing from the spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Throughout the specification, some embodiments of the present disclosure have been presented through flow diagrams. It should be understood that the order of the operations described in these flowcharts is for illustration purposes only and is not intended as a limitation on the present disclosure. Those skilled in the art will recognize that variations of the flow diagrams may be made without departing from the spirit and scope of the disclosure as set forth in the following claims.

Claims (14)

1. An apparatus for tile-based deferred rendering, the apparatus comprising:
a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and
a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
2. The apparatus of claim 1, wherein the visibility engine is further configured to accumulate a number of visible primitives per tile and include the number in the primitive visibility information, and wherein the scheduling is performed based on the number.
3. The apparatus of claim 2, wherein the scheduler is further configured to evenly distribute the total number of visible primitives across the plurality of processor cores.
4. The apparatus of any of claims 1-3, wherein scheduling for the plurality of processor cores is performed at a tile-based granularity.
5. The apparatus of any of claims 1-3, wherein the primitive visibility information is extracted to align with geometry data such that only visible primitives enter the rendering stage.
6. The apparatus of any of claims 1 to 3, wherein the rendering stage includes clipping and projection, and does not include culling.
7. A method for tile-based deferred rendering, the method comprising:
generating primitive visibility information for each tile in a position-only stage; and
performing scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
8. The method of claim 7, wherein generating primitive visibility information for each tile further comprises accumulating a number of visible primitives for each tile and including the number in the primitive visibility information, and wherein performing scheduling for the plurality of processor cores further comprises performing scheduling based on the number.
9. The method of claim 8, wherein performing scheduling for the plurality of processor cores further comprises evenly distributing a total number of visible primitives across the plurality of processor cores.
10. The method of any of claims 7 to 9, wherein performing scheduling for the plurality of processor cores further comprises performing scheduling at a tile-based granularity.
11. The method of any of claims 7-9, wherein the primitive visibility information is extracted to align with geometry data such that only visible primitives enter the rendering stage.
12. The method of any of claims 7 to 9, wherein the rendering stage includes clipping and projection, and does not include culling.
13. An apparatus for tile-based deferred rendering, the apparatus comprising:
a processor; and
a memory communicatively connected to the processor and adapted to store instructions that, when executed by the processor, cause the apparatus to perform the operations of the method of any of claims 7 to 12.
14. A computer readable medium having stored thereon instructions that, when executed, cause a processor of an apparatus for tile-based deferred rendering to perform the method of any of claims 7 to 12.
CN202310032126.6A 2023-01-10 2023-01-10 Apparatus and method for tile-based deferred rendering Pending CN115775295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310032126.6A CN115775295A (en) 2023-01-10 2023-01-10 Apparatus and method for tile-based deferred rendering

Publications (1)

Publication Number Publication Date
CN115775295A true CN115775295A (en) 2023-03-10

Family

ID=85393366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310032126.6A Pending CN115775295A (en) 2023-01-10 2023-01-10 Apparatus and method for tile-based deferred rendering

Country Status (1)

Country Link
CN (1) CN115775295A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140198119A1 (en) * 2013-01-17 2014-07-17 Qualcomm Incorporated Rendering graphics data using visibility information
CN108305318A (en) * 2017-01-12 2018-07-20 想象技术有限公司 Graphics processing unit and the method for controlling rendering complexity using the instruction of the cost for the segment set for rendering space
CN108711133A (en) * 2017-04-01 2018-10-26 英特尔公司 The Immediate Mode based on segment of Z with early stage layering renders
US20190066354A1 (en) * 2017-08-31 2019-02-28 Hema C. Nalluri Apparatus and method for processing commands in tile-based renderers
CN110728616A (en) * 2018-06-29 2020-01-24 畅想科技有限公司 Tile allocation for processing cores within a graphics processing unit

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116820580A (en) * 2023-08-31 2023-09-29 摩尔线程智能科技(北京)有限责任公司 Instruction execution method, system and device, graphics processor and electronic equipment
CN116820580B (en) * 2023-08-31 2023-11-10 摩尔线程智能科技(北京)有限责任公司 Instruction execution method, system and device, graphics processor and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination