CN115775295A - Apparatus and method for tile-based deferred rendering - Google Patents
- Publication number
- CN115775295A (application CN202310032126.6A)
- Authority
- CN
- China
- Prior art keywords
- tile
- rendering
- visibility information
- scheduling
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Generation (AREA)
Abstract
The present disclosure provides an apparatus for tile-based deferred rendering, comprising: a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage. The method and apparatus save processing cycles, memory space, and power in a GPU system, and load-balanced scheduling achieves better GPU utilization.
Description
Technical Field
The present disclosure relates to apparatus, methods, devices, and computer-readable media for tile-based deferred rendering (TBDR), and more particularly to a multi-stage rendering architecture with dead primitive removal and load balancing for a TBDR architecture.
Background
TBDR is a popular modern Graphics Processing Unit (GPU) architecture because of its advantages in power and efficiency. The TBDR mode divides the screen into multiple tiles. When a geometric object is rendered, each primitive falls into one or more of the screen tiles, forming a list of primitives for each tile. Later, in the pixel shading phase, the rasterization unit fetches the list of primitives to generate pixels. After an optional Hidden Surface Removal (HSR) unit, only visible pixels are delivered to the shading unit for texturing and shading. Texturing and shading are thus deferred until primitive visibility is known, which keeps bandwidth usage and per-frame processing cycles as low as possible compared with non-deferred tile-based rendering.
However, in the TBDR mode, the geometry pipeline processes all the original primitives and writes them to the corresponding primitive lists according to tile position. The whole geometry process therefore has to handle dead primitives that will eventually be rejected by the HSR unit. These dead primitives waste a significant amount of computational resources and cycles on geometric operations such as vertex transformation, attribute interpolation, clipping, and primitive assembly. Furthermore, dead primitives occupy device memory space in the primitive lists, which may trigger a memory starvation problem that stalls or restarts the geometry process.
On the other hand, when multiple processors are enabled, load balancing is an important research topic for achieving optimal rendering latency. Typically, a central scheduling unit is employed to pre-process vertex buffers and assign primitives to each processor for load balancing. In most cases, the central scheduling unit eventually becomes a bottleneck for the entire pipeline. Furthermore, if geometry amplification such as tessellation is enabled, the scheduling unit cannot predict how many sub-primitives will be generated from the geometric data of the original primitive, which makes load balancing difficult.
Disclosure of Invention
The present disclosure provides a new architecture for rendering using a TBDR mode, which can avoid the waste of geometric processing resources on dead primitives, and at the same time, it can save unnecessary memory usage for dead primitives. Furthermore, load balancing may be implemented for a multi-processor GPU system.
According to a first aspect of the present disclosure, there is provided an apparatus for tile-based deferred rendering, the apparatus comprising: a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
According to a second aspect of the present disclosure, there is provided a method for tile-based deferred rendering, the method comprising: generating primitive visibility information for each tile in a position-only stage; and performing scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
According to a third aspect of the present disclosure, there is provided an apparatus for tile-based deferred rendering, the apparatus comprising: a processor; and a memory communicatively connected to the processor and adapted to store instructions that, when executed by the processor, cause the apparatus to perform the operations of the method according to the second aspect described above.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon instructions that, when executed, cause a processor of an apparatus for tile-based deferred rendering to perform the method according to the above second aspect.
The method and apparatus eliminate the computing resources wasted on dead primitives during the geometry processing stage, thereby saving processing cycles, memory space, and power in a GPU system. In addition, load-balanced scheduling achieves better GPU utilization, and both the geometry workload and the pixel shading workload can be evenly distributed across the multiprocessor system.
Drawings
Exemplary embodiments of the present disclosure will now be described with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The terminology used in the detailed description of the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the disclosure. In the drawings, like numerals refer to like parts throughout.
FIG. 1 shows a schematic block diagram of dead primitive removal in TBDR mode, in accordance with an embodiment of the present disclosure.
Fig. 2 shows a schematic block diagram of load balancing with live primitive information in a TBDR mode according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of an apparatus for TBDR according to an embodiment of the present disclosure.
Fig. 4 shows a flow chart of a method for TBDR in accordance with an embodiment of the disclosure.
Fig. 5 shows a block diagram of an apparatus for TBDR according to an embodiment of the present disclosure.
Detailed Description
The apparatus, methods, and devices for TBDR are described below. In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the various embodiments of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and is intended to be illustrative. The scope of the disclosure is defined by the appended claims and equivalents thereof.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terms used herein should be construed to have meanings consistent with their meanings in the context of this specification and the relevant art, unless specifically defined herein.
The present disclosure is described below with reference to block diagrams and/or flowchart illustrations of methods, apparatus, and/or computer program products according to embodiments of the disclosure. It will be understood that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computing device, special purpose computing device, and/or other programmable data processing apparatus, such that the instructions, which execute via the processor of the computing device and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the present disclosure may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present disclosure may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
An electronic device uses machine-readable media (also referred to as computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, read-only memory (ROM), flash memory, phase change memory, etc.) and machine-readable transmission media (also referred to as carriers) (e.g., electrical, optical, radio frequency, acoustic, or other forms of propagated signals, such as carrier waves and infrared signals), to store and transmit code (which includes software instructions and may be referred to as computer program code or a computer program) and/or data, internally and/or over a network with other electronic devices. Accordingly, an electronic device (e.g., a computer) includes hardware and software, such as one or more processors coupled to one or more machine-readable storage media that store code for execution by the one or more processors and/or store data. For example, an electronic device may include non-volatile memory that retains code/data when the electronic device is turned off, and the portions of the code to be executed by a processor are typically copied from the slower non-volatile memory to volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM), etc.) of the electronic device when it is turned on. Typically, the electronic device also includes a set of physical network interfaces to network with other electronic devices (to transmit/receive code and/or data using propagated signals). One or more portions of the present disclosure may be implemented using different combinations of software, firmware, and/or hardware.
Fig. 1 shows a schematic block diagram of invisible primitive removal in a TBDR mode according to an embodiment of the present disclosure.
As shown in FIG. 1, a position-only pre-processing stage is used to collect primitive visibility information for each TBDR tile, while also accumulating the number of live primitives for each tile. In the following, live primitives are referred to as visible primitives, and dead primitives are correspondingly referred to as invisible primitives.
The position-only stage comprises geometry shading (which includes vertex shader processing, tessellation, geometry shader processing, and so on), clipping, projection, culling, and rasterization. After tiling (i.e., binning a primitive to a screen tile), the rasterization and depth test unit is activated, because the positions of the rasterized pixels can be used to determine whether the primitive falls into the tile and whether it is completely occluded by another primitive, in which case it can be discarded in the later rendering stage. Finally, primitive visibility information for each tile is generated and output by the visibility engine. The primitive visibility information may include the number of visible primitives per tile.
Throughout this stage, only vertex positions are fetched and used, which saves bandwidth and computational resources. The primitive visibility information for each tile is the only output of the stage: pixel shading is skipped altogether, and no other information is generated at the end of the stage.
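The position-only stage can be illustrated with a small software model. The following is a minimal sketch, not the patented hardware design: it assumes each primitive is an axis-aligned screen-space rectangle at a constant depth (a simplification standing in for real triangles), and the names `Rect` and `visibility_pass` are hypothetical. It rasterizes with a depth test using positions only, then reports which primitives survive in each tile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for a primitive: a rectangle at constant depth."""
    prim_id: int
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float  # smaller values are closer to the viewer


def visibility_pass(prims, width, height, tile):
    """Return {tile_index: set of visible prim_ids} using position and depth only."""
    inf = float("inf")
    depth_buf = [[inf] * width for _ in range(height)]
    id_buf = [[None] * width for _ in range(height)]

    # Rasterize with a depth test; no attributes are fetched and nothing is shaded.
    for p in prims:
        for y in range(max(p.y0, 0), min(p.y1, height)):
            for x in range(max(p.x0, 0), min(p.x1, width)):
                if p.depth < depth_buf[y][x]:
                    depth_buf[y][x] = p.depth
                    id_buf[y][x] = p.prim_id

    # A primitive is visible in a tile if it owns at least one pixel there.
    tiles_x = (width + tile - 1) // tile
    visible = {}
    for y in range(height):
        for x in range(width):
            pid = id_buf[y][x]
            if pid is not None:
                t = (y // tile) * tiles_x + (x // tile)
                visible.setdefault(t, set()).add(pid)
    return visible


prims = [
    Rect(0, 0, 0, 4, 4, depth=0.9),  # fully hidden behind primitive 1
    Rect(1, 0, 0, 4, 4, depth=0.1),  # covers all of tile 0
    Rect(2, 2, 2, 6, 6, depth=0.5),  # spans four tiles, hidden in tile 0
]
visible = visibility_pass(prims, width=8, height=8, tile=4)
counts = {t: len(ids) for t, ids in sorted(visible.items())}
print(counts)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

In this toy scene primitive 0 never appears in any tile's visible set, so a later rendering stage would not need to process it at all, and the per-tile counts are available as input for scheduling.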
In the subsequent rendering stage, the primitive visibility information generated in the position-only stage is fetched and aligned with the geometry data. As can be seen in FIG. 1, only the visible primitives are passed to the pipeline for clipping and projection. The culling operation is skipped in this stage because invisible primitives have already been marked and discarded as described above.
The position-only stage described above generates primitive visibility information for each tile at low cost, and the rendering stage uses it to skip the invisible primitives. In the pre-processing stage, per-tile primitive information, such as the number of visible primitives, is accumulated. This process avoids the resources that invisible primitives would otherwise consume, such as assembly, projection, clipping, and memory space, and thereby avoids the geometry overhead and memory footprint of processing them.
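How the rendering stage might consume the visibility information can be sketched as a simple filter. This is an illustrative assumption rather than the disclosed implementation: the visibility information is modeled as a set of primitive indices per tile, and `select_visible`, `geometry`, and `visible_per_tile` are hypothetical names.

```python
def select_visible(geometry, visible_ids):
    """Keep only primitives flagged visible for one tile, in submission order."""
    return [prim for prim_id, prim in enumerate(geometry) if prim_id in visible_ids]


geometry = ["tri0", "tri1", "tri2", "tri3"]       # hypothetical primitive records
visible_per_tile = {0: {1, 3}, 1: {3}, 2: set()}  # assumed output of the earlier stage

# Only the selected primitives would go on to clipping and projection.
work = {tile: select_visible(geometry, ids) for tile, ids in visible_per_tile.items()}
print(work)  # {0: ['tri1', 'tri3'], 1: ['tri3'], 2: []}

# Primitives that are visible in no tile never enter the rendering stage.
dead = set(range(len(geometry))) - set().union(*visible_per_tile.values())
print(sorted(dead))  # [0, 2]
```

Because the filter runs before any per-primitive work, the dead primitives here cost neither geometry cycles nor primitive-list memory in the rendering stage.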
FIG. 2 shows a schematic block diagram of load balancing with primitive visibility information in a TBDR mode in accordance with an embodiment of the present disclosure.
In a GPU system having multiple processors, the central scheduling unit may read the visible primitive information generated by the position-only stage. This information may contain the number of visible primitives per screen tile, which the GPU scheduler may process with a dynamic scheduling algorithm to output the best grouping of tiles for scheduling.
As shown in FIG. 2, the operations of GPU instance 0, GPU instance 1, ..., and GPU instance N each correspond to the operations of the rendering stage shown in FIG. 1. The GPU scheduler may receive the number of visible primitives in each tile and then schedule the tiles according to those numbers. In one example, the system has 2 processor cores and the screen is divided into 4 tiles, tile 0 through tile 3, with 100, 200, 300, and 400 visible primitives, respectively. The GPU scheduler may then assign tile 0 and tile 3 to processor core 0, and tile 1 and tile 2 to processor core 1, so that each processor core processes 500 primitives.
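The disclosure does not specify the scheduling algorithm, so the following is only a sketch under an assumption: it uses the well-known greedy longest-processing-time heuristic (largest tile first, always to the least-loaded core), which happens to reproduce the two-core example above. The function name `schedule_tiles` is hypothetical.

```python
import heapq


def schedule_tiles(visible_counts, num_cores):
    """Greedy assignment: heaviest tile first, always to the least-loaded core."""
    heap = [(0, core) for core in range(num_cores)]  # entries are (load, core)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}

    # Visit tiles in descending order of visible primitive count.
    order = sorted(range(len(visible_counts)),
                   key=lambda t: visible_counts[t], reverse=True)
    for tile in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(tile)
        heapq.heappush(heap, (load + visible_counts[tile], core))

    loads = {core: sum(visible_counts[t] for t in tiles)
             for core, tiles in assignment.items()}
    return assignment, loads


assignment, loads = schedule_tiles([100, 200, 300, 400], num_cores=2)
print(assignment)  # {0: [3, 0], 1: [2, 1]}
print(loads)       # {0: 500, 1: 500}
```

The heuristic is not guaranteed to be optimal in general, but it only needs one count per tile, which is the kind of input the position-only stage provides.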
Scheduling is performed at tile granularity, i.e., the GPU scheduler assigns work to each GPU instance in units of tiles (e.g., one or more tiles), which is much more efficient than complex vertex buffer pre-processing.
The GPU scheduler can efficiently use the per-tile visible primitive information to schedule geometry workloads across multiple processors for load balancing. Because of the position-only stage, the GPU scheduler knows the final sub-primitive visibility even if geometry amplification (tessellation or geometry shading) is enabled, which allows better load-balanced scheduling and avoids vertex buffer scanning.
Thus, in a multi-processor GPU system, the process shown in FIG. 2 better utilizes GPU computing resources and achieves higher geometry processing performance.
Fig. 3 schematically shows a block diagram of an apparatus 300 for TBDR according to an embodiment of the present disclosure.
Referring to FIG. 3, an apparatus 300 for TBDR may include at least a visibility engine 301 and a scheduler 302. In one example, the visibility engine 301 may be the visibility engine shown in FIG. 1, configured to generate primitive visibility information for each tile in a position-only stage. In one example, the scheduler 302 may be the GPU scheduler shown in FIG. 2, configured to perform scheduling for multiple processor cores based on primitive visibility information from the visibility engine 301 prior to the rendering stage.
As an example, the visibility engine 301 may be further configured to accumulate a number of visible primitives per tile and include the number in primitive visibility information. The scheduling by the scheduler 302 may be performed based on the number included in the primitive visibility information.
As a further example, scheduler 302 may be further configured to evenly distribute the total number of visible primitives across the plurality of processor cores.
As an example, scheduling by scheduler 302 may be performed at a tile-based granularity.
As an example, the scheduler 302 may extract primitive visibility information for individual tiles to align it with geometry data so that only visible primitives enter the rendering stage.
As an example, the rendering stage may include clipping and projection operations, but not culling operations, since the invisible primitives have been previously discarded.
Some components are illustrated in FIG. 3 as separate units. This merely indicates that their functions are separate. These units may be provided as separate elements, but other arrangements are possible; for example, some of them may be combined into one unit. Any combination of elements may be implemented in any combination of software, hardware, and/or firmware in any suitable location. For example, there may be several separately configured controllers, or only one controller for all components.
The components shown in fig. 3 may constitute machine-executable instructions embodied in, for example, a machine-readable medium, which when executed by a machine, will cause the machine to perform the operations described. Further, any of these units may be implemented as hardware, such as an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 4 schematically illustrates a flow chart of a method 400 for TBDR in accordance with an embodiment of the present disclosure.
In one example, at block 401, primitive visibility information for each tile is generated in a position-only stage. At block 402, scheduling for a plurality of processor cores is performed based on the primitive visibility information prior to a rendering stage.
As an example, the generating of primitive visibility information may further include accumulating the number of visible primitives per tile and including the number in the primitive visibility information. The scheduling may be performed based on the number.
As a further example, scheduling may be performed by evenly distributing the total number of visible primitives across the plurality of processor cores.
As an example, scheduling may be performed at a tile-based granularity.
As an example, primitive visibility information may be extracted to align with the geometry data such that only visible primitives enter the rendering stage.
As an example, the rendering stage may include clipping and projection operations, but no culling operations.
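To make "evenly distributing" concrete, one possible measure (an assumption introduced here, not a metric defined by the disclosure) is the ratio of the heaviest core's load to the mean load, where 1.0 means a perfect balance. The sketch below applies this hypothetical `imbalance` function to the tile counts from the two-core example given for FIG. 2.

```python
def imbalance(assignment, visible_counts):
    """Ratio of the heaviest core's load to the mean load (1.0 is perfectly even)."""
    loads = [sum(visible_counts[t] for t in tiles) for tiles in assignment.values()]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


counts = [100, 200, 300, 400]          # visible primitives in tiles 0..3
by_index = {0: [0, 1], 1: [2, 3]}      # tiles split by index, ignoring the counts
by_count = {0: [0, 3], 1: [1, 2]}      # the pairing from the two-core example

print(imbalance(by_index, counts))  # 1.4
print(imbalance(by_count, counts))  # 1.0
```

Splitting tiles without regard to their counts leaves one core with 700 primitives and the other with 300, whereas the count-aware pairing gives each core 500.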
Fig. 5 schematically shows a block diagram of an apparatus 500 for TBDR according to an embodiment of the present disclosure.
Referring to fig. 5, an apparatus 500 for TBDR may include at least a processor 501, memory 502, an interface 503, and a communication medium 504. The processor 501, memory 502, and interface 503 may be communicatively coupled to each other via a communication medium 504.
The communication medium 504 may facilitate communication between the processor 501, the memory 502, and the interface 503. The communication medium 504 may be implemented in various ways. For example, the communication medium 504 may include a Peripheral Component Interconnect (PCI) bus, a PCI Express bus, an Accelerated Graphics Port (AGP) bus, a Serial Advanced Technology Attachment (ATA) interconnect, a parallel ATA interconnect, a Fibre Channel interconnect, a USB bus, a Small Computer System Interface (SCSI) interface, or another type of communication medium.
In the example of fig. 5, the instructions stored in memory 502 may include instructions that when executed by processor 501 cause apparatus 500 for TBDR to implement the method described with respect to fig. 4.
Embodiments of the present disclosure may be an article of manufacture in which a non-transitory machine-readable medium, such as a microelectronic memory, has stored thereon instructions (e.g., computer code) that program one or more signal processing components, referred to herein generally as "processors," to perform the operations described above. In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks and state machines). Alternatively, these operations may be performed by any combination of programmed signal processing components and fixed, hardwired circuit components.
It is appreciated that certain features of the application, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the application which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other embodiment of the application. Certain features described in the context of various embodiments should not be considered essential features of those embodiments unless the embodiments are not effective in the absence of those elements.
In the foregoing detailed description, embodiments of the disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made to the embodiments of the disclosure without departing from the spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Throughout the specification, some embodiments of the present disclosure have been presented through flow diagrams. It should be understood that the order of the operations described in these flowcharts is for illustration purposes only and is not intended as a limitation on the present disclosure. Those skilled in the art will recognize that variations of the flow diagrams may be made without departing from the spirit and scope of the disclosure as set forth in the following claims.
Claims (14)
1. An apparatus for tile-based deferred rendering, the apparatus comprising:
a visibility engine configured to generate primitive visibility information for each tile in a position-only stage; and
a scheduler configured to perform scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
2. The apparatus of claim 1, wherein the visibility engine is further configured to accumulate a number of visible primitives per tile and include the number in the primitive visibility information, and wherein the scheduling is performed based on the number.
3. The apparatus of claim 2, wherein the scheduler is further configured to evenly distribute the total number of visible primitives across the plurality of processor cores.
4. The apparatus of any of claims 1-3, wherein scheduling for the plurality of processor cores is performed at a tile-based granularity.
5. The apparatus of any of claims 1-3, wherein the primitive visibility information is extracted to align with geometry data such that only visible primitives enter the rendering stage.
6. The apparatus of any of claims 1 to 3, wherein the rendering stage includes clipping and projection, and does not include culling.
7. A method for tile-based deferred rendering, the method comprising:
generating primitive visibility information for each tile in a position-only stage; and
performing scheduling for a plurality of processor cores based on the primitive visibility information prior to a rendering stage.
8. The method of claim 7, wherein generating primitive visibility information for each tile further comprises accumulating a number of visible primitives for each tile and including the number in the primitive visibility information, and wherein performing scheduling for the plurality of processor cores further comprises performing scheduling based on the number.
9. The method of claim 8, wherein performing scheduling for the plurality of processor cores further comprises evenly distributing a total number of visible primitives across the plurality of processor cores.
10. The method of any of claims 7 to 9, wherein performing scheduling for the plurality of processor cores further comprises performing scheduling at a tile-based granularity.
11. The method of any of claims 7-9, wherein the primitive visibility information is extracted to align with geometry data such that only visible primitives enter the rendering stage.
12. The method of any of claims 7 to 9, wherein the rendering stage includes clipping and projection, and does not include culling.
13. An apparatus for tile-based deferred rendering, the apparatus comprising:
a processor; and
a memory communicatively connected to the processor and adapted to store instructions that, when executed by the processor, cause the apparatus to perform operations of the method of any of claims 7 to 12.
14. A computer readable medium having stored thereon instructions that, when executed, cause a processor of an apparatus for tile-based deferred rendering to perform the method of any of claims 7 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310032126.6A CN115775295A (en) | 2023-01-10 | 2023-01-10 | Apparatus and method for tile-based deferred rendering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115775295A true CN115775295A (en) | 2023-03-10 |
Family
ID=85393366
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310032126.6A Pending CN115775295A (en) | 2023-01-10 | 2023-01-10 | Apparatus and method for tile-based deferred rendering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115775295A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140198119A1 (en) * | 2013-01-17 | 2014-07-17 | Qualcomm Incorporated | Rendering graphics data using visibility information |
CN108305318A (en) * | 2017-01-12 | 2018-07-20 | 想象技术有限公司 | Graphics processing unit and the method for controlling rendering complexity using the instruction of the cost for the segment set for rendering space |
CN108711133A (en) * | 2017-04-01 | 2018-10-26 | 英特尔公司 | The Immediate Mode based on segment of Z with early stage layering renders |
US20190066354A1 (en) * | 2017-08-31 | 2019-02-28 | Hema C. Nalluri | Apparatus and method for processing commands in tile-based renderers |
CN110728616A (en) * | 2018-06-29 | 2020-01-24 | 畅想科技有限公司 | Tile allocation for processing cores within a graphics processing unit |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116820580A (en) * | 2023-08-31 | 2023-09-29 | 摩尔线程智能科技(北京)有限责任公司 | Instruction execution method, system and device, graphics processor and electronic equipment |
CN116820580B (en) * | 2023-08-31 | 2023-11-10 | 摩尔线程智能科技(北京)有限责任公司 | Instruction execution method, system and device, graphics processor and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||