US20210150658A1 - Reduced bandwidth tessellation factors - Google Patents
- Publication number
- US20210150658A1 (application US16/683,868)
- Authority
- US
- United States
- Prior art keywords
- patches
- patch
- tessellation factors
- graphics
- fetcher
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
Definitions
- A graphics processing unit (GPU) processes three-dimensional (3-D) graphics using a graphics pipeline formed of a sequence of programmable shaders and fixed-function hardware blocks.
- A 3-D model of an object that is visible in a frame can be represented by a set of triangles, other polygons, or patches, which are processed in the graphics pipeline to produce pixel values for display to a user.
- The triangles, other polygons, and patches are collectively referred to as primitives.
- The process includes mapping tessellation factors to the primitives to represent finer levels of detail, as indicated by the tessellation factors, which specify the granularity of the primitives produced by a tessellation process.
- In some cases, the GPU includes a dedicated memory that stores tessellation factors so that they are available for mapping to primitives being processed in the graphics pipeline.
- The tessellation factors stored in the dedicated GPU memory are populated by procedurally generating the data.
- However, the dedicated GPU memory is typically relatively small, which limits the number of tessellation factors it can store. Furthermore, the overhead of writing the tessellation factors to memory and reading them back can be significant.
- FIG. 1 is a block diagram of a processing system that includes a graphics processing unit (GPU) for creating visual images intended for output to a display in accordance with some embodiments.
- FIG. 2 depicts a graphics pipeline that is capable of processing high-order geometry primitives to generate rasterized images of three-dimensional (3D) scenes while storing and retrieving from memory a reduced amount of tessellation factors in accordance with some embodiments.
- FIG. 3 depicts a hull shader of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to memory and sending an indication to a patch fetcher of the graphics pipeline in response to detecting that at least a threshold percentage of tessellation factors for a thread group have a value indicating that patches of the thread group are to be culled in accordance with some embodiments.
- FIG. 4 depicts a hull shader of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to memory and sending an indication to a patch fetcher of the graphics pipeline in response to detecting that at least a threshold percentage of tessellation factors for a thread group have a value indicating that patches of the thread group are to be passed to a tessellator stage of the graphics pipeline in accordance with some embodiments.
- FIG. 5 depicts a hull shader of the graphics pipeline of FIG. 2 writing a single instance of a tessellation factor to memory and sending an indication to a patch fetcher of the graphics pipeline that the single tessellation factor applies to all tessellation factors for a patch in accordance with some embodiments.
- FIG. 6 depicts a plurality of tessellation factors for a patch packaged in a single word in accordance with some embodiments.
- FIG. 7 is a flow diagram illustrating a method for bypassing writing at least a subset of tessellation factors to memory in accordance with some embodiments.
- A graphics pipeline for processing three-dimensional (3-D) graphics is formed of a sequence of fixed-function hardware block arrangements supported by programmable shaders and a memory. These arrangements are usually specified by a graphics application programming interface (API) processing order, such as the order specified in the Direct3D 11, Microsoft DirectX 11/12, or Khronos Group OpenGL/Vulkan API specifications.
- One example of a graphics pipeline includes a geometry front-end that is implemented using a vertex shader and a hull shader that operate on high order primitives such as patches that represent a 3-D model of a scene.
- The geometry front-end provides high-order primitives, such as curved surface patches, and the tessellation factors generated by the hull shader to a tessellator, which in some embodiments is implemented as a fixed-function hardware block.
- Tessellation allows detail to be dynamically added to and subtracted from a 3D polygon mesh based on control parameters.
- The tessellator generates lower-order primitives (such as triangles, lines, and points) from the input higher-order primitives based on tessellation parameters (also referred to herein as tessellation factors), which control the degree of fineness of the 3D polygon mesh.
- Tessellation thus produces smoother surfaces than the original 3D polygon mesh would.
- Lower order primitives such as polygons are formed of interconnected vertices.
- Common objects such as meshes include a plurality of triangles, each formed of three vertices.
- The lower-order primitives are provided to a geometry back-end that includes a geometry shader to replicate, shade, or subdivide the lower-order primitives.
- In some embodiments, massive hair generation is provided via the functionality of the geometry shader.
- Vertices of the primitives generated by the portion of the graphics pipeline that handles the geometry workload in object space are then provided to the portion that handles pixel workloads in image space, e.g., via primitive, vertex, and index buffers as well as cache memory buffers.
- The pixel portion includes arrangements of fixed-function hardware combined with programmable pixel shaders to perform culling, rasterization, depth testing, color blending, and the like on the primitives to generate fragments or pixels from the input geometry primitives.
- The fragments are, in some cases, individual pixels or subpixels.
- A programmable pixel shader then shades the fragments to merge them with the scene frame image for display.
- FIGS. 1-7 disclose systems and techniques to improve the efficiency and reduce the bandwidth consumption of graphics processing pipelines.
- In some embodiments, a method of bypassing writing tessellation factors to, and reading tessellation factors from, a graphics memory includes detecting, at a hull shader of a graphics processing pipeline of a graphics processing unit (GPU), whether all the tessellation factors for a patch, or at least a threshold percentage of the tessellation factors for all patches in a thread group, have the same value. The method also includes detecting whether at least a threshold percentage of the tessellation factors indicate either that the patches of the thread group are to be culled or that they are to be passed to the tessellator.
- In response to detecting that at least the threshold percentage of the tessellation factors for the thread group indicate that the patches are to be culled (referred to herein as having tessellation factors with a value of zero), the hull shader bypasses writing the tessellation factors to the graphics memory. It then sends a message to the patch fetcher indicating that the tessellation factors for the thread group are to be discarded.
- The patch fetcher, in turn, bypasses reading tessellation factors for the thread group from the graphics memory and discards the patches of the thread group.
- If the hull shader instead determines that at least the threshold percentage of the tessellation factors for the thread group indicate that the patches are to be passed to the tessellator stage (referred to herein as having tessellation factors with a value of one), the hull shader bypasses writing the tessellation factors for the thread group to the graphics memory. It then sends a message to the patch fetcher indicating that all of the tessellation factors for the thread group indicate that the patches are to be passed to the tessellator stage.
- The patch fetcher then bypasses reading the tessellation factors from the graphics memory and provides the patches of the thread group to the tessellator stage.
- If the hull shader determines that at least the threshold percentage of the tessellation factors for the thread group have values that are equal to each other but are neither zero nor one, the hull shader writes a single instance of that value to the graphics memory. It also sends a message to the patch fetcher indicating that the single value stored at the graphics memory applies to all of the tessellation factors for the patches of the thread group.
- The patch fetcher reads the single tessellation factor from the graphics memory and applies it to each of the patches in the thread group before providing the patches to the tessellator.
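The hull shader and patch fetcher behavior described above can be sketched as a small Python simulation. This is a minimal, hypothetical model, not the patented hardware:

- The names `TfMessage`, `HullOutput`, `hull_shader_decide`, and `patch_fetcher_resolve` are illustrative.
- The "graphics memory" is just a Python list.
- A factor value of 1.0 stands in for "pass the patch to the tessellator".

The sketch follows the text literally. With a threshold below 100%, the minority factors in the thread group are overridden as well; passing `threshold=1.0` makes the shortcut exact.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional, Sequence


class TfMessage(Enum):
    """Hypothetical message kinds sent from the hull shader to the patch fetcher."""
    CULL_ALL = auto()      # nothing written; patches of the group are discarded
    PASS_ALL = auto()      # nothing written; patches go to the tessellator as-is
    SINGLE_VALUE = auto()  # one factor written; it applies to every factor in the group
    NORMAL = auto()        # every factor written, as without the optimization


@dataclass
class HullOutput:
    message: TfMessage
    written: List[float]  # words written to the simulated graphics memory


def hull_shader_decide(factors_per_patch: Sequence[Sequence[float]],
                       threshold: float = 0.98) -> HullOutput:
    """Pick a message for one thread group (illustrative, not the real hardware)."""
    flat = [f for patch in factors_per_patch for f in patch]
    if not flat:
        raise ValueError("thread group has no tessellation factors")
    counts: Dict[float, int] = {}
    for f in flat:
        counts[f] = counts.get(f, 0) + 1
    # The most common value and the fraction of factors that share it.
    value, count = max(counts.items(), key=lambda kv: kv[1])
    if count / len(flat) < threshold:
        return HullOutput(TfMessage.NORMAL, list(flat))
    if value == 0:
        return HullOutput(TfMessage.CULL_ALL, [])
    if value == 1:
        return HullOutput(TfMessage.PASS_ALL, [])
    return HullOutput(TfMessage.SINGLE_VALUE, [value])


def patch_fetcher_resolve(out: HullOutput,
                          counts_per_patch: Sequence[int]) -> Optional[List[List[float]]]:
    """Return per-patch factors handed to the tessellator, or None if culled."""
    if out.message is TfMessage.CULL_ALL:
        return None  # no memory read; the patches are discarded
    if out.message is TfMessage.PASS_ALL:
        return [[1.0] * n for n in counts_per_patch]  # no memory read
    if out.message is TfMessage.SINGLE_VALUE:
        value = out.written[0]  # exactly one read, broadcast to every factor
        return [[value] * n for n in counts_per_patch]
    words = iter(out.written)  # NORMAL: read every factor back
    return [[next(words) for _ in range(n)] for n in counts_per_patch]
```

For example, a thread group of two quad patches whose factors are all 4.0 results in one word being written instead of twelve.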
- In some embodiments, the hull shader performs integer compression to write more than one compressed tessellation factor for a patch into a single word in the graphics memory.
- An isoline patch is associated with two tessellation factors.
- The hull shader writes both tessellation factors for an isoline patch in a single word to the graphics memory.
- A triangle patch is associated with four tessellation factors.
- The hull shader writes all four tessellation factors associated with a triangle patch in a single word to the graphics memory.
- A quad patch is associated with six tessellation factors.
- The hull shader writes the first three tessellation factors associated with a quad patch in a first word, and the remaining three in a second word, to the graphics memory.
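The packing scheme can be sketched as follows. The sketch assumes the integer compression has already quantized each factor to a small non-negative integer.

The field widths are hypothetical choices that make the stated word counts fit in a 32-bit word; the source does not give them:

- 16 bits per isoline factor.
- 8 bits per triangle factor.
- 10 bits per quad factor, three factors per word.

`pack_factors` and `unpack_factors` are illustrative names.

```python
from typing import List, Sequence

WORD_BITS = 32
# Hypothetical layout: factors per 32-bit word for each patch type (not from the source).
FIELDS_PER_WORD = {"isoline": 2, "triangle": 4, "quad": 3}
FACTORS_PER_PATCH = {"isoline": 2, "triangle": 4, "quad": 6}


def field_bits(patch_type: str) -> int:
    """Bits per packed factor: 16 (isoline), 8 (triangle), 10 (quad)."""
    return WORD_BITS // FIELDS_PER_WORD[patch_type]


def pack_factors(patch_type: str, factors: Sequence[int]) -> List[int]:
    """Pack one patch's integer-compressed factors into 32-bit words."""
    if len(factors) != FACTORS_PER_PATCH[patch_type]:
        raise ValueError("wrong number of tessellation factors")
    bits = field_bits(patch_type)
    per_word = FIELDS_PER_WORD[patch_type]
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(factors), per_word):
        word = 0
        for slot, f in enumerate(factors[start:start + per_word]):
            if not 0 <= f <= mask:
                raise ValueError(f"factor {f} does not fit in {bits} bits")
            word |= f << (slot * bits)  # place factor in its bit field
        words.append(word)
    return words


def unpack_factors(patch_type: str, words: Sequence[int]) -> List[int]:
    """Inverse of pack_factors."""
    bits = field_bits(patch_type)
    per_word = FIELDS_PER_WORD[patch_type]
    mask = (1 << bits) - 1
    out = [(word >> (slot * bits)) & mask for word in words for slot in range(per_word)]
    return out[:FACTORS_PER_PATCH[patch_type]]
```

With 32-bit factors, a quad patch occupies six words; under these assumptions it occupies two.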
- Each patch primitive type (isoline, triangle, or quad) is associated with two, four, or six tessellation factors, respectively. Particularly for tessellation factors equal to zero or one, writing the tessellation factors to and reading them from the graphics memory can consume more bandwidth than is saved by any reduction in the granularity of the tessellated primitives produced using those factors. By reducing the amount of data written to and read from the graphics memory, the graphics processing pipeline reduces bandwidth consumption and improves the efficiency of the GPU.
- FIG. 1 is a block diagram of a processing system 100 for implementing reduced bandwidth tessellation factors in accordance with some embodiments.
- The processing system 100 includes a central processing unit (CPU) 102, a system memory 104, a graphics processing subsystem 106 including a graphics processing unit (GPU) 108, and a display device 110, communicably coupled together by a system data bus 112.
- As shown, the system data bus 112 connects the CPU 102, the system memory 104, and the graphics processing subsystem 106.
- In other embodiments, the system memory 104 connects directly to the CPU 102.
- In some embodiments, the CPU 102, portions of the graphics processing subsystem 106, the system data bus 112, or any combination thereof are integrated into a single processing unit. Further, in some embodiments, the functionality of the graphics processing subsystem 106 is included in a chipset or in some other type of special-purpose processing unit or co-processor.
- The CPU 102 executes programming instructions stored in the system memory 104, operates on data stored in the system memory 104, sends instructions and/or data (e.g., work or tasks to complete) to the GPU 108, and configures portions of the graphics processing subsystem 106 for the GPU 108 to complete the work.
- The system memory 104 includes dynamic random access memory (DRAM) for storing programming instructions and data for processing by the CPU 102 and the graphics processing subsystem 106.
- In some embodiments, the CPU 102 sends instructions intended for processing at the GPU 108 to command buffers.
- The command buffer is located, for example, in the system memory 104 coupled to the system data bus 112.
- In other embodiments, the CPU 102 sends graphics commands intended for the GPU 108 to a separate memory communicably coupled to the system data bus 112.
- The command buffer temporarily stores a stream of graphics commands that include input to the GPU 108.
- The stream of graphics commands includes, for example, one or more command packets and/or one or more state update packets.
- In some embodiments, a command packet includes a draw command (also interchangeably referred to as a “draw call”) instructing the GPU 108 to execute processes on image data to be output for display.
- A draw command instructs the GPU 108 to render pixels defined by a group of one or more vertices (e.g., defined in a vertex buffer) stored in memory.
- In some embodiments, the geometry defined by the group of one or more vertices corresponds to a plurality of primitives to be rendered.
- The GPU 108 receives and processes work transmitted from the CPU 102.
- The GPU 108 processes the work to render and display graphics images on the display device 110, such as by using one or more graphics pipelines 114.
- The graphics pipeline 114 includes fixed-function stages and programmable shader stages.
- The fixed-function stages include typical hardware stages included in a fixed-function pipeline of a GPU.
- The programmable shader stages include streaming multiprocessors, each of which is capable of executing a relatively large number of threads concurrently.
- Each of the streaming multiprocessors is programmable to execute processing tasks relating to a wide variety of applications, including but not limited to linear and nonlinear data transforms, filtering of video and/or audio data, modeling operations (e.g., applying physics to determine the position, velocity, and other attributes of objects), and so on.
- In some embodiments, the graphics processing subsystem 106 is also used for non-graphics processing.
- The system memory 104 includes an application program 116 (e.g., an operating system or other application), an application programming interface (API) 118, and a GPU driver 120.
- The application program 116 generates calls to the API 118 for producing a desired set of results, typically in the form of a sequence of graphics images.
- The graphics processing subsystem 106 includes a GPU data bus 122 that communicably couples the GPU 108 to a graphics memory 124.
- The GPU 108 uses the graphics memory 124 and the system memory 104, in any combination, for memory operations.
- The CPU 102 allocates portions of these memories for the GPU 108 to execute work.
- The GPU 108 receives instructions from the CPU 102, processes the instructions to render graphics data and images, and stores images in the graphics memory 124. Subsequently, the GPU 108 displays graphics images stored in the graphics memory 124 on the display device 110.
- The graphics memory 124 stores data and programming used by the GPU 108. As illustrated in FIG. 1, the graphics memory 124 includes a frame buffer 126 that stores data for driving the display device 110.
- The GPU 108 includes one or more compute units, such as one or more processing cores 128 that include one or more processing units 130. The processing units 130 execute a thread concurrently with the execution of other threads in a wavefront, such as according to a single-instruction, multiple-data (SIMD) execution model.
- The processing units 130 are also interchangeably referred to as SIMD units.
- The SIMD execution model is one in which multiple processing elements share a single program control flow unit and program counter and thus execute the same program, but are able to execute that program with different data.
- The processing cores 128 of the GPU 108 are also interchangeably referred to as shader cores or streaming multiprocessors (SMXs). The number of processing cores 128 implemented in the GPU 108 is a matter of design choice.
- Each of the one or more processing cores 128 executes a respective instantiation of a particular work-item to process incoming data, where the basic unit of execution in the one or more processing cores 128 is a work-item (e.g., a thread).
- Each work-item represents a single instantiation of, for example, a collection of parallel executions of a kernel invoked on a device by a command that is to be executed in parallel.
- In some embodiments, a work-item is executed by one or more processing elements as part of a thread group (e.g., a work-group) executing at a processing core 128.
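- In some embodiments, the GPU 108 issues and executes work-items, including groups of threads executed simultaneously as a “wavefront” on a single processing unit 130.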
- Wavefronts are included in a “thread group,” which includes a collection of work-items designated to execute the same program.
- A thread group is executed by executing each of the wavefronts that make up the thread group.
- In some embodiments, the wavefronts are executed sequentially on a single processing unit 130 or partially or fully in parallel on different SIMD units.
- In some embodiments, all wavefronts from a thread group are processed at the same processing core 128.
- Wavefronts are also interchangeably referred to as warps, vectors, or threads.
- In some embodiments, wavefronts include instances of parallel execution of a shader program, where each wavefront includes multiple work-items that execute simultaneously on a single processing unit 130 in line with the SIMD paradigm (e.g., one instruction control unit executing the same stream of instructions with multiple data).
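The relationship between work-items, wavefronts, and thread groups can be illustrated with a short sketch. It rests on these assumptions:

- The wavefront width of 64 is an assumption; the actual width is hardware-specific.
- Wavefronts run sequentially here, although hardware may run them in parallel on different SIMD units.
- `split_into_wavefronts` and `run_thread_group` are hypothetical names.

```python
from typing import Callable, List, Sequence

WAVEFRONT_SIZE = 64  # assumption: wavefront width is hardware-specific


def split_into_wavefronts(num_work_items: int,
                          wave_size: int = WAVEFRONT_SIZE) -> List[range]:
    """Partition work-item IDs 0..num_work_items-1 into wavefronts."""
    return [range(start, min(start + wave_size, num_work_items))
            for start in range(0, num_work_items, wave_size)]


def run_thread_group(kernel: Callable[[float], float],
                     data: Sequence[float],
                     wave_size: int = WAVEFRONT_SIZE) -> List[float]:
    """Run one kernel over a thread group, one wavefront at a time."""
    out = [0.0] * len(data)
    for wave in split_into_wavefronts(len(data), wave_size):
        # Every lane in the wavefront runs the same instructions on its own element.
        for lane in wave:
            out[lane] = kernel(data[lane])
    return out
```

Under this assumption, a thread group of 130 work-items needs three 64-wide wavefronts, the last of which has only two active lanes.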
- In some embodiments, a scheduler 132 performs operations related to scheduling various wavefronts on different processing cores 128 and processing units 130, as well as other operations for orchestrating various tasks on the graphics processing subsystem 106.
- The parallelism afforded by the one or more processing cores 128 is suitable for graphics-related operations such as pixel value calculations, vertex transformations, tessellation, geometry shading operations, and other graphics operations.
- The graphics pipeline 114 accepts graphics processing commands from the CPU 102 and thus provides computation tasks to the one or more processing cores 128 for execution in parallel.
- Some graphics pipeline operations, such as pixel processing and other parallel computation operations, require that the same command stream or compute kernel be performed on streams or collections of input data elements. Respective instantiations of the same compute kernel are executed concurrently on multiple processing units 130 in the one or more processing cores 128 in order to process such data elements in parallel.
- A compute kernel is a function containing instructions declared in a program and executed on a processing core 128.
- This function is also referred to as a kernel, a shader, a shader program, or a program.
- The GPU 108 includes a graphics pipeline 114 that reduces the number of tessellation factors written to and read from the graphics memory 124.
- Abstract patch types include isoline, triangle, and quad.
- An isoline patch is a horizontal line defined by two tessellation factors.
- A triangle patch is a triangle defined by three outer tessellation factors and one inner tessellation factor, for a total of four tessellation factors.
- A quad patch is a square defined by four outer tessellation factors and two inner tessellation factors, for a total of six tessellation factors.
- In some embodiments, each tessellation factor is 32 bits wide.
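The patch types and factor counts above can be captured in a small table, which also makes the memory cost concrete. The sketch rests on these assumptions:

- Factors are 32 bits wide, as in the embodiment just described.
- Each factor is counted as one write plus one read.
- The isoline's two factors are modeled as "outer" factors. This is an assumption, since the text does not classify them.
- `PatchType`, `group_traffic_bytes`, and `single_value_traffic_bytes` are hypothetical names.

```python
from dataclasses import dataclass

BITS_PER_FACTOR = 32  # per the embodiment above


@dataclass(frozen=True)
class PatchType:
    name: str
    outer: int  # outer tessellation factors
    inner: int  # inner tessellation factors

    @property
    def num_factors(self) -> int:
        return self.outer + self.inner

    @property
    def factor_bytes(self) -> int:
        """Bytes needed to store one patch's tessellation factors."""
        return self.num_factors * BITS_PER_FACTOR // 8


# The isoline's two factors are modeled as "outer" (an assumption).
ISOLINE = PatchType("isoline", outer=2, inner=0)
TRIANGLE = PatchType("triangle", outer=3, inner=1)
QUAD = PatchType("quad", outer=4, inner=2)


def group_traffic_bytes(patch: PatchType, num_patches: int) -> int:
    """Bytes written plus bytes read back for one thread group, without bypass."""
    return 2 * num_patches * patch.factor_bytes


def single_value_traffic_bytes() -> int:
    """One factor written and read once for the whole group."""
    return 2 * BITS_PER_FACTOR // 8
```

For instance, a thread group of 64 quad patches moves 64 × 24 = 1,536 bytes in each direction, or 3,072 bytes in total, all of which the bypass paths avoid. The single-value path moves 8 bytes.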
- The graphics pipeline 114 detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same. In some embodiments, it also detects whether at least the threshold percentage of those tessellation factors are either zero (i.e., indicating that the patches are to be culled) or one (i.e., indicating that the patches are to be passed to a tessellator stage of the graphics pipeline 114).
- In some embodiments, the threshold is programmable and is set to a relatively high value, such as 98%.
- In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same (or, additionally in some embodiments, that at least the threshold percentage are either zero or one), the graphics pipeline 114 bypasses writing and reading at least a subset of the tessellation factors for the thread group to and from the graphics memory 124. This reduces bandwidth consumption and increases the efficiency of the graphics pipeline 114.
- FIG. 2 depicts a graphics processing pipeline 200 that is capable of processing high-order geometry primitives to generate rasterized images of three-dimensional (3D) scenes while storing and retrieving from memory a reduced number of tessellation factors in accordance with some embodiments.
- FIG. 2 shows various elements and pipeline stages associated with a GPU. In some embodiments, the graphics processing pipeline includes other elements and stages that are not illustrated in FIG. 2. It should also be noted that FIG. 2 is only schematic. For example, in some embodiments the functional units and pipeline stages share hardware circuits in practice, even though they are shown as separate stages in FIG. 2. Each of the stages, elements, and units of the graphics processing pipeline 200 is implemented as desired and accordingly includes, for example, appropriate circuitry and/or processing logic for performing the associated operations and functions.
- The graphics processing pipeline 200 is configured to render graphics as images that depict a scene having three-dimensional geometry in virtual space (sometimes referred to herein as “world space”), or potentially two-dimensional geometry.
- The graphics processing pipeline 200 typically receives a representation of a three-dimensional scene, processes the representation, and outputs a two-dimensional raster image.
- These stages of the graphics processing pipeline 200 process data that initially consists of properties at the end points (or vertices) of a geometric primitive, where the primitive provides information on an object being rendered.
- Typical primitives in three-dimensional graphics include triangles and lines, where the vertices of these geometric primitives provide information on, for example, x-y-z coordinates, texture, and reflectivity.
- In some embodiments, the graphics memory 210 includes a hierarchy of one or more memories or caches that are used to implement buffers and to store tessellation factors, vertex data, texture data, and the like.
- In some embodiments, the graphics memory 210 is implemented using the system memory 104 shown in FIG. 1.
- The graphics memory 210 contains video memory and/or hardware state memory, including various buffers and/or graphics resources used in the rendering pipeline.
- One or more individual memory units of the graphics memory 210 are embodied as one or more video random access memory units, one or more caches, one or more processor registers, and the like, depending on the nature of the data at the particular stage of rendering. Accordingly, it is understood that graphics memory 210 refers to any processor-accessible memory used in the graphics processing pipeline 200.
- A processing unit, such as a specialized GPU, is configured to perform various operations in the pipeline and to read from and write to the graphics memory 210 accordingly.
- The early stages of the graphics processing pipeline 200 include operations performed in world space before a scene is rasterized and converted to screen space as a set of discrete picture elements suitable for output on a pixel display device.
- Various resources contained in the graphics memory 210 are used at the pipeline stages. The inputs to and outputs from the stages are temporarily stored in buffers contained in the graphics memory 210 before the final values of the images are determined.
- An input assembler stage 220 is configured to access information from the graphics memory 210 that is used to define objects that represent portions of a model of a scene. For example, in various embodiments, the input assembler stage 220 reads primitive data (e.g., points, lines and/or triangles) from user-filled buffers and assembles the data into primitives that will be used by other pipeline stages of the graphics processing pipeline 200 .
- The term “user” refers to the application program 116 or other entity that provides shader code and three-dimensional objects for rendering to the graphics processing pipeline 200.
- In some embodiments, the input assembler stage 220 assembles vertices into several different primitive types (such as line lists, triangle strips, or primitives with adjacency) based on the primitive data included in the user-filled buffers. It then formats the assembled primitives for use by the rest of the graphics processing pipeline 200.
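As an illustration of primitive assembly, the following sketch expands a triangle-strip index list into individual triangles. It swaps the first two indices of every odd triangle so that all triangles keep the same winding order. This is the standard strip convention rather than anything specific to this disclosure. `strip_to_triangles` is a hypothetical name, and primitive restart and adjacency are not handled.

```python
from typing import List, Sequence, Tuple

Triangle = Tuple[int, int, int]


def strip_to_triangles(indices: Sequence[int]) -> List[Triangle]:
    """Expand a triangle strip into a triangle list with consistent winding."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if i % 2 == 1:
            a, b = b, a  # odd triangles would otherwise flip winding
        triangles.append((a, b, c))
    return triangles
```

A five-index strip yields three triangles, because each index after the first two adds one triangle.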
- The graphics processing pipeline 200 operates on one or more virtual objects defined by a set of vertices set up in world space and having geometry that is defined with respect to coordinates in the scene.
- In some embodiments, the input data used in the graphics processing pipeline 200 includes a polygon mesh model of the scene geometry whose vertices correspond to the primitives processed in the rendering pipeline in accordance with aspects of the present disclosure. The initial vertex geometry is set up in the graphics memory during an application stage implemented by a CPU.
- A vertex processing stage 230 includes various computations to process the vertices of the objects in world-space geometry.
- In some embodiments, the vertex processing stage 230 includes a vertex shader stage 232 to perform vertex shader computations, which manipulate various parameter values of the vertices in the scene, such as position values (e.g., X-Y coordinate and Z-depth values), color values, lighting values, texture coordinates, and the like.
- The vertex shader computations are performed by one or more programmable vertex shaders 232.
- In some embodiments, the vertex shader computations are performed uniquely for each zone that an object overlaps. An object zone index is used during vertex shading to determine which rendering context and associated parameters the object uses and, accordingly, how the vertex values should be manipulated for later rasterization.
- In some embodiments, the vertex shader stage 232 is implemented in software, logically receives a single vertex of a primitive as input, and outputs a single vertex.
- Some embodiments of vertex shaders implement single-instruction-multiple-data (SIMD) processing so that multiple vertices are processed concurrently.
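A vertex shader that transforms positions can be sketched as one program applied to a batch of vertices. This mirrors the SIMD arrangement described above, in which each vertex occupies one lane. The row-major 4x4 matrix and the names `transform` and `vertex_shader_batch` are illustrative assumptions; a real vertex shader runs on GPU hardware rather than in Python.

```python
from typing import List, Sequence, Tuple

Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix (assumed layout)


def transform(m: Mat4, v: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def vertex_shader_batch(mvp: Mat4,
                        positions: Sequence[Tuple[float, float, float]]) -> List[List[float]]:
    """Same program for every vertex ("lane"); only the input data differs."""
    return [transform(mvp, (x, y, z, 1.0)) for (x, y, z) in positions]
```

Because every lane executes the same instructions, the per-vertex work differs only in the data each lane reads.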
- The vertex processing stage 230 also optionally includes additional vertex processing computations, which subdivide primitives and generate new vertices and new geometries in world space.
- In some embodiments, the vertex processing stage 230 includes a vertex shader stage 232, a hull shader stage 233, a patch fetcher 234, a tessellator stage 235, a domain shader stage 236, and a geometry shader stage 237.
- The hull shader stage 233 operates on input high-order patches or on the control points that define the input patches.
- The hull shader stage 233 outputs tessellation factors and other patch data.
- Primitives generated by the hull shader stage 233 can be provided to the tessellator stage 235 by the patch fetcher 234 .
- the tessellator stage 235 receives objects (such as patches) from the hull shader stage 233 and generates information identifying primitives corresponding to the input object, e.g., by tessellating the input objects based on tessellation factors provided to the tessellator stage 235 by the hull shader stage 233 .
- Tessellation subdivides input higher-order primitives such as patches into a set of lower-order output primitives that represent finer levels of detail, e.g., as indicated by tessellation factors that specify the granularity of the primitives produced by the tessellation process.
- a model of a scene can therefore be represented by a smaller number of higher-order primitives (to save memory or bandwidth) and additional details can be added by tessellating the higher-order primitive.
- the domain shader stage 236 inputs a domain location and, in some implementations, other patch data.
- the domain shader stage 236 operates on the provided information and generates a single vertex for output based on the input domain location and other information.
- a geometry shader stage 237 receives an input primitive and outputs up to four primitives that are generated by the geometry shader stage 237 based on the input primitive.
- the geometry shader stage 237 retrieves vertex data from graphics memory 210 and generates new graphics primitives, such as lines and triangles, from the vertex data in graphics memory 210 .
- geometry shader stage 237 retrieves vertex data for a primitive, as a whole, and generates zero or more primitives.
- geometry shader stage 237 can operate on a triangle primitive with three vertices.
- the scene is defined by a set of vertices which each have a set of vertex parameter values stored in the graphics memory 210 .
- the vertex parameter values output from the vertex processing stage 230 include positions defined with different homogeneous coordinates for different zones.
- the graphics processing pipeline 200 then proceeds to rasterization processing stages 240 .
- the rasterization processing stages 240 perform shading operations and other operations such as clipping, perspective dividing, scissoring, viewport selection, and the like.
- the rasterization processing stages 240 convert the scene geometry into screen space and a set of discrete picture elements (e.g., pixels used during the graphics processing pipeline, although it is noted that the term pixel does not necessarily mean that the pixel corresponds to a display pixel value in the final display buffer image).
- the virtual space geometry transforms to screen space geometry through operations that compute the projection of the objects and vertices from world space to the viewing window (or “viewport”) of the scene that is made up of a plurality of discrete screen space pixels sampled by the rasterizer.
- the screen area includes a plurality of distinct zones with different rendering parameters, which include different rasterization parameters for the different zones.
- the rasterization processing stage 240 depicted in the figure includes a primitive assembly stage 242 , which sets up the primitives defined by each set of vertices in the scene.
- Each vertex is defined by a vertex index, and each primitive is defined with respect to these vertex indices and stored in index buffers in the graphics memory 210 .
- the primitives include at least triangles, which are defined by three vertices each, and can also include point primitives, line primitives, and other polygonal shapes.
- certain primitives are culled. For example, those primitives whose vertex indices and homogeneous coordinate space positions indicate a certain winding order are considered to be back-facing and therefore culled from the scene.
- Primitive assembly stage 242 also includes screen space transformations for the primitive vertices, which can include different screen space transform parameters for different zones of the screen area.
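Primitive assembly from an index buffer, followed by winding-order (back-face) culling, can be illustrated with the short sketch below. It assumes a triangle-list index buffer, 2-D screen-space positions with the y axis pointing up, and counter-clockwise front faces by default. All function names are hypothetical and are not taken from the source.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[int, int, int]


def assemble_triangles(indices: Sequence[int]) -> List[Triangle]:
    """Group a triangle-list index buffer into (i0, i1, i2) vertex-index triples."""
    if len(indices) % 3 != 0:
        raise ValueError("triangle-list index buffer length must be a multiple of 3")
    return [(indices[i], indices[i + 1], indices[i + 2]) for i in range(0, len(indices), 3)]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc; positive when a->b->c is counter-clockwise (y-up)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_back_faces(positions: Sequence[Vec2], indices: Sequence[int],
                    front_is_ccw: bool = True) -> List[Triangle]:
    """Keep only triangles whose winding order marks them as front-facing.

    Degenerate (zero-area) triangles are culled as well.
    """
    kept: List[Triangle] = []
    for tri in assemble_triangles(indices):
        area = signed_area2(positions[tri[0]], positions[tri[1]], positions[tri[2]])
        is_front = area > 0 if front_is_ccw else area < 0
        if is_front:
            kept.append(tri)
    return kept
```

For example, with positions `[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]` and indices `[0, 1, 2, 0, 2, 1]`, only the counter-clockwise triangle `(0, 1, 2)` survives. The same triangle listed with the opposite winding is treated as back-facing.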
- the rasterization processing stage 240 performs clipping, performs a perspective divide to transform the points from homogeneous clip coordinates into normalized device coordinates, and maps the vertices to the viewport.
- the raster data is snapped to integer locations that are then culled and clipped (to draw the minimum number of pixels), and per-pixel attributes are interpolated (from per-vertex attributes). In this manner, the rasterization processing stage 240 determines which pixels each primitive overlaps, clips primitives, prepares primitives for the pixel shader, and determines how to invoke the pixel shader stage 250.
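The perspective divide, the viewport mapping, and the snapping step can be sketched for a single vertex as follows. This is a minimal illustration, not the pipeline's actual arithmetic. It assumes OpenGL-style normalized device coordinates in [-1, 1], y-up window coordinates, no depth-range mapping, and 8 bits of subpixel precision for snapping. Real conventions differ between APIs; for example, Direct3D flips the y axis.

```python
from typing import Tuple


def clip_to_window(clip: Tuple[float, float, float, float],
                   width: int, height: int) -> Tuple[float, float, float]:
    """Perspective-divide a clip-space vertex and map it to window (pixel) coordinates.

    Assumes the vertex already survived clipping (w > 0), y-up window coordinates,
    and leaves depth in normalized device coordinates.
    """
    x, y, z, w = clip
    if w <= 0.0:
        raise ValueError("clip the primitive before the perspective divide (w <= 0)")
    # Perspective divide: homogeneous clip coordinates -> normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform: NDC range [-1, 1] -> [0, width] x [0, height].
    return (ndc_x + 1.0) * 0.5 * width, (ndc_y + 1.0) * 0.5 * height, ndc_z


def snap(coord: float, subpixel_bits: int = 8) -> int:
    """Snap a window coordinate to a fixed-point grid with 2**subpixel_bits steps per pixel."""
    return round(coord * (1 << subpixel_bits))
```

A vertex at the clip-space origin with `w = 1` lands at the center of an 800x600 viewport, `(400.0, 300.0)`. Snapping then stores that position as the fixed-point integer `400 * 256`.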
- the hull shader stage 233 writes all tessellation factors for all patches to the graphics memory 210 and the patch fetcher 234 reads all tessellation factors for all patches from the graphics memory 210 , which can waste computing resources and create processing bottlenecks. For example, frequently there are large runs in which all or a high percentage of the tessellation factors are the same.
- the hull shader stage 233 detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same.
- the hull shader stage 233 further detects whether at least the threshold percentage of the tessellation factors for the thread group either indicate that the patches of the thread group are to be culled (e.g., have a value of zero) or indicate that the patches of the thread group are to be passed to the tessellator stage 235 (e.g., have a value of one). In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same, the hull shader stage 233 bypasses writing at least a subset of the tessellation factors to the graphics memory 210 .
- in response to detecting that at least the threshold percentage of tessellation factors for a thread group of patches have the same value of zero or one, the hull shader stage 233 sends a message to the patch fetcher 234.
- the hull shader stage 233 bypasses writing the tessellation factors to the graphics memory 210 and the patch fetcher 234 bypasses reading the tessellation factors from the graphics memory 210 in response to receiving the message.
- in response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same, but are not equal to zero or one, the hull shader stage 233 writes a single instance of the value shared by the majority of the tessellation factors to the graphics memory 210. It also sends a message to the patch fetcher 234 indicating that the single value of the tessellation factors stored at the graphics memory 210 applies to all of the patches of the thread group.
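The hull shader's detection step described above can be sketched as follows. This is a minimal illustration, assuming the thread group's tessellation factors arrive as one flat list and that the threshold is 90%. The source does not give a threshold value. The `Message` enum and `GroupDecision` type are hypothetical stand-ins for the message sent to the patch fetcher 234 and for the data written to the graphics memory 210. The source does not say how the minority factors are handled once the threshold is met, so the sketch simply follows the dominant value.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Sequence, Tuple


class Message(Enum):
    """Hypothetical encodings of the hull-shader-to-patch-fetcher messages."""
    NONE = "none"            # no shortcut: factors are written to memory as usual
    CULL = "cull"            # dominant value 0: skip the write, cull the group
    PASS_THROUGH = "pass"    # dominant value 1: skip the write, pass patches on
    SINGLE_VALUE = "single"  # other dominant value: write it once for the group


@dataclass
class GroupDecision:
    message: Message
    written: List[float] = field(default_factory=list)  # values written to graphics memory


def dominant_value(factors: Sequence[float]) -> Tuple[float, float]:
    """Return the most common tessellation factor and the fraction of factors equal to it."""
    if not factors:
        raise ValueError("thread group has no tessellation factors")
    value, count = Counter(factors).most_common(1)[0]
    return value, count / len(factors)


def hull_shader_decide(factors: Sequence[float], threshold: float = 0.9) -> GroupDecision:
    """Decide how much of a thread group's tessellation factors to write to memory."""
    value, share = dominant_value(factors)
    if share < threshold:
        return GroupDecision(Message.NONE, list(factors))
    if value == 0:
        return GroupDecision(Message.CULL)
    if value == 1:
        return GroupDecision(Message.PASS_THROUGH)
    return GroupDecision(Message.SINGLE_VALUE, [value])
```

With this threshold, a group of ten factors in which nine equal one is treated as pass-through and nothing is written. A group of unrelated values falls back to writing every factor.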
- FIG. 3 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to the graphics memory 210 and sending an indication to a patch fetcher 234 of the graphics pipeline that all tessellation factors for a thread group have a value indicating that the patches of the thread group are to be culled in accordance with some embodiments.
- in response to detecting that at least the threshold percentage of the tessellation factors for the patches of a thread group have a value indicating that the patches of the thread group are to be culled (e.g., have a value of zero), the hull shader stage 233 bypasses writing the tessellation factors for the thread group to the graphics memory 210.
- the hull shader stage 233 also sends a message 302 to the patch fetcher 234 indicating that all of the tessellation factors for all of the patches of the thread group have a value indicating that the patches of the thread group are to be culled (e.g., are equal to zero).
- the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210 . Because a tessellation factor of zero culls patches, the patch fetcher 234 additionally discards the patches of the thread group rather than passing them to the tessellator stage (not shown).
- FIG. 4 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to the graphics memory 210 and sending an indication to a patch fetcher 234 of the graphics pipeline in response to detecting that at least the threshold percentage of the tessellation factors for a thread group have a value indicating that the patches of the thread group are to be passed to the tessellator stage in accordance with some embodiments.
- in response to detecting that at least the threshold percentage of the tessellation factors for the patches of a thread group have a value indicating that the patches of the thread group are to be passed to the tessellator stage (e.g., have a value of one), the hull shader stage 233 bypasses writing the tessellation factors for the thread group to the graphics memory 210.
- the hull shader stage 233 also sends a message 402 to the patch fetcher 234 indicating that the tessellation factors for the patches of the thread group have a value indicating that the patches of the thread group are to be passed to the tessellator stage (e.g., are equal to one).
- the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210 .
- the patch fetcher 234 additionally unrolls the patches of the thread group and passes them to the tessellator (not shown).
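The patch fetcher side of FIGS. 3 and 4 can be sketched as below. The sketch assumes hypothetical string messages in place of messages 302 and 402. A caller-supplied `read_factors` callback stands in for a read from the graphics memory 210. The callback is only invoked when no message was received, which is where the read bandwidth is saved. In the pass-through case, attaching an implied factor of one to every patch is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

MSG_CULL = "cull"          # hypothetical stand-in for message 302
MSG_PASS_THROUGH = "pass"  # hypothetical stand-in for message 402

Factors = Tuple[float, ...]


def fetch_thread_group(message: Optional[str], patches: Sequence[str], factors_per_patch: int,
                       read_factors: Callable[[int], Factors]) -> List[Tuple[str, Factors]]:
    """Return the (patch, tessellation factors) pairs forwarded to the tessellator stage."""
    if message == MSG_CULL:
        # Factors of zero cull every patch: no memory read, nothing is forwarded.
        return []
    if message == MSG_PASS_THROUGH:
        # Factors of one: no memory read; unroll the group and attach the implied value 1.
        implied = (1.0,) * factors_per_patch
        return [(patch, implied) for patch in patches]
    # No message: read each patch's factors from graphics memory as usual.
    return [(patch, read_factors(i)) for i, patch in enumerate(patches)]
```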
- FIG. 5 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 writing a single instance of a value of a tessellation factor to the graphics memory 210 and sending an indication to a patch fetcher 234 of the graphics pipeline that the single tessellation factor value applies for all tessellation factors for a patch in accordance with some embodiments.
- the hull shader stage 233 bypasses writing all of the tessellation factors for the patch to the graphics memory 210 .
- the hull shader stage 233 writes a single instance of the tessellation factor 502 to the graphics memory 210 and sends a flag 504 to the patch fetcher 234 indicating that the single instance of the tessellation factor value applies to all tessellation factors corresponding to the patch 506 .
- the patch fetcher 234 reads the patch 506 and the single instance of the tessellation factor 502 from the graphics memory 210 .
- the patch fetcher 234 applies the tessellation factor 502 to all tessellation factors corresponding to the patch 506 and provides the patch 506 and the tessellation factor 502 to the tessellator stage 235, which uses the tessellation factor to generate the final primitives.
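The per-patch shortcut of FIG. 5 amounts to a write-side check and a read-side expansion. The sketch below assumes the factor counts per patch type listed in the description of FIG. 6. A boolean stands in for flag 504. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Number of tessellation factors per patch type, as listed in the description of FIG. 6.
FACTORS_PER_PATCH = {"isoline": 2, "triangle": 4, "quad": 6}


def hull_write_patch(factors: Sequence[float]) -> Tuple[List[float], bool]:
    """Return (values written to memory, flag). The flag stands in for flag 504."""
    if factors and all(tf == factors[0] for tf in factors):
        return [factors[0]], True   # one instance of the shared value
    return list(factors), False     # no shortcut: write every factor


def fetcher_expand(stored: Sequence[float], flag: bool, patch_type: str) -> List[float]:
    """Rebuild the full set of tessellation factors the tessellator stage expects."""
    count = FACTORS_PER_PATCH[patch_type]
    if flag:
        return [stored[0]] * count
    if len(stored) != count:
        raise ValueError(f"{patch_type} patch needs {count} factors, got {len(stored)}")
    return list(stored)
```

For a triangle patch whose four factors are all 7, only one value is written, and the patch fetcher expands it back to `[7, 7, 7, 7]` before tessellation.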
- FIG. 6 depicts a plurality of tessellation factors for a patch packaged in a single word in accordance with some embodiments.
- a hull shader stage writes tessellation factors to a graphics memory in 32-bit words.
- Each tessellation factor has a value between 0 and 64. Accordingly, each tessellation factor can be written using 8 bits.
- the hull shader stage (not shown) packages a plurality of tessellation factors in a single word. For example, an isoline patch has two tessellation factors.
- the hull shader stage writes to the graphics memory (not shown) a single word 601 including a first tessellation factor TF-1 602 and a second tessellation factor TF-2 604 corresponding to an isoline patch.
- a triangle patch has four tessellation factors.
- the hull shader stage writes to the graphics memory a single word 611 including a first tessellation factor TF-1 612, a second tessellation factor TF-2 614, a third tessellation factor TF-3 616, and a fourth tessellation factor TF-4 618 corresponding to a triangle patch.
- a quad patch has six tessellation factors.
- the hull shader stage packages the six tessellation factors corresponding to a quad patch into two words. For example, the hull shader stage writes to the graphics memory a first word 621 including a first tessellation factor TF-1 622, a second tessellation factor TF-2 624, and a third tessellation factor TF-3 626, and a second word 627 including a fourth tessellation factor TF-4 628, a fifth tessellation factor TF-5 630, and a sixth tessellation factor TF-6 632 corresponding to a quad patch.
- the hull shader stage reduces the number of words of tessellation factors being written to and read from the graphics memory from two to one (in the case of an isoline patch), from four to one (in the case of a triangle patch), and from six to two (in the case of a quad patch).
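The FIG. 6 packing can be sketched as plain bit manipulation. The sketch assumes integer tessellation factors in the range 0 to 64, each stored in one byte of a 32-bit word. Which byte holds which factor is not specified in the source. Here slot 0 occupies the least significant byte, which is an assumption. The layout table mirrors the 2 / 4 / 3+3 grouping described above.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout following FIG. 6: how many 8-bit factors go into each 32-bit word.
WORD_LAYOUT: Dict[str, Tuple[int, ...]] = {
    "isoline": (2,),     # 2 factors -> 1 word
    "triangle": (4,),    # 4 factors -> 1 word
    "quad": (3, 3),      # 6 factors -> 2 words of 3
}


def pack_factors(patch_type: str, factors: Sequence[int]) -> List[int]:
    """Pack integer tessellation factors (0..64) into 32-bit words, 8 bits per factor."""
    layout = WORD_LAYOUT[patch_type]
    if len(factors) != sum(layout):
        raise ValueError(f"{patch_type} patch needs {sum(layout)} factors")
    words, pos = [], 0
    for slots in layout:
        word = 0
        for slot in range(slots):
            tf = factors[pos + slot]
            if not 0 <= tf <= 64:
                raise ValueError(f"tessellation factor {tf} outside 0..64")
            word |= tf << (8 * slot)   # slot 0 occupies the least significant byte
        words.append(word)
        pos += slots
    return words


def unpack_factors(patch_type: str, words: Sequence[int]) -> List[int]:
    """Inverse of pack_factors."""
    factors: List[int] = []
    for word, slots in zip(words, WORD_LAYOUT[patch_type]):
        factors.extend((word >> (8 * slot)) & 0xFF for slot in range(slots))
    return factors
```

Under this layout, the triangle factors `[1, 2, 3, 4]` pack into the single word `0x04030201`. The quad factors `[1, ..., 6]` pack into the two words `0x030201` and `0x060504`.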
- FIG. 7 is a flow diagram illustrating a method 700 for bypassing writing at least a subset of tessellation factors to memory in accordance with some embodiments.
- the method is implemented by the graphics pipeline 114 of FIG. 1 or the graphics processing pipeline 200 of FIG. 2 .
- at block 702, the hull shader stage 233 determines whether at least a threshold percentage of the tessellation factors for all patches of a thread group have equal values. If, at block 702, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors for the patches of the thread group have equal values, the method flow continues to block 704. At block 704, the hull shader stage 233 determines whether all of the tessellation factors for a patch of the thread group have equal values. If, at block 704, the hull shader stage 233 determines that the tessellation factors for the patch do not all have equal values, the method flow continues to block 706.
- at block 706, the hull shader stage 233 writes the tessellation factors for the patch to the graphics memory 210.
- the hull shader stage 233 writes a plurality of tessellation factors corresponding to the patch in a single word. For example, for an isoline patch having two tessellation factors, the hull shader stage 233 writes both tessellation factors for the isoline patch in a single word. For a triangle patch having four tessellation factors, the hull shader stage 233 writes all four tessellation factors for the triangle patch in a single word.
- for a quad patch having six tessellation factors, the hull shader stage 233 writes, e.g., the first three tessellation factors for the quad patch in a first word and the remaining three tessellation factors for the quad patch in a second word.
- if, at block 704, the hull shader stage 233 determines that all of the tessellation factors for the patch have equal values, the method flow continues to block 708.
- at block 708, the hull shader stage 233 writes a single instance of the tessellation factor to the graphics memory 210 and sends a flag 504 to the patch fetcher 234 indicating that the single instance of the tessellation factor applies for all tessellation factors corresponding to the patch.
- if, at block 702, the hull shader stage 233 determines that at least the threshold percentage of the tessellation factors for all patches of the thread group have equal values, the method flow continues to block 710.
- at block 710, the hull shader stage 233 determines whether at least the threshold percentage of the tessellation factors for all patches of the thread group have a value that indicates that the patches of the thread group are to be culled. If, at block 710, the hull shader stage 233 determines that at least the threshold percentage of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches of the thread group are to be culled, the method flow continues to block 712.
- at block 712, the hull shader stage 233 bypasses writing the tessellation factors for the thread group to the graphics memory 210 and sends a message 302 to the patch fetcher 234 indicating that the tessellation factors for the thread group have a value that indicates that the patches of the thread group are to be culled.
- the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210 and culls (discards) the patches of the thread group.
- if, at block 710, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors have a value that indicates that the patches of the thread group are to be culled, the method flow continues to block 714.
- at block 714, the hull shader stage 233 determines whether at least the threshold percentage of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches of the thread group are to be passed to the tessellator stage 235.
- if so, the method flow continues to block 716.
- at block 716, the hull shader stage 233 bypasses writing the tessellation factors for the patches of the thread group to the graphics memory 210 and sends a message 402 to the patch fetcher 234 indicating that all of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches of the thread group are to be passed to the tessellator stage 235.
- in response to receiving the message 402, the patch fetcher 234 unrolls the patches from the graphics memory 210 and provides the patches to the tessellator stage 235. If, at block 714, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors for the patches of the thread group have a value that indicates that the patches of the thread group are to be passed to the tessellator stage 235, the method flow continues to block 708.
- at block 708, the hull shader stage 233 has determined that at least the threshold percentage of the tessellation factors for all of the patches of the thread group have equal values. It therefore writes a single instance of the most common tessellation factor value to the graphics memory 210. It also sends a flag 504 to the patch fetcher 234 indicating that the single tessellation factor stored at the graphics memory 210 applies for all of the tessellation factors for all of the patches of the thread group.
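The block-by-block flow of method 700 can be traced with the sketch below. It assumes a thread group given as a list of per-patch factor lists and a threshold of 90%; the source does not give a threshold value. The function returns the FIG. 7 block numbers visited rather than performing any memory writes. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def trace_method_700(group: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return the FIG. 7 block numbers visited for one thread group of patches."""
    factors = [tf for patch in group for tf in patch]
    if not factors:
        raise ValueError("thread group has no tessellation factors")
    value, count = Counter(factors).most_common(1)[0]
    path = [702]
    if count / len(factors) < threshold:
        # Block 702 "no": handle each patch on its own.
        for patch in group:
            path.append(704)
            # Block 704 "yes" -> 708 (single value + flag 504); "no" -> 706 (packed write).
            path.append(708 if len(set(patch)) == 1 else 706)
        return path
    path.append(710)
    if value == 0:
        return path + [712]   # skip write, send message 302, fetcher culls
    path.append(714)
    if value == 1:
        return path + [716]   # skip write, send message 402, fetcher unrolls patches
    return path + [708]       # write the dominant value once, send flag 504
```

A group whose factors are all zero visits blocks 702, 710, and 712. A group that fails the threshold test is handled patch by patch through blocks 704 and 706 or 708.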
- a computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system.
- Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-Ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media.
- the computer readable storage medium in some embodiments is embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
- certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software.
- the software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium.
- the software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.
- the non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other volatile or non-volatile memory device or devices, and the like.
- the executable instructions stored on the non-transitory computer readable storage medium are in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Graphics (AREA)
- Image Generation (AREA)
Abstract
A graphics pipeline reduces the number of tessellation factors written to and read from a graphics memory. A hull shader stage of the graphics pipeline detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same. In some embodiments, it also detects whether at least the threshold percentage of the tessellation factors for a thread group of patches have the same value, which indicates either that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline. In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same (or, additionally, that at least the threshold percentage of the tessellation factors have a value that indicates either that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline), the hull shader stage bypasses writing at least a subset of the tessellation factors for the thread group of patches to the graphics memory, thus reducing bandwidth and increasing efficiency of the graphics pipeline.
Description
- A graphics processing unit (GPU) processes three-dimensional (3-D) graphics using a graphics pipeline formed of a sequence of programmable shaders and fixed-function hardware blocks. For example, a 3-D model of an object that is visible in a frame can be represented by a set of triangles, other polygons, or patches which are processed in the graphics pipeline to produce values of pixels for display to a user. The triangles, other polygons, or patches are collectively referred to as primitives. The process includes mapping tessellation factors to the primitives to represent finer levels of detail as indicated by the tessellation factors that specify the granularity of the primitives produced by a tessellation process. The GPU includes a dedicated memory that is used to store tessellation factors so that the tessellation factors are available for mapping to primitives that are being processed in the graphics pipeline. The tessellation factors stored in the dedicated GPU memory are populated by procedurally generating the data. The dedicated GPU memory is typically a relatively small memory, which limits the number of tessellation factors that can be stored in the dedicated GPU memory. Furthermore, the overhead required to write the tessellation factors to and read the tessellation factors from memory can be significant.
- The present disclosure is better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
- FIG. 1 is a block diagram of a processing system that includes a graphics processing unit (GPU) for creating visual images intended for output to a display in accordance with some embodiments.
- FIG. 2 depicts a graphics pipeline that is capable of processing high-order geometry primitives to generate rasterized images of three-dimensional (3D) scenes while storing and retrieving from memory a reduced amount of tessellation factors in accordance with some embodiments.
- FIG. 3 depicts a hull shader of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to memory and sending an indication to a patch fetcher of the graphics pipeline in response to detecting that at least a threshold percentage of tessellation factors for a thread group have a value indicating that patches of the thread group are to be culled in accordance with some embodiments.
- FIG. 4 depicts a hull shader of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to memory and sending an indication to a patch fetcher of the graphics pipeline in response to detecting that at least a threshold percentage of tessellation factors for a thread group have a value indicating that patches of the thread group are to be passed to a tessellator stage of the graphics pipeline in accordance with some embodiments.
- FIG. 5 depicts a hull shader of the graphics pipeline of FIG. 2 writing a single instance of a tessellation factor to memory and sending an indication to a patch fetcher of the graphics pipeline that the single tessellation factor applies for all tessellation factors for a patch in accordance with some embodiments.
- FIG. 6 depicts a plurality of tessellation factors for a patch packaged in a single word in accordance with some embodiments.
- FIG. 7 is a flow diagram illustrating a method for bypassing writing at least a subset of tessellation factors to memory in accordance with some embodiments.
- A graphics pipeline for processing three-dimensional (3-D) graphics is formed of a sequence of fixed-function hardware block arrangements supported by programmable shaders and a memory. These arrangements usually follow a processing order specified by a graphics application programming interface (API), such as the specifications of Direct 3D 11, Microsoft DX 11/12, or the Khronos Group OpenGL/Vulkan APIs. One example of a graphics pipeline includes a geometry front-end that is implemented using a vertex shader and a hull shader that operate on high order primitives such as patches that represent a 3-D model of a scene.
- The geometry front-end provides high order primitives, such as curved surface patches, and the tessellation factors generated by the hull shader to a tessellator that is implemented as a fixed function hardware block in some embodiments. Tessellation allows detail to be dynamically added to and removed from a 3D polygon mesh based on control parameters. The tessellator generates lower order primitives (such as triangles, lines, and points) from the input higher order primitives based on tessellation parameters (also referred to herein as tessellation factors), which control the degree of fineness of the 3D polygon mesh. Tessellation produces smoother surfaces than the original 3D polygon mesh would. Lower order primitives such as polygons are formed of interconnected vertices. For example, common objects such as meshes include a plurality of triangles, each formed of three vertices. The lower order primitives are provided to a geometry back-end that includes a geometry shader to replicate, shade, or subdivide the lower order primitives. For example, the functionality of the geometry shader can be used to generate large quantities of hair.
- Vertices of the primitives generated by the portion of the graphics pipeline that handles the geometry workload in object space are then provided to the portion that handles pixel workloads in image space, e.g., via primitive, vertex, and index buffers as well as cache memory buffers. The pixel portion includes arrangements of fixed function hardware combined with programmable pixel shaders to perform culling, rasterization, depth testing, color blending, and the like on the primitives to generate fragments or pixels from the input geometry primitives. The fragments are individual pixels or subpixels in some cases. A programmable pixel shader then shades the fragments, which are merged with the scene frame image for display.
- FIGS. 1-7 disclose systems and techniques to improve the efficiency and bandwidth of graphics processing pipelines. In some embodiments, a method of bypassing writing tessellation factors to and reading tessellation factors from a graphics memory includes detecting, at a hull shader of a graphics processing pipeline of a graphics processing unit (GPU), whether all the tessellation factors for a patch, or at least a threshold percentage of the tessellation factors for all patches in a thread group, have the same value. It also includes detecting whether at least a threshold percentage of the tessellation factors indicate either that the patches of the thread group are to be culled or that the patches of the thread group are to be passed to the tessellator. If at least the threshold percentage of the tessellation factors for a thread group indicate that the patches of the thread group are to be culled (referred to herein as having tessellation factors with a value of zero), the hull shader bypasses writing the tessellation factors to the graphics memory and sends a message to the patch fetcher indicating that the tessellation factors for the thread group are to be discarded. In response to receiving the message, the patch fetcher bypasses reading tessellation factors for the thread group from the graphics memory and discards the patches of the thread group.
- If the hull shader determines that at least the threshold percentage of the tessellation factors for the thread group indicate that the patches of the thread group are to be passed to the tessellator stage (referred to herein as having tessellation factors with a value of one), the hull shader bypasses writing the tessellation factors for the thread group to the graphics memory. It then sends a message to the patch fetcher indicating that all of the tessellation factors for the thread group indicate that the patches of the thread group are to be passed to the tessellator stage. In response to receiving the message, the patch fetcher bypasses reading the tessellation factors from the graphics memory and provides the patches of the thread group to the tessellator stage.
- In some embodiments, if the hull shader determines that at least the threshold percentage of the tessellation factors for the thread group have values that are equal to each other but that are neither zero nor one, the hull shader writes a single instance of the value of the tessellation factors to the memory and sends a message to the patch fetcher indicating that the single value of the tessellation factors stored at the graphics memory applies to all of the tessellation factors for the patches of the thread group. In response to receiving the message, the patch fetcher reads the single tessellation factor from the graphics memory and applies the single tessellation factor to each of the patches in the thread group before providing the patches to the tessellator.
- If the tessellation factors for the patches of the thread group do not have values that are equal to each other, in some embodiments, the hull shader performs integer compression to write more than one compressed tessellation factor for a patch in a single word to the graphics memory. For example, an isoline patch is associated with two tessellation factors. Thus, in some embodiments, the hull shader writes both tessellation factors for an isoline patch in a single word to the graphics memory. Similarly, a triangle patch is associated with four tessellation factors. In some embodiments, the hull shader writes all four tessellation factors associated with a triangle patch in a single word to the graphics memory. A quad patch is associated with six tessellation factors. In some embodiments, the hull shader writes the first three tessellation factors associated with a quad patch in a first word to the graphics memory and writes the remaining three tessellation factors associated with the quad patch in a second word to the graphics memory.
- Each patch primitive type (isoline, triangle, and quad) is associated with two, four, or six tessellation factors, respectively. Particularly for tessellation factors equal to zero or one, writing the tessellation factors to and reading them from the graphics memory can consume more bandwidth than is saved by any reduction in the granularity of the tessellated primitives produced using those factors. By reducing the amount of data written to and read from the graphics memory, the graphics processing pipeline reduces bandwidth consumption and improves the efficiency of the GPU.
- FIG. 1 is a block diagram of a processing system 100 for implementing reduced bandwidth tessellation factors in accordance with some embodiments. The processing system 100 includes a central processing unit (CPU) 102, a system memory 104, a graphics processing subsystem 106 including a graphics processing unit (GPU) 108, and a display device 110, all communicably coupled together by a system data bus 112. As shown, the system data bus 112 connects the CPU 102, the system memory 104, and the graphics processing subsystem 106. In other embodiments, the system memory 104 connects directly to the CPU 102. In some embodiments, the CPU 102, portions of the graphics processing subsystem 106, the system data bus 112, or any combination thereof are integrated into a single processing unit. Further, in some embodiments, the functionality of the graphics processing subsystem 106 is included in a chipset or in some other type of special-purpose processing unit or co-processor.
- The CPU 102 executes programming instructions stored in the system memory 104, operates on data stored in the system memory 104, sends instructions and/or data (e.g., work or tasks to complete) to the GPU 108, and configures portions of the graphics processing subsystem 106 for the GPU 108 to complete the work. In some embodiments, the system memory 104 includes dynamic random access memory (DRAM) for storing programming instructions and data for processing by the CPU 102 and the graphics processing subsystem 106.
- In various embodiments, the CPU 102 sends instructions intended for processing at the GPU 108 to command buffers. In some embodiments, the command buffer is located, for example, at the system memory 104 coupled to the system data bus 112. In other embodiments, the CPU 102 sends graphics commands intended for the GPU 108 to a separate memory communicably coupled to the system data bus 112. The command buffer temporarily stores a stream of graphics commands that include input to the GPU 108. The stream of graphics commands includes, for example, one or more command packets and/or one or more state update packets. In some embodiments, a command packet includes a draw command (also interchangeably referred to as a “draw call”) instructing the GPU 108 to execute processes on image data to be output for display. For example, a draw command instructs the GPU 108 to render pixels defined by a group of one or more vertices (e.g., defined in a vertex buffer) stored in memory. In some embodiments, the geometry defined by the group of one or more vertices corresponds to a plurality of primitives to be rendered.
- The GPU 108 receives and processes work transmitted from the CPU 102. For example, in various embodiments, the GPU 108 processes the work to render and display graphics images on the display device 110, such as by using one or more graphics pipelines 114. The graphics pipeline 114 includes fixed-function stages and programmable shader stages. The fixed-function stages include typical hardware stages included in a fixed-function pipeline of a GPU. The programmable shader stages include streaming multiprocessors, each of which is capable of executing a relatively large number of threads concurrently. Further, each of the streaming multiprocessors is programmable to execute processing tasks relating to a wide variety of applications, including but not limited to linear and nonlinear data transforms, filtering of video and/or audio data, modeling operations (e.g., applying physics to determine position, velocity, and other attributes of objects), and so on. In other embodiments, the graphics processing subsystem 106 is used for non-graphics processing.
- As also shown, the system memory 104 includes an application program 116 (e.g., an operating system or other application), an application programming interface (API) 118, and a GPU driver 120. The application program 116 generates calls to the API 118 for producing a desired set of results, typically in the form of a sequence of graphics images. The graphics processing subsystem 106 includes a GPU data bus 122 that communicably couples the GPU 108 to a graphics memory 124. In various embodiments, the GPU 108 uses the graphics memory 124 and the system memory 104, in any combination, for memory operations. The CPU 102 allocates portions of these memories for the GPU 108 to execute work. For example, in various embodiments, the GPU 108 receives instructions from the CPU 102, processes the instructions to render graphics data and images, and stores images in the graphics memory 124. Subsequently, the GPU 108 displays graphics images stored in the graphics memory 124 on the display device 110. The graphics memory 124 stores data and programming used by the GPU 108. As illustrated in FIG. 1, the graphics memory 124 includes a frame buffer 126 that stores data for driving the display device 110.
- In various embodiments, the GPU 108 includes one or more compute units, such as one or more processing cores 128 that include one or more processing units 130. Each processing unit 130 executes a thread concurrently with other threads in a wavefront, for example according to a single-instruction, multiple-data (SIMD) execution model. The processing units 130 are also interchangeably referred to as SIMD units. In the SIMD execution model, multiple processing elements share a single program control flow unit and program counter and thus execute the same program, but are able to execute that program with different data. The processing cores 128 of the GPU 108 are also interchangeably referred to as shader cores or streaming multiprocessors (SMXs). The number of processing cores 128 that are implemented in the GPU 108 is a matter of design choice.
- Each of the one or more processing cores 128 executes a respective instantiation of a particular work-item to process incoming data, where the basic unit of execution in the one or more processing cores 128 is a work-item (e.g., a thread). Each work-item represents a single instantiation of, for example, a collection of parallel executions of a kernel invoked on a device by a command that is to be executed in parallel. A work-item is executed by one or more processing elements as part of a thread group (e.g., a work-group) executing at a processing core 128. In various embodiments, the GPU 108 issues and executes work-items, including groups of threads executed simultaneously as a “wavefront” on a single processing unit 130. Multiple wavefronts are included in a “thread group,” which includes a collection of work-items designated to execute the same program. A thread group is executed by executing each of the wavefronts that make up the thread group. In some embodiments, the wavefronts are executed sequentially on a single processing unit 130, or partially or fully in parallel on different SIMD units. In other embodiments, all wavefronts from a thread group are processed at the same processing core 128. Wavefronts are also interchangeably referred to as warps, vectors, or threads. In some embodiments, wavefronts include instances of parallel execution of a shader program, where each wavefront includes multiple work-items that execute simultaneously on a single processing unit 130 in line with the SIMD paradigm (e.g., one instruction control unit executing the same stream of instructions with multiple data). A scheduler 132 schedules the various wavefronts on different processing cores 128 and processing units 130. It also performs other operations for orchestrating various tasks on the graphics processing subsystem 106.
- The parallelism afforded by the one or more processing cores 128 is suitable for graphics-related operations such as pixel value calculations, vertex transformations, tessellation, geometry shading operations, and other graphics operations. The graphics pipeline 114 accepts graphics processing commands from the CPU 102 and thus provides computation tasks to the one or more processing cores 128 for execution in parallel. Some graphics pipeline operations, such as pixel processing and other parallel computation operations, require that the same command stream or compute kernel be performed on streams or collections of input data elements. Respective instantiations of the same compute kernel are executed concurrently on multiple processing units 130 in the one or more processing cores 128 in order to process such data elements in parallel. As referred to herein, for example, a compute kernel is a function containing instructions declared in a program and executed on a processing core 128. This function is also referred to as a kernel, a shader, a shader program, or a program.
- As described below in more detail with respect to FIG. 2, the GPU 108 includes a graphics pipeline 114 that reduces the number of tessellation factors written to and read from the graphics memory 124. Abstract patch types include isoline, triangle, and quad. An isoline patch is a horizontal line defined by two tessellation factors. A triangle patch is a triangle defined by three outer tessellation factors and one inner tessellation factor, for a total of four tessellation factors. A quad patch is a square defined by four outer tessellation factors and two inner tessellation factors, for a total of six tessellation factors. In some embodiments, each tessellation factor includes 32 bits. Thus, writing all of the tessellation factors for all of the patches of a thread group to the graphics memory 124 and then reading them all back from the graphics memory 124 consumes significant bandwidth. The graphics pipeline 114 detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same. In some embodiments, it also detects whether at least the threshold percentage of the tessellation factors for the thread group are either zero (i.e., indicating that the patches are to be culled) or one (i.e., indicating that the patches are to be passed to a tessellator stage of the graphics pipeline 114). In some embodiments, the threshold is programmable and is set to a relatively high value, such as 98%. The graphics pipeline 114 may detect that the threshold percentage of the tessellation factors for the thread group are the same (or, additionally in some embodiments, that the threshold percentage of the tessellation factors are either zero or one). In response, it bypasses writing and reading at least a subset of the tessellation factors for the thread group of patches to and from the graphics memory 124. This reduces bandwidth and increases the efficiency of the graphics pipeline 114.
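The group-level detection can be sketched as a most-common-value count over all tessellation factors of a thread group. This is a minimal Python illustration under stated assumptions, not the hardware implementation. The function name dominant_tf is hypothetical. The factors are assumed to arrive as one flat list. The threshold is expressed as an integer percentage (98 by default, matching the example value above), so the comparison avoids floating-point rounding at the boundary.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_tf(tfs: Sequence[float], threshold_pct: int = 98) -> Optional[float]:
    """Return the TF value shared by at least threshold_pct percent of tfs, else None.

    Integer cross-multiplication (count * 100 >= pct * total) avoids
    floating-point rounding exactly at the threshold.
    """
    if not tfs:
        return None
    value, count = Counter(tfs).most_common(1)[0]
    return value if count * 100 >= threshold_pct * len(tfs) else None


# 64 triangle patches -> 256 TFs; 252 zeros meets 98% (25200 >= 25088).
print(dominant_tf([0.0] * 252 + [3.0] * 4))   # 0.0
# 250 zeros falls short (25000 < 25088).
print(dominant_tf([0.0] * 250 + [3.0] * 6))   # None
```

The returned value would then select the path. Zero corresponds to the cull case, one to the pass-through case, and any other value to the single-instance case. None means the factors are written out normally.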
- FIG. 2 depicts a graphics pipeline that is capable of processing high-order geometry primitives to generate rasterized images of three-dimensional (3D) scenes while storing a reduced amount of tessellation factors to memory and retrieving a reduced amount from it, in accordance with some embodiments. FIG. 2 shows various elements and pipeline stages associated with a GPU. In some embodiments, the graphics pipeline includes other elements and stages that are not illustrated in FIG. 2. It should also be noted that FIG. 2 is only schematic. In practice, in some embodiments, the functional units and pipeline stages shown share hardware circuits, even though they are drawn as separate stages in FIG. 2. It will also be appreciated that each of the stages, elements, and units of the graphics processing pipeline 200 is implemented as desired and accordingly includes, for example, appropriate circuitry and/or processing logic for performing the associated operations and functions.
- In various embodiments, the graphics processing pipeline 200 is configured to render graphics as images that depict a scene having three-dimensional geometry in virtual space (sometimes referred to herein as “world space”), or potentially two-dimensional geometry. The graphics processing pipeline 200 typically receives a representation of a three-dimensional scene, processes the representation, and outputs a two-dimensional raster image. These stages of the graphics processing pipeline 200 process data that initially consists of properties at the end points (or vertices) of a geometric primitive, where the primitive provides information on an object being rendered. Typical primitives in three-dimensional graphics include triangles and lines, where the vertices of these geometric primitives provide information on, for example, x-y-z coordinates, texture, and reflectivity.
- Throughout the graphics processing pipeline 200, data is read from and written to one or more memory units, which are generally denoted in FIG. 2 as graphics memory 210. The graphics memory 210 includes a hierarchy of one or more memories or caches that are used to implement buffers and to store tessellation factors, vertex data, texture data, and the like. Some embodiments of the graphics memory 210 are implemented using the system memory 104 shown in FIG. 1.
- The graphics memory 210 contains video memory and/or hardware state memory, including various buffers and/or graphics resources utilized in the rendering pipeline. In various embodiments, one or more individual memory units of the graphics memory 210 are embodied as one or more video random access memory units, one or more caches, one or more processor registers, and the like, depending on the nature of the data at the particular stage in rendering. Accordingly, it is understood that graphics memory 210 refers to any processor-accessible memory utilized in the graphics processing pipeline 200. A processing unit, such as a specialized GPU, is configured to perform various operations in the pipeline and to read from and write to the graphics memory 210 accordingly.
- The early stages of the graphics processing pipeline 200 include operations performed in world space before a scene is rasterized and converted to screen space as a set of discrete picture elements suitable for output on the pixel display device. Throughout the graphics processing pipeline 200, the pipeline stages use various resources contained in the graphics memory 210. Inputs to and outputs from the stages are temporarily stored in buffers in the graphics memory 210 before the final values of the images are determined.
- An input assembler stage 220 is configured to access information from the graphics memory 210 that is used to define objects that represent portions of a model of a scene. For example, in various embodiments, the input assembler stage 220 reads primitive data (e.g., points, lines, and/or triangles) from user-filled buffers and assembles the data into primitives that will be used by other pipeline stages of the graphics processing pipeline 200. As used herein, the term “user” refers to the application program 116 or other entity that provides shader code and three-dimensional objects for rendering to the graphics processing pipeline 200. The input assembler stage 220 assembles vertices into several different primitive types (such as line lists, triangle strips, or primitives with adjacency) based on the primitive data included in the user-filled buffers, and formats the assembled primitives for use by the rest of the graphics processing pipeline 200.
- In various embodiments, the graphics processing pipeline 200 operates on one or more virtual objects defined by a set of vertices set up in world space and having geometry that is defined with respect to coordinates in the scene. For example, the input data utilized in the graphics processing pipeline 200 includes a polygon mesh model of the scene geometry whose vertices correspond to the primitives processed in the rendering pipeline in accordance with aspects of the present disclosure. The initial vertex geometry is set up in the graphics memory during an application stage implemented by a CPU.
- A vertex processing stage 230 includes various computations to process the vertices of the objects in world space geometry. In some embodiments, the vertex processing stage 230 includes a vertex shader stage 232 to perform vertex shader computations. These computations manipulate various parameter values of the vertices in the scene, such as position values (e.g., X-Y coordinate and Z-depth values), color values, lighting values, texture coordinates, and the like. Preferably, the vertex shader computations are performed by one or more programmable vertex shaders 232. The vertex shader computations are performed uniquely for each zone that an object overlaps. An object zone index is utilized during vertex shading to determine which rendering context and associated parameters the object uses and, accordingly, how the vertex values should be manipulated for later rasterization. In various embodiments, the vertex shader stage 232 is implemented in software, logically receives a single vertex of a primitive as input, and outputs a single vertex. Some embodiments of vertex shaders implement single-instruction, multiple-data (SIMD) processing so that multiple vertices are processed concurrently.
- The vertex processing stage 230 also optionally includes additional vertex processing computations, which subdivide primitives and generate new vertices and new geometries in world space. In the depicted embodiment, the vertex processing stage 230 includes a vertex shader stage 232, a hull shader stage 233, a patch fetcher 234, a tessellator stage 235, a domain shader stage 236, and a geometry shader stage 237. The hull shader stage 233 operates on input high-order patches or control points that are used to define the input patches. The hull shader stage 233 outputs tessellation factors and other patch data. Patches output by the hull shader stage 233 can be provided to the tessellator stage 235 by the patch fetcher 234. The tessellator stage 235 receives objects (such as patches) from the hull shader stage 233 and generates information identifying primitives corresponding to the input objects. It does so, e.g., by tessellating the input objects based on tessellation factors provided to the tessellator stage 235 by the hull shader stage 233. Tessellation subdivides input higher-order primitives such as patches into a set of lower-order output primitives that represent finer levels of detail. The level of detail is indicated, e.g., by tessellation factors that specify the granularity of the primitives produced by the tessellation process. A model of a scene can therefore be represented by a smaller number of higher-order primitives (to save memory or bandwidth), and additional details can be added by tessellating the higher-order primitives.
- The domain shader stage 236 inputs a domain location and, in some implementations, other patch data. The domain shader stage 236 operates on the provided information and generates a single vertex for output based on the input domain location and other information. A geometry shader stage 237 receives an input primitive and outputs up to four primitives that it generates based on the input primitive. In some embodiments, the geometry shader stage 237 retrieves vertex data from the graphics memory 210 and generates new graphics primitives, such as lines and triangles, from that vertex data. In particular, the geometry shader stage 237 retrieves vertex data for a primitive as a whole and generates zero or more primitives. For example, the geometry shader stage 237 can operate on a triangle primitive with three vertices.
- Once the vertex processing stage 230 is complete, the scene is defined by a set of vertices, each of which has a set of vertex parameter values stored in the graphics memory 210. In certain implementations, the vertex parameter values output from the vertex processing stage 230 include positions defined with different homogeneous coordinates for different zones.
- The graphics processing pipeline 200 then proceeds to the rasterization processing stages 240. The rasterization processing stages 240 perform shading operations and other operations such as clipping, perspective dividing, scissoring, viewport selection, and the like. In various embodiments, the rasterization processing stages 240 convert the scene geometry into screen space and a set of discrete picture elements. These are, e.g., pixels used during the graphics processing pipeline; the term pixel does not necessarily mean that the pixel corresponds to a display pixel value in the final display buffer image. The virtual space geometry is transformed to screen space geometry through operations that compute the projection of the objects and vertices from world space to the viewing window (or “viewport”) of the scene. The viewport is made up of a plurality of discrete screen space pixels sampled by the rasterizer. In accordance with aspects of the present disclosure, the screen area includes a plurality of distinct zones with different rendering parameters, which include different rasterization parameters for the different zones.
- The rasterization processing stage 240 depicted in the figure includes a primitive assembly stage 242, which sets up the primitives defined by each set of vertices in the scene. Each vertex is defined by a vertex index, and each primitive is defined with respect to these vertex indices and stored in index buffers in the graphics memory 210. The primitives include at least triangles, which are defined by three vertices each, and can also include point primitives, line primitives, and other polygonal shapes. During the primitive assembly stage 242, certain primitives are culled. For example, primitives whose vertex indices and homogeneous coordinate space positions indicate a certain winding order are considered to be back-facing and are therefore culled from the scene. The primitive assembly stage 242 also includes screen space transformations for the primitive vertices, which can include different screen space transform parameters for different zones of the screen area.
- The rasterization processing stage 240 performs clipping, performs a perspective divide to transform the points from homogeneous clip space into normalized device coordinates, and maps the vertices to the viewport. The raster data is snapped to integer locations that are then culled and clipped (to draw the minimum number of pixels), and per-pixel attributes are interpolated from per-vertex attributes. In this manner, the rasterization processing stage 240 determines which pixels the primitives overlap, clips the primitives, prepares them for the pixel shader, and determines how to invoke the pixel shader stage 250.
- In traditional geometry pipelines, the hull shader stage 233 writes all tessellation factors for all patches to the graphics memory 210 and the patch fetcher 234 reads all tessellation factors for all patches from the graphics memory 210, which can waste computing resources and create processing bottlenecks. For example, there are frequently large runs in which all or a high percentage of the tessellation factors are the same. The hull shader stage 233 detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same. If so, in some embodiments the hull shader stage 233 further detects whether at least the threshold percentage of the tessellation factors for the thread group indicate one of two outcomes. Either the patches of the thread group are to be culled (e.g., the factors have a value of zero), or the patches are to be passed to the tessellator stage 235 (e.g., the factors have a value of one). In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same, the hull shader stage 233 bypasses writing at least a subset of the tessellation factors to the graphics memory 210. For example, suppose the hull shader stage 233 detects that at least the threshold percentage of tessellation factors for a thread group of patches share the same value of zero or one. It then bypasses writing the tessellation factors to the graphics memory 210 and sends a message to the patch fetcher 234. In response to receiving the message, the patch fetcher 234 bypasses reading the tessellation factors from the graphics memory 210. Now suppose at least the threshold percentage of the tessellation factors for the thread group are the same but are not equal to zero or one. In that case, the hull shader stage 233 writes a single instance of the shared tessellation factor value to the graphics memory 210. It also sends a message to the patch fetcher 234 indicating that this single stored value applies to all of the patches of the thread group.
- FIG. 3 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to the graphics memory 210, in accordance with some embodiments. The hull shader stage 233 instead sends an indication to a patch fetcher 234 of the graphics pipeline that all tessellation factors for a thread group have a value indicating that the patches of the thread group are to be culled. When the hull shader stage 233 detects that at least the threshold percentage of the tessellation factors for the patches of a thread group indicate culling (e.g., have a value of zero), it bypasses writing the tessellation factors for the thread group to the graphics memory 210. The hull shader stage 233 also sends a message 302 to the patch fetcher 234 indicating that all of the tessellation factors for all of the patches of the thread group indicate culling (e.g., are equal to zero). In response to receiving the message 302, the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210. Because a tessellation factor of zero culls patches, the patch fetcher 234 also discards the patches of the thread group rather than passing them to the tessellator stage (not shown).
- FIG. 4 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 bypassing writing tessellation factors to the graphics memory 210 and sending an indication to a patch fetcher 234 of the graphics pipeline, in accordance with some embodiments. It does so in response to detecting that at least the threshold percentage of the tessellation factors for a thread group indicate that the patches of the thread group are to be passed to the tessellator stage. Upon that detection (e.g., the factors have a value of one), the hull shader stage 233 bypasses writing the tessellation factors for the thread group to the graphics memory 210. The hull shader stage 233 also sends a message 402 to the patch fetcher 234 indicating that the tessellation factors for the patches of the thread group have a value indicating that the patches are to be passed to the tessellator stage (e.g., are equal to one). In response to receiving the message 402, the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210. The patch fetcher 234 also unrolls the patches of the thread group and passes them to the tessellator (not shown).
- FIG. 5 depicts the hull shader stage 233 of the graphics pipeline of FIG. 2 writing a single instance of a tessellation factor value to the graphics memory 210 and sending an indication to a patch fetcher 234 of the graphics pipeline, in accordance with some embodiments. The indication states that the single value applies to all tessellation factors for a patch. In response to detecting that all of the tessellation factors associated with a patch 506 have the same value, the hull shader stage 233 bypasses writing all of the tessellation factors for the patch to the graphics memory 210. Instead, the hull shader stage 233 writes a single instance of the tessellation factor 502 to the graphics memory 210. It also sends a flag 504 to the patch fetcher 234 indicating that this single value applies to all tessellation factors corresponding to the patch 506. In response to receiving the flag 504, the patch fetcher 234 reads the patch 506 and the single instance of the tessellation factor 502 from the graphics memory 210. The patch fetcher 234 applies the tessellation factor 502 to all tessellation factors corresponding to the patch 506. It then provides the patch 506 and the tessellation factor 502 to the tessellator stage 235, which uses the tessellation factor to generate the final primitives.
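The patch fetcher's side of FIGs. 3 to 5 can be summarized as a dispatch on the received indication. The sketch below is hypothetical throughout. TfMessage and fetch_group_tfs are invented names, and a Python list stands in for the tessellation-factor region of graphics memory. The single-value case is shown at thread-group granularity, as in the description accompanying FIG. 2; the flag 504 of FIG. 5 applies the same idea to a single patch.

```python
from enum import Enum, auto
from typing import List, Optional, Sequence


class TfMessage(Enum):
    """Hypothetical encoding of the hull shader -> patch fetcher indication."""
    NONE = auto()          # TFs were written normally; read all of them
    CULL_ALL = auto()      # like message 302: every patch in the group is culled
    PASS_ALL = auto()      # like message 402: every TF is one; no memory read
    SINGLE_VALUE = auto()  # one stored TF value applies to every TF in the group


def fetch_group_tfs(msg: TfMessage, num_patches: int, tfs_per_patch: int,
                    memory: Sequence[float]) -> Optional[List[List[float]]]:
    """Return per-patch TF lists for the tessellator, or None if the group is discarded."""
    if msg is TfMessage.CULL_ALL:
        return None  # skip the read entirely and discard the patches
    if msg is TfMessage.PASS_ALL:
        # No read: synthesize TFs of one for every patch before unrolling.
        return [[1.0] * tfs_per_patch for _ in range(num_patches)]
    if msg is TfMessage.SINGLE_VALUE:
        value = memory[0]  # the only TF read from memory for this group
        return [[value] * tfs_per_patch for _ in range(num_patches)]
    # Default path: read every TF, patch by patch.
    return [list(memory[p * tfs_per_patch:(p + 1) * tfs_per_patch])
            for p in range(num_patches)]
```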
- FIG. 6 depicts a plurality of tessellation factors for a patch packaged in a single word in accordance with some embodiments. In some embodiments, a hull shader stage writes tessellation factors to a graphics memory in 32-bit words. Each tessellation factor has a value between 0 and 64. Accordingly, each tessellation factor can be written using 8 bits. To reduce the number of words written to and read from the graphics memory 210, in some embodiments the hull shader stage (not shown) packages a plurality of tessellation factors in a single word. For example, an isoline patch has two tessellation factors. The hull shader stage writes to the graphics memory (not shown) a single word 601 including a first tessellation factor TF-1 602 and a second tessellation factor TF-2 604 corresponding to an isoline patch. As another example, a triangle patch has four tessellation factors. The hull shader stage writes to the graphics memory a single word 611 including a first tessellation factor TF-1 612, a second tessellation factor TF-2 614, a third tessellation factor TF-3 616, and a fourth tessellation factor TF-4 618 corresponding to a triangle patch. Similarly, a quad patch has six tessellation factors. Six 8-bit tessellation factors (48 bits) cannot fit in a single 32-bit word, so the hull shader stage packages the six tessellation factors corresponding to a quad patch into two words. For example, the hull shader stage writes to the graphics memory a first word 621 and a second word 627 corresponding to a quad patch. The first word 621 includes a first tessellation factor TF-1 622, a second tessellation factor TF-2 624, and a third tessellation factor TF-3 626. The second word 627 includes a fourth tessellation factor TF-4 628, a fifth tessellation factor TF-5 630, and a sixth tessellation factor TF-6 632. Thus, the hull shader stage reduces the number of words of tessellation factors written to and read from the graphics memory from two to one for an isoline patch, from four to one for a triangle patch, and from six to two for a quad patch.
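The FIG. 6 layout can be reproduced with ordinary shifts and masks. This Python sketch rests on two assumptions: the factors have already been quantized to integers in the range 0 to 64, and TF-1 occupies the least-significant byte of each word. The byte order, the TFS_PER_WORD table, and the helper names pack_tfs and unpack_tfs are illustrative assumptions, not taken from the source.

```python
from typing import List, Sequence

# TFs stored per 32-bit word, following FIG. 6: isoline 2 -> one word,
# triangle 4 -> one word, quad 6 -> two words of three.
TFS_PER_WORD = {"isoline": 2, "triangle": 4, "quad": 3}


def pack_tfs(tfs: Sequence[int], patch_type: str) -> List[int]:
    """Pack 8-bit integer TFs (0..64) into 32-bit words, TF-1 in the low byte."""
    per_word = TFS_PER_WORD[patch_type]
    words = []
    for start in range(0, len(tfs), per_word):
        word = 0
        for i, tf in enumerate(tfs[start:start + per_word]):
            if not 0 <= tf <= 64:
                raise ValueError(f"TF {tf} outside 0..64")
            word |= (tf & 0xFF) << (8 * i)  # byte i of the word holds the i-th TF
        words.append(word)
    return words


def unpack_tfs(words: Sequence[int], count: int, patch_type: str) -> List[int]:
    """Inverse of pack_tfs: recover the first `count` TFs from packed words."""
    per_word = TFS_PER_WORD[patch_type]
    tfs: List[int] = []
    for word in words:
        for i in range(per_word):
            if len(tfs) == count:
                break
            tfs.append((word >> (8 * i)) & 0xFF)
    return tfs
```

For example, pack_tfs([3, 5], "isoline") yields the single word 0x0503. A quad's six factors produce two words of three factors each, matching the two-to-one, four-to-one, and six-to-two reductions described for FIG. 6.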
FIG. 7 is a flow diagram illustrating a method 700 for bypassing writing at least a subset of tessellation factors to memory in accordance with some embodiments. In some embodiments, the method is implemented by the graphics pipeline 114 of FIG. 1 or the graphics processing pipeline 200 of FIG. 2.

At block 702, the hull shader stage 233 determines whether at least a threshold percentage of the tessellation factors for all patches of a thread group have equal values. If, at block 702, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors for all patches of the thread group have equal values, the method flow continues to block 704. At block 704, the hull shader stage 233 determines whether all of the tessellation factors for a patch of the thread group have equal values. If, at block 704, the hull shader stage 233 determines that not all of the tessellation factors for the patch have equal values, the method flow continues to block 706. At block 706, the hull shader stage 233 writes the tessellation factors for the patch to the graphics memory 210. In some embodiments, the hull shader stage 233 writes a plurality of tessellation factors corresponding to the patch in a single word. For example, for an isoline patch having two tessellation factors, the hull shader stage 233 writes both tessellation factors for the isoline patch in a single word. For a triangle patch having four tessellation factors, the hull shader stage 233 writes all four tessellation factors for the triangle patch in a single word. For a quad patch having six tessellation factors, the hull shader stage 233 writes, e.g., the first three tessellation factors for the quad patch in a first word and the remaining three tessellation factors in a second word.

If, at block 704, the hull shader stage 233 determines that all of the tessellation factors for the patch have equal values, the method flow continues to block 708. At block 708, the hull shader stage 233 writes a single instance of the tessellation factor to the graphics memory 210 and sends a flag 504 to the patch fetcher 234. The flag indicates that the single instance of the tessellation factor applies for all tessellation factors corresponding to the patch.

If, at block 702, the hull shader stage 233 determines that at least the threshold percentage of the tessellation factors for all patches of the thread group have equal values, the method flow continues to block 710. At block 710, the hull shader stage 233 determines whether at least the threshold percentage of the tessellation factors for all patches of the thread group have a value that indicates that the patches of the thread group are to be culled. If so, the method flow continues to block 712. At block 712, the hull shader stage 233 bypasses writing the tessellation factors for the thread group to the graphics memory 210. It also sends a message 302 to the patch fetcher 234 indicating that the tessellation factors for the thread group have a value that indicates that the patches of the thread group are to be culled. In response to receiving the message 302, the patch fetcher 234 bypasses reading tessellation factors for the thread group from the graphics memory 210 and culls (discards) the patches of the thread group.

If, at block 710, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches are to be culled, the method flow continues to block 714. At block 714, the hull shader stage 233 determines whether at least the threshold percentage of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches of the thread group are to be passed to the tessellator stage 235. If so, the method flow continues to block 716. At block 716, the hull shader stage 233 bypasses writing the tessellation factors for the patches of the thread group to the graphics memory 210. It also sends a message 402 to the patch fetcher 234 indicating that all of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches of the thread group are to be passed to the tessellator stage 235. In response to receiving the message 402, the patch fetcher 234 unrolls the patches from the graphics memory 210 and provides the patches to the tessellator stage 235. If, at block 714, the hull shader stage 233 determines that fewer than the threshold percentage of the tessellation factors for all of the patches of the thread group have a value that indicates that the patches are to be passed to the tessellator stage 235, the method flow continues to block 708.
In this case, the hull shader stage 233 has already determined that at least the threshold percentage of the tessellation factors for all of the patches of the thread group have equal values. Accordingly, at block 708 the hull shader stage 233 writes a single instance of the most common tessellation factor value to the graphics memory 210. It also sends a flag 504 to the patch fetcher 234 indicating that the single tessellation factor stored at the graphics memory 210 applies for all of the tessellation factors for all of the patches of the thread group.

A computer readable storage medium includes any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, the following:

- optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-ray disc);
- magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive);
- volatile memory (e.g., random access memory (RAM) or cache);
- non-volatile memory (e.g., read-only memory (ROM) or Flash memory);
- microelectromechanical systems (MEMS)-based storage media.

In some embodiments, the computer readable storage medium is embedded in the computing system (e.g., system RAM or ROM). It can also be fixedly attached to the computing system (e.g., a magnetic hard drive) or removably attached to it (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory). Alternatively, it can be coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).
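The following Python sketch models the decision flow of method 700 (blocks 702 through 716), together with the patch fetcher's response to message 302, message 402, and flag 504. It is a behavioral model under stated assumptions, not a hardware description. The description does not say which factor values mean "cull" or "pass to the tessellator stage." The sketch therefore assumes `CULL_VALUE = 0.0`, which is consistent with the Direct3D 11 convention that a zero edge factor culls a patch. It also assumes `PASSTHROUGH_VALUE = 1.0`. These constants, the message strings, and the function names are hypothetical. The per-patch test follows block 704 (all factors equal). The thread-group branch keeps the most common value, as the description states for block 708.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encodings (assumptions): the description does not say which
# factor values mean "cull" or "pass to the tessellator stage".
CULL_VALUE = 0.0
PASSTHROUGH_VALUE = 1.0

Patch = List[float]


def hull_shader_stage(patches: List[Patch], threshold: float) -> Tuple[str, list]:
    """Classify one thread group along the branches of method 700.

    Returns (message, writes), where `writes` models graphics-memory traffic:
      "cull"        -> nothing written (block 712, message 302)
      "passthrough" -> nothing written (block 716, message 402)
      "group_flag"  -> [value]: one value for the whole group (block 708, flag 504)
      "per_patch"   -> one (flag, payload) per patch: flag=True means a single
                       value covers the patch (block 708), else all factors (block 706)
    """
    factors = [f for patch in patches for f in patch]
    if not factors:
        raise ValueError("thread group has no tessellation factors")
    counts = Counter(factors)
    total = len(factors)
    value, count = counts.most_common(1)[0]

    if count / total >= threshold:                            # block 702
        if counts[CULL_VALUE] / total >= threshold:           # block 710
            return "cull", []                                 # block 712
        if counts[PASSTHROUGH_VALUE] / total >= threshold:    # block 714
            return "passthrough", []                          # block 716
        return "group_flag", [value]                          # block 708, most common value

    writes = []
    for patch in patches:
        if all(f == patch[0] for f in patch):                 # block 704
            writes.append((True, patch[0]))                   # block 708, per patch
        else:
            writes.append((False, list(patch)))               # block 706
    return "per_patch", writes


def patch_fetcher(patches: List[Patch], message: str, writes: list) -> List[Patch]:
    """Rebuild the factors handed to the tessellator; [] means the group was culled."""
    if message == "cull":
        return []                                             # no read, patches discarded
    if message == "passthrough":
        return [[PASSTHROUGH_VALUE] * len(p) for p in patches]  # no read
    if message == "group_flag":
        return [[writes[0]] * len(p) for p in patches]        # one shared value
    return [[payload] * len(p) if flag else payload
            for p, (flag, payload) in zip(patches, writes)]
```

For example, take `threshold = 0.75` and two triangle patches with factors `[8, 8, 8, 8]` and `[8, 8, 8, 4]`. Seven of the eight factors are equal, so the group takes the thread-group branch of block 708, and the patch fetcher applies 8 to every factor. That branch is approximate by construction. The threshold controls how much disagreement among factors is tolerated in exchange for the bandwidth saving.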
In some embodiments, certain aspects of the techniques described above are implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.

The non-transitory computer readable storage medium can include, for example:

- a magnetic or optical disk storage device;
- solid state storage devices such as Flash memory;
- a cache;
- random access memory (RAM) or other volatile or non-volatile memory device or devices;
- and the like.

The executable instructions stored on the non-transitory computer readable storage medium can be in source code, assembly language code, object code, or another instruction format that is interpreted or otherwise executable by one or more processors.
Note that not all of the activities or elements described above in the general description are required. A portion of a specific activity or device is not necessarily required, and one or more further activities could be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that could cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter can be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above can be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.
Claims (20)
1. A method comprising:
in response to detecting, at a hull shader stage of a graphics pipeline of a graphics processing unit (GPU), that at least a threshold percentage of tessellation factors corresponding to a plurality of patches in a thread group has the same value,
bypassing writing at least a subset of the tessellation factors corresponding to the plurality of patches to a graphics memory of the graphics pipeline.
2. The method of claim 1, further comprising:
in response to detecting, at the hull shader stage, that at least the threshold percentage of tessellation factors indicate that the plurality of patches are to be culled, sending a message to a patch fetcher of the graphics pipeline indicating that the plurality of patches are to be culled;
at the patch fetcher, bypassing reading from the graphics memory the tessellation factors corresponding to the plurality of patches in response to receiving the message; and
discarding, at the patch fetcher, the plurality of patches.
3. The method of claim 1, further comprising:
in response to detecting, at the hull shader stage, that at least the threshold percentage of tessellation factors corresponding to the plurality of patches indicate that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline, sending a message to a patch fetcher of the graphics pipeline indicating that all of the tessellation factors corresponding to the plurality of patches indicate that the plurality of patches are to be passed to the tessellator stage;
bypassing writing the tessellation factors to the graphics memory;
at the patch fetcher, bypassing reading from the graphics memory the tessellation factors corresponding to the plurality of patches in response to receiving the message; and
sending the plurality of patches to the tessellator stage.
4. The method of claim 1, further comprising:
in response to detecting, at the hull shader stage, that at least a threshold percentage of the tessellation factors corresponding to a patch of the plurality of patches have a same value that indicates neither that the plurality of patches are to be culled nor that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline, writing the value of the tessellation factors once to the graphics memory and sending a flag to a patch fetcher of the graphics pipeline indicating that the value applies for all tessellation factors corresponding to the patch;
in response to receiving the flag at the patch fetcher, reading the value from the graphics memory; and
applying the value for all tessellation factors corresponding to the patch.
5. The method of claim 1, further comprising:
in response to detecting, at the hull shader stage, that not all of the tessellation factors corresponding to a patch of the plurality of patches are the same, writing at least one word to the graphics memory, the at least one word comprising a plurality of the tessellation factors corresponding to the patch with an indication that the at least one word comprises the plurality of the tessellation factors.
6. The method of claim 5, wherein the patch comprises an isoline patch and the at least one word comprises two tessellation factors.
7. The method of claim 5, wherein the patch comprises a triangle patch and the at least one word comprises four tessellation factors.
8. The method of claim 5, wherein the patch comprises a quad patch and the at least one word comprises six tessellation factors.
9. The method of claim 5, further comprising:
performing integer compression on the tessellation factors to generate compressed tessellation factors; and
writing the compressed tessellation factors to the graphics memory.
10. A method comprising:
in response to receiving, at a patch fetcher of a graphics pipeline of a graphics processing unit (GPU), a message indicating that at least a threshold percentage of tessellation factors corresponding to a plurality of patches in a thread group have the same value, bypassing reading at least a subset of the tessellation factors corresponding to the plurality of patches from a graphics memory of the graphics pipeline.
11. The method of claim 10, further comprising:
in response to receiving, at the patch fetcher, an indication that at least the threshold percentage of the tessellation factors corresponding to the plurality of patches have a value indicating that the plurality of patches are to be culled, bypassing reading from the graphics memory the tessellation factors corresponding to the plurality of patches; and
discarding, at the patch fetcher, the plurality of patches.
12. The method of claim 10, further comprising:
in response to receiving, at the patch fetcher, an indication that at least the threshold percentage of the tessellation factors corresponding to the plurality of patches have a value indicating that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline, sending the plurality of patches to the tessellator stage.
13. The method of claim 10, further comprising:
in response to receiving, at the patch fetcher, a flag indicating that a single value at the graphics memory applies for all tessellation factors corresponding to a patch, reading the value from the graphics memory; and
applying the value for all tessellation factors corresponding to the patch.
14. The method of claim 10, further comprising:
in response to receiving, at the patch fetcher, an indication that at least one word stored at the graphics memory comprises a plurality of tessellation factors corresponding to a patch, reading the at least one word from the graphics memory and applying the plurality of tessellation factors to the patch.
15. A graphics processing unit (GPU), comprising:
a patch fetcher configured to read tessellation factors stored at a graphics memory; and
a hull shader stage configured to:
bypass writing to the graphics memory at least a subset of tessellation factors corresponding to a plurality of patches in a thread group in response to detecting that at least a threshold percentage of the tessellation factors have the same value.
16. The GPU of claim 15, wherein:
the hull shader stage is further configured to indicate to the patch fetcher that all of the tessellation factors corresponding to the plurality of patches have a value indicating that the plurality of patches are to be culled in response to detecting that at least the threshold percentage of the tessellation factors have a value indicating that the plurality of patches are to be culled; and
the patch fetcher is further configured to:
bypass reading from the graphics memory the tessellation factors corresponding to the plurality of patches in response to receiving the indication; and
discard the plurality of patches.
17. The GPU of claim 15, further comprising:
a tessellator stage;
wherein the hull shader stage is further configured to indicate to the patch fetcher that all of the tessellation factors corresponding to the plurality of patches have a value indicating that the plurality of patches are to be passed to the tessellator stage in response to detecting that at least the threshold percentage of the tessellation factors corresponding to the plurality of patches have a value indicating that the plurality of patches are to be passed to the tessellator stage; and
wherein the patch fetcher is further configured to provide the plurality of patches to the tessellator stage of the GPU in response to receiving the indication.
18. The GPU of claim 15, wherein:
the hull shader stage is further configured to write the value of the tessellation factors once to the graphics memory and send a flag to the patch fetcher indicating that the value applies for all tessellation factors corresponding to the patch in response to detecting that all of the tessellation factors corresponding to a patch of the plurality of patches have a same value that indicates neither that the plurality of patches are to be culled nor that the plurality of patches are to be passed to a tessellator stage; and
the patch fetcher is further configured to read the value from the graphics memory and apply the value for all tessellation factors corresponding to the patch in response to receiving the flag.
19. The GPU of claim 15, wherein:
the hull shader stage is further configured to write at least one word to the graphics memory, the at least one word comprising a plurality of the tessellation factors corresponding to the patch with an indication that the at least one word comprises the plurality of the tessellation factors in response to detecting that not all of the tessellation factors corresponding to a patch of the plurality of patches are the same.
20. The GPU of claim 19, wherein the patch comprises an isoline patch and the at least one word comprises two tessellation factors.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/683,868 US11010862B1 (en) | 2019-11-14 | 2019-11-14 | Reduced bandwidth tessellation factors |
EP20886217.7A EP4058973A4 (en) | 2019-11-14 | 2020-11-13 | Reduced bandwidth tessellation factors |
PCT/US2020/060506 WO2021097279A1 (en) | 2019-11-14 | 2020-11-13 | Reduced bandwidth tessellation factors |
KR1020227016239A KR20220100877A (en) | 2019-11-14 | 2020-11-13 | Reduce bandwidth tessellation factor |
JP2022524018A JP2023501921A (en) | 2019-11-14 | 2020-11-13 | Reduced bandwidth tessellation factor |
CN202080078546.2A CN114730452A (en) | 2019-11-14 | 2020-11-13 | Reducing bandwidth tessellation factors |
US17/318,523 US11532066B2 (en) | 2019-11-14 | 2021-05-12 | Reduced bandwidth tessellation factors |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/683,868 US11010862B1 (en) | 2019-11-14 | 2019-11-14 | Reduced bandwidth tessellation factors |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/318,523 Continuation US11532066B2 (en) | 2019-11-14 | 2021-05-12 | Reduced bandwidth tessellation factors |
Publications (2)
Publication Number | Publication Date |
---|---|
US11010862B1 (en) | 2021-05-18 |
US20210150658A1 | 2021-05-20 |
Family ID: 75910048
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/683,868 Active US11010862B1 (en) | 2019-11-14 | 2019-11-14 | Reduced bandwidth tessellation factors |
US17/318,523 Active 2039-12-20 US11532066B2 (en) | 2019-11-14 | 2021-05-12 | Reduced bandwidth tessellation factors |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/318,523 Active 2039-12-20 US11532066B2 (en) | 2019-11-14 | 2021-05-12 | Reduced bandwidth tessellation factors |
Country Status (6)
Country | Link |
---|---|
US (2) | US11010862B1 (en) |
EP (1) | EP4058973A4 (en) |
JP (1) | JP2023501921A (en) |
KR (1) | KR20220100877A (en) |
CN (1) | CN114730452A (en) |
WO (1) | WO2021097279A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230326134A1 (en) * | 2022-04-08 | 2023-10-12 | Qualcomm Incorporated | Variable rate tessellation |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10460513B2 (en) * | 2016-09-22 | 2019-10-29 | Advanced Micro Devices, Inc. | Combined world-space pipeline shader stages |
US11010862B1 (en) * | 2019-11-14 | 2021-05-18 | Advanced Micro Devices, Inc. | Reduced bandwidth tessellation factors |
US11055896B1 (en) * | 2020-02-25 | 2021-07-06 | Parallels International Gmbh | Hardware-assisted emulation of graphics pipeline |
US11500692B2 (en) * | 2020-09-15 | 2022-11-15 | Apple Inc. | Dynamic buffering control for compute work distribution |
Family Cites Families (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6901422B1 (en) | 2001-03-21 | 2005-05-31 | Apple Computer, Inc. | Matrix multiplication in a vector processing system |
GB2464292A (en) | 2008-10-08 | 2010-04-14 | Advanced Risc Mach Ltd | SIMD processor circuit for performing iterative SIMD multiply-accumulate operations |
KR20130141446A (en) * | 2010-07-19 | 2013-12-26 | 어드밴스드 마이크로 디바이시즈, 인코포레이티드 | Data processing using on-chip memory in multiple processing units |
US8499305B2 (en) * | 2010-10-15 | 2013-07-30 | Via Technologies, Inc. | Systems and methods for performing multi-program general purpose shader kickoff |
US10089774B2 (en) | 2011-11-16 | 2018-10-02 | Qualcomm Incorporated | Tessellation in tile-based rendering |
US9390554B2 (en) * | 2011-12-29 | 2016-07-12 | Advanced Micro Devices, Inc. | Off chip memory for distributed tessellation |
US9449419B2 (en) * | 2012-03-30 | 2016-09-20 | Intel Corporation | Post tessellation edge cache |
GB2500284B (en) * | 2012-09-12 | 2014-04-30 | Imagination Tech Ltd | Tile based computer graphics |
US8982124B2 (en) * | 2012-09-29 | 2015-03-17 | Intel Corporation | Load balancing and merging of tessellation thread workloads |
US9305397B2 (en) * | 2012-10-24 | 2016-04-05 | Qualcomm Incorporated | Vertex order in a tessellation unit |
GB2509113B (en) * | 2012-12-20 | 2017-04-26 | Imagination Tech Ltd | Tessellating patches of surface data in tile based computer graphics rendering |
US9384168B2 (en) | 2013-06-11 | 2016-07-05 | Analog Devices Global | Vector matrix product accelerator for microprocessor integration |
KR102104057B1 (en) * | 2013-07-09 | 2020-04-23 | 삼성전자 주식회사 | Tessellation method for assigning a tessellation factor per point and devices performing the method |
KR102072656B1 (en) * | 2013-07-16 | 2020-02-03 | 삼성전자 주식회사 | Tessellation device including cache, method thereof, and system including the tessellation device |
US9424039B2 (en) | 2014-07-09 | 2016-08-23 | Intel Corporation | Instruction for implementing vector loops of iterations having an iteration dependent condition |
KR102327144B1 (en) | 2014-11-26 | 2021-11-16 | 삼성전자주식회사 | Graphic processing apparatus and method for performing tile-based graphics pipeline thereof |
GB2552260B (en) * | 2015-06-05 | 2019-04-10 | Imagination Tech Ltd | Tessellation method |
GB2539042B (en) * | 2015-06-05 | 2019-08-21 | Imagination Tech Ltd | Tessellation method using displacement factors |
CN107533862B (en) | 2015-08-07 | 2021-04-13 | 慧与发展有限责任合伙企业 | Cross array, image processor and computing device |
KR102381945B1 (en) | 2015-11-18 | 2022-04-01 | 삼성전자주식회사 | Graphic processing apparatus and method for performing graphics pipeline thereof |
US20170358132A1 (en) * | 2016-06-12 | 2017-12-14 | Apple Inc. | System And Method For Tessellation In An Improved Graphics Pipeline |
US9953395B2 (en) * | 2016-08-29 | 2018-04-24 | Intel Corporation | On-die tessellation distribution |
US10037625B2 (en) * | 2016-09-15 | 2018-07-31 | Intel Corporation | Load-balanced tessellation distribution for parallel architectures |
GB2560709B (en) * | 2017-03-14 | 2021-02-24 | Imagination Tech Ltd | Graphics processing method and system for processing sub-primitives |
US10262388B2 (en) | 2017-04-10 | 2019-04-16 | Intel Corporation | Frequent data value compression for graphics processing units |
US10580209B2 (en) * | 2018-03-06 | 2020-03-03 | Qualcomm Incorporated | Removal of degenerated sub-primitives in tessellation |
GB2572619B (en) | 2018-04-05 | 2020-06-17 | Imagination Tech Ltd | Hardware Tessellation Units |
GB2575503B (en) * | 2018-07-13 | 2020-07-01 | Imagination Tech Ltd | Scalable parallel tessellation |
EP3911313A4 (en) * | 2018-11-20 | 2022-10-12 | NFlection Therapeutics, Inc. | Thienyl-aniline compounds for treatment of dermal disorders |
US11080928B2 (en) * | 2019-04-01 | 2021-08-03 | Qualcomm Incorporated | Methods and apparatus for visibility stream management |
US11010862B1 (en) * | 2019-11-14 | 2021-05-18 | Advanced Micro Devices, Inc. | Reduced bandwidth tessellation factors |
2019
- 2019-11-14 US US16/683,868 patent/US11010862B1/en active Active
2020
- 2020-11-13 WO PCT/US2020/060506 patent/WO2021097279A1/en unknown
- 2020-11-13 JP JP2022524018A patent/JP2023501921A/en active Pending
- 2020-11-13 EP EP20886217.7A patent/EP4058973A4/en active Pending
- 2020-11-13 KR KR1020227016239A patent/KR20220100877A/en unknown
- 2020-11-13 CN CN202080078546.2A patent/CN114730452A/en active Pending
2021
- 2021-05-12 US US17/318,523 patent/US11532066B2/en active Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230326134A1 (en) * | 2022-04-08 | 2023-10-12 | Qualcomm Incorporated | Variable rate tessellation |
US11908079B2 (en) * | 2022-04-08 | 2024-02-20 | Qualcomm Incorporated | Variable rate tessellation |
Also Published As
Publication number | Publication date |
---|---|
EP4058973A1 (en) | 2022-09-21 |
US20210374898A1 (en) | 2021-12-02 |
US11010862B1 (en) | 2021-05-18 |
US11532066B2 (en) | 2022-12-20 |
KR20220100877A (en) | 2022-07-18 |
WO2021097279A1 (en) | 2021-05-20 |
CN114730452A (en) | 2022-07-08 |
EP4058973A4 (en) | 2023-10-18 |
JP2023501921A (en) | 2023-01-20 |
Similar Documents
Publication | Title |
---|---|
US11532066B2 (en) | Reduced bandwidth tessellation factors |
US8698837B2 (en) | Path rendering with path clipping |
US9569811B2 (en) | Rendering graphics to overlapping bins |
TWI645371B (en) | Setting downstream render state in an upstream shader |
US9299123B2 (en) | Indexed streamout buffers for graphics processing |
US11715262B2 (en) | Optimizing primitive shaders |
US20140267224A1 (en) | Handling post-z coverage data in raster operations |
US9558573B2 (en) | Optimizing triangle topology for path rendering |
US11609791B2 (en) | Precise suspend and resume of workloads in a processing unit |
EP3701376B1 (en) | Wave creation control with dynamic resource allocation |
US20210287418A1 (en) | Graphics processing unit render mode selection system |
US20220206950A1 (en) | Selective generation of miss requests for cache lines |
US20230169728A1 (en) | Throttling hull shaders based on tessellation factors in a graphics pipeline |
US11620788B2 (en) | Graphics texture footprint discovery |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NIJASURE, MANGESH P.;LITWILLER, TAD;MARTIN, TODD;AND OTHERS;REEL/FRAME:051034/0019. Effective date: 20191113 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |