WO2017164924A1 - GPU-based depth reprojection system for accelerating the generation of a depth buffer


Info

Publication number
WO2017164924A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
geometric model
rendering
buffer
gpu
Prior art date
Application number
PCT/US2016/050671
Other languages
English (en)
Inventor
Jeremy S. Bennett
Michael B. Carter
Original Assignee
Siemens Product Lifecycle Management Software Inc.
Priority date
Filing date
Publication date
Application filed by Siemens Product Lifecycle Management Software Inc. filed Critical Siemens Product Lifecycle Management Software Inc.
Publication of WO2017164924A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus

Definitions

  • the present disclosure is directed, in general, to computer-aided design, visualization, and manufacturing systems, product lifecycle management ("PLM") systems, and similar systems, that manage data for products and other items (collectively, "Product Data Management" systems or PDM systems).
  • PDM systems manage PLM and other data. Improved systems are desirable.
  • Various disclosed embodiments include systems and methods for massive model visualization performed by a graphics processing unit (GPU) of a data processing system.
  • a method includes executing a rendering stage on a three-dimensional (3D) geometric model.
  • the method includes executing a strategy stage on the 3D geometric model, including projecting contents of a depth buffer for a current view of the 3D geometric model from contents of the depth buffer from a previous view of the 3D geometric model.
  • the method includes displaying the 3D geometric model according to the rendering stage and strategy stage.
  • Figure 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented
  • Figure 2 illustrates an example of components that can be included in a massive model visualization system in accordance with disclosed embodiments
  • Figures 3 and 4 demonstrate how a spatial hierarchy can be mapped in accordance with disclosed embodiments
  • Figure 5 illustrates a Multi Draw Elements Indirect buffer, index vertex buffer object, and a vertex buffer object in accordance with disclosed embodiments
  • Figures 6 through 9 illustrate processes in accordance with disclosed embodiments.
  • Figure 10 illustrates an example of a sub-pixel mesh in accordance with disclosed embodiments.
  • FIGURES 1 through 10 discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged device. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
  • MMV Massive Model Visualization
  • VGR Visibility-guided rendering
  • Disclosed embodiments include a system for reprojecting the depth buffer from one frame into another frame without ever having to transfer data from the GPU, in order to significantly reduce the amount of time it takes to populate the depth buffer.
  • Disclosed embodiments also provide the benefit of providing an efficient means by which a depth buffer can be established for other rendering algorithms that require a pre-populated depth buffer.
  • FIG. 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented, for example as a PDM system particularly configured by software or otherwise to perform the processes as described herein, and in particular as each one of a plurality of interconnected and communicating systems as described herein.
  • the data processing system depicted includes a processor 102 connected to a level two cache/bridge 104, which is connected in turn to a local system bus 106.
  • Local system bus 106 may be, for example, a peripheral component interconnect (PCI) architecture bus.
  • graphics adapter 110 may be connected to display 111.
  • Processor 102 or graphics adapter 110 can include a graphics processing unit 128.
  • Peripherals such as local area network (LAN) / Wide Area Network / Wireless (e.g. WiFi) adapter 112, may also be connected to local system bus 106.
  • Expansion bus interface 114 connects local system bus 106 to input/output (I/O) bus 116.
  • I/O bus 116 is connected to keyboard/mouse adapter 118, disk controller 120, and I/O adapter 122.
  • Disk controller 120 can be connected to a storage 126, which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices.
  • Also connected to I/O bus 116 in the example shown is audio adapter 124, to which speakers (not shown) may be connected for playing sounds.
  • Keyboard/mouse adapter 118 provides a connection for a pointing device (not shown), such as a mouse, trackball, trackpointer, touchscreen, etc.
  • a data processing system in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface.
  • the operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application.
  • a cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
  • One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Wash., may be employed if suitably modified.
  • the operating system is modified or created in accordance with the present disclosure as described.
  • LAN/ WAN/Wireless adapter 112 can be connected to a network 130 (not a part of data processing system 100), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet.
  • Data processing system 100 can communicate over network 130 with server system 140, which is also not part of data processing system 100, but can be implemented, for example, as a separate data processing system 100.
  • LMV Large Model Visualization
  • a 1080p high definition screen has a resolution of 1920 by 1080, or just over 2 million pixels. If one were to render a relatively large model of 200 million triangles, at most only about 1 percent of those triangles could possibly contribute to the final image.
  • MMV technologies are about creating a system that is bound by screen space rather than data size, which is not the case with systems based purely on LMV technologies.
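The screen-space bound above is simple arithmetic; a quick back-of-envelope sketch using the figures from the text:

```python
# Even if every pixel on a 1080p display were covered by a distinct triangle,
# only a small fraction of a 200-million-triangle model can contribute to any
# one frame -- this is the bound that makes MMV screen-space limited.
pixels = 1920 * 1080        # 2,073,600 pixels on a 1080p display
triangles = 200_000_000     # the "relatively large model" from the text

max_contributing_fraction = pixels / triangles
print(pixels)                                      # 2073600
print(round(100 * max_contributing_fraction, 2))   # 1.04 (percent)
```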
  • the product structure 204 is subdivided by a partitioner 206 into a spatial hierarchy 210 and a geometric cache 212 in a data cache 208.
  • the data cache 208 can contain anything from occurrences to polygons or voxels, depending upon the level of subdivision that is deemed necessary.
  • a strategy stage 218 is executed over the spatial hierarchy 210 in order to construct visibility data 228 of all data that is expected to contribute to the current frame.
  • This information is then fed into the renderer 220 that generates the final image, the loader 222 that ensures any required data is resident in the geometric cache, and the reaper 224 that ensures any data that has not been used recently is removed from the geometric cache.
  • the operations within render 216 can be executed in parallel.
  • the strategy 218 can generate the visibility data 228 for the next frame while the renderer 220 is still rendering the current frame, and the loader 222 and reaper 224 can run in a constant cycle, executing data loads and unloads as deemed necessary.
  • Renderer 220 produces the 3D model 226 for display as viewed from a specified viewpoint.
  • each of the components can play an important part when it comes to handling extremely large datasets.
  • the spatial hierarchy 210 generated by the partitioner 206 must provide enough spatial coherence between the cells for the strategy to be able to efficiently cull large batches of cells.
  • the partitioner 206 must also ensure that the data contained within the cells is sufficiently coarse so as to minimize the amount of non-contributing geometry being used, as one of the biggest challenges with extremely large datasets is that they contain vastly more geometric information than can possibly be contained in main memory.
  • the render 216 components have to work together to manage the amount of data that is resident at any given point in time.
  • the loader 222 is responsible for loading data and needs to be agile enough to ensure data is available as quickly as possible when it is marked as needed.
  • Predictive algorithms can be used by the loader 222 to try and prefetch data that is likely to become visible so as to minimize any potential lag.
  • the reaper 224 is responsible for detecting and unloading data when it is no longer necessary and determining the best candidates for unloading if memory should approach the maximum threshold.
  • the strategy 218's primary responsibility is to construct a list of visible occurrences.
  • the visibility determination process can be designed around GPU-based occlusion queries and use other culling techniques, such as view frustum and screen coverage, as a way of pruning the list of entities for which a query needs to be executed.
  • the strategy 218's secondary responsibility is to prune the list of visible data such that it meets the desired thresholds for both frame rate and memory footprint.
  • the visibility determination processes used to enable massive model visualization are not without their faults, as they inherently introduce disocclusion artifacts.
  • Disocclusion artifacts occur whenever a visible shape is not rendered for one or more frames while it is visible, causing a visible popping effect when the shape is rendered. This behavior is often the result of the visibility determination algorithm's inability to keep up with the visibility state changes that occur within a tree as the camera is moved through the scene. This behavior can also occur if the loader should fail to load the data before it is needed.
  • GPU-based occlusion tests have been shown to be an effective tool for improving rendering performance in both industry and games.
  • Disclosed embodiments include novel improvements to a GPU based occlusion strategy for improving performance and reducing disocclusion artifacts.
  • GPU Based Depth Buffer Reprojection GPU based occlusion tests require a depth buffer be prepopulated with the depth values of potential occluders. Most approaches accomplish this by rendering either a potential occluder list or the existing render list into the depth buffer. On large models this can involve rendering millions of triangles, far more than there may be pixels on the screen, at significant cost.
  • a commonly used principle with MMV techniques is frame-to-frame coherence, or the notion that the visibility state of occurrences will not change significantly between frames. From this it can be extrapolated that the depth buffer used for occlusion culling will likewise not change significantly between frames.
  • Disclosed embodiments show how the depth buffer from a previous frame can be reprojected into the current viewpoint through the use of a sub-pixel mesh and a vertex shader to generate an approximation of the current depth buffer at near to no cost.
  • Disclosed embodiments can perform a batch query with spatial update, which is a significant improvement over previous approaches. It demonstrates how buffer-write-based occlusion culling can be applied to a spatial hierarchy without sacrificing the inherent benefits of previous front-based approaches. The ability to query all cells in a single draw call allows for increased parallelism to be achieved on the GPU while still maintaining the ability to limit the scope of data loads and visibility changes within the hierarchy.
  • Occlusion queries provide a means by which the GPU can be used to determine if a given set of primitives contributes to the final image and therefore frequently serve as the primary visibility test in massive model visibility determination algorithms.
  • GPU-based buffer write has been shown to be a viable alternative to GPU occlusion queries as it allows the visibility of all entities of interest to be obtained with a single draw call in order to significantly increase parallelism on the GPU.
  • Disclosed embodiments show how this can be effectively combined with a spatial hierarchy in order to increase its scalability to arbitrarily large data sets.
  • the spatial hierarchy is represented as a tree structure that stores the spatial hierarchy data for each cell.
  • a disclosed spatial hierarchy is based on a bounding volume hierarchy over occurrences. Each cell within the tree contains its bounding volume information, occurrence(s), and children cell(s). The same occurrence can appear in multiple cells and occurrences contained within a cell can be dynamically determined at run time.
  • the spatial hierarchy can be partitioned using any number of different algorithms, such as Median Cut, Octree-Hilbert, and Outside-In.
  • the bounding volume over occurrence allows the visibility state of a given cell to be directly translated to the visibility state of the occurrence. It also allows the occurrences that are contained within a given cell to be dynamically configured as cells become visible. Both of these features are useful for integrating directly against a PDM and enabling visibility guided interaction.
  • the spatial hierarchy supports having the same occurrence in multiple cells. This allows for a better subdivision while still allowing for cell visibility to be traced back to a specific occurrence.
  • a query representation is dynamically generated for each cell at run time. This allows for the representation to be matched to the visibility determination algorithm being used. For example, the system can dynamically generate a set of triangles representing the bounding volume using OpenGL occlusion queries.
  • the renderlist render process renders the list of all visible occurrences as efficiently as possible.
  • the data structure was designed to utilize modern GPU functionality while minimizing the potential L2 cache impact.
  • the current implementation is based around rendering unified vertex buffer objects, with multiple shapes in the same VBO, and with state information passed into the shader through uniform buffer objects.
  • the depth values could be read back to the host in order to generate a traditional texture depth mesh which in turn could be rendered from the current view point in order to populate the depth buffer.
  • the cost of reading the depth buffer back is far too expensive for this to be practical: taking a performance hit to read the depth buffer back immediately would defeat the purpose, and delaying such that the depth buffer is from 2-3 frames prior has a greater potential to introduce artifacts.
  • the depth buffer could be treated as a point cloud that is easily transformed into the new view point as part of a vertex shader.
  • a frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed by blitting the buffer from the main frame buffer.
  • the sub-pixel mesh described above is rendered using a vertex shader to dynamically transform all the vertices from the previous view point to the current view point using the values from the depth texture as the initial depth offset of the vertices. If, during the fragment shader, a fragment is detected as having had an initial depth value at the depth buffer maximum, it is discarded. This ensures depth values are only propagated for those pixels caused by rendered geometry.
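As a concreteness aid, the reprojection described above can be modeled on the CPU (this is an illustrative sketch, not the patent's shader code; all function and variable names are assumptions): each depth sample is unprojected with the previous frame's inverse view-projection matrix, reprojected with the current frame's matrix, and samples at the far-plane value are discarded, mirroring the fragment-shader discard described in the text.

```python
def mat_vec(m, v):
    # multiply a 4x4 row-major matrix by a 4-vector
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def reproject(depth, w, h, inv_prev_vp, curr_vp, far=1.0):
    out = [[far] * w for _ in range(h)]            # start from a cleared buffer
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            if d >= far:                           # discard far-plane samples
                continue
            # this sample's normalized device coordinates in the previous view
            ndc = [2 * (x + 0.5) / w - 1, 2 * (y + 0.5) / h - 1, 2 * d - 1, 1.0]
            world = mat_vec(inv_prev_vp, ndc)
            world = [c / world[3] for c in world]
            clip = mat_vec(curr_vp, world)
            nx, ny, nz = clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]
            px, py = int((nx + 1) / 2 * w), int((ny + 1) / 2 * h)
            if 0 <= px < w and 0 <= py < h:
                out[py][px] = min(out[py][px], (nz + 1) / 2)  # keep nearest
    return out

# with identical previous and current views, reprojection is the identity
I = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
depth = [[1.0, 0.5],
         [0.25, 1.0]]
print(reproject(depth, 2, 2, I, I))   # [[1.0, 0.5], [0.25, 1.0]]
```

On the GPU this forward mapping is realized by the sub-pixel mesh and vertex shader instead of a per-pixel loop, which is what makes it near-free.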
  • Some approaches utilize OpenGL occlusion queries to decide the visibility state of cells within a spatial hierarchy.
  • One basic algorithm is to traverse a spatial hierarchy in a screen-depth-first order and execute an individual occlusion query for each cell whose visibility state is in question.
  • These approaches result in the alternative representations of each cell being individually rendered along this front, as well as multiple state transfers in order to read back the results from the queries.
  • Modern GPUs run optimally when processing large batches of data in parallel. In terms of rendering, this means pushing as many triangles as possible in a single draw call, which runs counter to the way that traditional occlusion queries are executed.
  • Disclosed embodiments demonstrate how the visibility determination process utilizes buffer writes instead of occlusion queries.
  • the basic approach includes allocating an integer buffer object for storing occlusion results and executing a render operation that renders the shapes as a single batch and populates the buffer with the visibility state of individual cells.
  • the cell alternative representations can be combined into a single draw call that ensures all the representations are rendered as a single batch while still maintaining a means by which fragments can be traced back to the originating cells.
  • the alternative representation can be rendered, for example, using glDrawRangeElements and triangles, or glDrawRangeElements with GL primitive restarts and TriStripSets.
  • Triangle-based rendering does not provide an inherent means to trace the resulting fragments back to the originating cell, so disclosed embodiments can add an additional per-primitive attribute that contains the cell's ID, which can be forwarded from the vertex shader into the fragment shader.
  • Encoding each alternative representation as a single TriStripSet allows each cell to be uniquely identified within the fragment shader by using the primitive ID. Both methods can be integrated into the current visibility determination process in lieu of GL occlusion queries.
  • disclosed processes can map the buffer into memory to provide access to the pixel values for all cells in a single operation.
  • all values are initially set to 0.
  • a tree traversal is then used to propagate the values from the buffer to the render data associated with each cell. Traversal along a given path is terminated if a cell is not considered visible, the cell has not been configured, or the geometric data associated with the cell has not been loaded.
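A minimal sketch of that propagation traversal, assuming the depth-first cell layout described later in this document; the `Cell` record and its fields are hypothetical:

```python
class Cell:
    """Hypothetical cell record; the real hierarchy also stores bounds etc."""
    def __init__(self, children=(), configured=True, loaded=True):
        self.children = list(children)
        self.configured = configured
        self.loaded = loaded
        self.visible_pixels = 0     # filled in from the GPU result buffer

def subtree_size(cell):
    # number of descendants below a cell (used to skip its index range)
    return sum(1 + subtree_size(c) for c in cell.children)

def propagate(cell, results, index=0):
    """Copy pixel counts from the flat result buffer into the cells, walking
    the tree in the same depth-first order used to lay out the buffer.
    Traversal down a path stops if the cell is culled, unconfigured, or
    unloaded; the sub-tree's index range is skipped. Returns the next index."""
    cell.visible_pixels = results[index]
    index += 1
    if cell.visible_pixels == 0 or not cell.configured or not cell.loaded:
        return index + subtree_size(cell)
    for child in cell.children:
        index = propagate(child, results, index)
    return index

root = Cell(children=[Cell(), Cell(children=[Cell()])])
propagate(root, [5, 3, 0, 2])     # cell at index 2 is culled; its child is skipped
print([root.visible_pixels,
       root.children[0].visible_pixels,
       root.children[1].visible_pixels])   # [5, 3, 0]
```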
  • This embodiment queries the visibility state of all cells in the spatial hierarchy, eliminating the need to post-propagate the visibility state of children cells to their parents, as commonly found in approaches based upon GL occlusion queries.
  • Disclosed embodiments include novel improvements that are effective in not only improving the performance of a PDM or CAD system, but also in reducing some of the undesirable artifacts that often occur when using culling techniques.
  • buffer writes provide a viable alternative to traditional GL occlusion queries. Whereas traditional occlusion queries are limited to only querying a single entity at a time, buffer writes can be used to query several entities in one go. Through this approach it was shown that buffer writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. In order to ensure that only the truly visible geometry is loaded, the traversal of the spatial hierarchy for update is stopped whenever an unloaded cell is encountered. Further, the query set associated with the spatial hierarchy can be split into smaller sets such that higher-level sets can be used to filter on whether lower-level sets even need to be queried.
  • the spatial hierarchy visibility front refers to the point at which cells transition from visible to culled along a given path of traversal.
  • occlusion query solutions can be placed into three primary categories: CPU Based culling, GPU Occlusion Queries, and GPU Texture Write.
  • CPU-Based Culling: Systems in this category rely on a CPU-based algorithm as the primary source of occlusion culling. There has been a recent resurgence of this approach in the game industry, as the GPU is often viewed as a scarce resource best left for more important tasks such as rendering. Prime examples of this are the Frostbite 2 and CryEngine 3 game engines. Both of these approaches use a software rasterizer on a sub-thread to execute screen-space culling of objects based upon their AABBs or OBBs.
  • One problem with these approaches is that they assume increases in the number of available CPU cores will help them perform as well as, if not better than, hardware that is optimized for handling this very problem and whose performance gains far outstrip the CPU's.
  • Another problem with these approaches is they do not take into account that GPUs and their APIs are fast approaching the point of executing the entire culling and render list generation process on the GPU.
  • GPU Occlusion Queries: Systems in this category rely on GPU occlusion queries as the primary source of occlusion culling.
  • GPU Gems 2 showed that it was possible to implement an algorithm that interleaves the rendering of visible occurrence with GL Occlusion queries over a spatial hierarchy in order to minimize GPU stalls when retrieving occlusion results. This solution was adapted and used with success to increase the render performance.
  • the disclosed MMV solution utilizes an iterative approach in which the results of previous occlusion queries are retrieved and new occlusion queries are executed on the visibility front in the spatial hierarchy every other frame in order to avoid GPU stalls.
  • the primary problem with using GL occlusion queries is that their parallelism is limited, as entities must be queried one at a time.
  • GPU Buffer Write: Disclosed embodiments include systems and methods for executing batch occlusion queries on the GPU over the cells of a spatial hierarchy.
  • the GPU realizes the most parallelism when the number of triangles rendered or queried by a single draw call is large. This is because the GPU is typically not allowed to overlap computation between successive draw calls.
  • the system operates by generating a single vertex buffer object (VBO) that contains the bounding volumes for all cells within a spatial hierarchy (SH).
  • a multi draw elements indirect (MDEI) buffer is then setup such that the bounding volume of each cell is uniquely referenced.
  • the depth buffer is populated with the potential occluders by either rendering the previous renderlist or reprojecting the depth buffer from the previous frame.
  • a single draw call is then executed using the MDEI buffer and a carefully crafted fragment shader that atomically increments the pixel value associated with the unique ID of a bounding volume in a buffer object whenever one of its fragments passes the depth test.
  • the resulting buffer is copied into another buffer that has been persistently mapped into a pointer on the host. This pointer is then indexed parallel to the cells in the SH in order to retrieve the current pixel value for each cell. These values are used for generating the render list, loading data, and selecting LODs.
  • the cell list may be split into multiple segments representing different sub regions of the spatial hierarchy.
  • This system is different from occlusion-query-based solutions, as it is capable of executing all queries and retrieving all results with a single call. Furthermore, the system can query any subset of the cells in the spatial hierarchy.
  • This solution is different from existing texture-write-based approaches, as disclosed embodiments are designed to operate on the cells of a spatial hierarchy, whereas other techniques are designed around occurrences. This allows better scalability to be achieved, as the spatial hierarchy allows a reduction in the number of occurrences that may be accidentally loaded and a significant reduction in the number of bounding volumes that need to be rendered, by breaking the spatial hierarchy into multiple sub-regions and using the results from higher regions to determine if a sub-region needs to be checked. Additionally, the spatial hierarchy can be leveraged when updating the visibility state so as to minimize the impact when first entering a region in which the current depth information is unknown.
  • the massive model rendering process can be split into two main pipeline stages: rendering and strategy.
  • the render stage is responsible for generating the on-screen image through multi-stage rendering of a render list.
  • the strategy stage is responsible for generating a render list of visible geometry to be used in the render stage.
  • the render stage can be broken into four primary sub-stages.
  • the first stage is the rendering of opaque geometry into both the color and depth buffers.
  • the second stage is the blitting of the current depth buffer into a depth texture for use in the spatial strategy.
  • the third stage is rendering transparent geometry into both the color and depth buffers.
  • the fourth and final stage of rendering is again the blitting of the depth buffer into a depth texture for potential use in the spatial strategy.
  • the strategy stage can be broken into four sub-stages: obtain results, update renderlist, render depth, and execute query. The strategy is responsible for executing occlusion queries.
  • the render depth stage is responsible for populating the depth buffer on the GPU with the depth values for potential occluders. This is accomplished by either rendering the previous render list or by utilizing techniques as described in the provisional patent application incorporated herein.
  • the execute query stage is responsible for executing occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint.
  • occlusion queries are executed by rendering the bounding volumes of all cells in the spatial hierarchy in a single draw call and utilizing a fragment shader to atomically increment the pixel value associated with a cell each time one of the fragments produced by its bounding volume is not culled by the depth buffer. This operation results in a buffer that tallies the number of visible fragments for each cell in the spatial hierarchy.
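A serial CPU-side model of that per-cell tally (illustrative only; in the real system the increment is an atomic operation in the fragment shader, and the fragments come from rasterizing the bounding volumes):

```python
# Each fragment carries the ID of the cell whose bounding volume produced it.
# If the fragment passes the depth test against the pre-populated depth
# buffer, the cell's counter in the integer buffer is incremented.
def tally_visible_fragments(fragments, depth_buffer, n_cells):
    counts = [0] * n_cells                    # the integer buffer object
    for cell_id, x, y, frag_depth in fragments:
        if frag_depth <= depth_buffer[y][x]:  # passes the depth test
            counts[cell_id] += 1              # atomicAdd in the real shader
    return counts

depth = [[0.5, 0.5],
         [0.5, 0.5]]
frags = [(0, 0, 0, 0.4),   # cell 0, in front of the occluder: visible
         (0, 1, 0, 0.9),   # cell 0, behind the occluder: culled
         (1, 0, 1, 0.6)]   # cell 1, behind the occluder: culled
print(tally_visible_fragments(frags, depth, 2))  # [1, 0]
```

A nonzero count marks the cell visible; a zero count marks it culled, which drives the renderlist update and load requests described below.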
  • the cells may be queried in groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells. When this occurs, the results from the previous frame for higher-level groups can be used to determine if a lower-level group should be queried.
  • the obtain results stage is responsible for retrieving the results from the queries. It iterates through the cells on the spatial hierarchy and retrieves the visibility results for each cell from an index parallel buffer object that has been persistently mapped on the CPU. Iteration of child cells may be stopped in order to ensure any occurrences associated with a visible parent are loaded and rendered prior to potentially marking the child as visible. This optimization helps to limit the number of cells that are accidentally marked as visible, and therefore configure or load their occurrences, in regions of space in which the current depth buffer is unknown.
  • the buffer object used for capturing the per-cell pixel value is made available to the CPU in a form that does not block the GPU. This data structure is designed to be indexed parallel to the cells in the spatial hierarchy.
  • the update renderlist stage is responsible for generating a renderlist based upon the current visibility results to be used in the primary render stage. It iterates through the cells on the spatial hierarchy and for any cell that is marked as visible checks to see if there are associated occurrences. If there are occurrences and they are loaded they are inserted into the renderlist. If there are occurrences, but they are not loaded, a request for loading may occur here.
  • FIGs 3 and 4 demonstrate how a spatial hierarchy 300/400 can be mapped for batch query of the entire spatial hierarchy, or in the case of an extremely large hierarchy, multiple sets for batch query.
  • the spatial hierarchy is implemented as a vector of cells in which each cell can be uniquely identified by index.
  • the structure of the hierarchy is established through each cell containing a parent index, a child index, and the number of children.
  • the cells are defined in a depth first order that guarantees that children cells have a larger index and are grouped such that all cells within a sub-tree have contiguous indices as shown in Fig. 3.
  • a breadth-first ordering as illustrated in Fig. 4 results in multi-query groups where not all cells in a sub-tree have contiguous indices, as illustrated by the sub-trees of node two, which would include cells 4, 5, 8, 9, 10 and 11.
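The contrast between the two orderings can be sketched as follows. The tree used here is hypothetical (not necessarily the one in Figures 3 and 4); the point is that a depth-first layout gives every sub-tree a contiguous index range, so a batch query over a sub-tree is a single (offset, count) span, while a breadth-first layout does not:

```python
def depth_first_indices(children_of, node=0, order=None):
    """Assign indices to tree nodes (given as child lists) in depth-first
    order; returns a node -> index mapping."""
    if order is None:
        order = {}
    order[node] = len(order)
    for child in children_of.get(node, []):
        depth_first_indices(children_of, child, order)
    return order

def subtree(children_of, node):
    # all nodes in the sub-tree rooted at `node`, including the node itself
    nodes = [node]
    for child in children_of.get(node, []):
        nodes += subtree(children_of, child)
    return nodes

tree = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 6: [8, 9], 7: [10, 11]}
dfs = depth_first_indices(tree)
span = sorted(dfs[n] for n in subtree(tree, 2))
# node 2's sub-tree occupies one contiguous run of indices
print(span == list(range(span[0], span[0] + len(span))))  # True
```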
  • a buffer object containing integer values is allocated in parallel to the cells of the spatial hierarchy.
  • during occlusion tests, if a fragment associated with the bounding volume of a cell passes the depth test, the value contained within the associated element is incremented, resulting in the buffer containing the pixel hit count for all tested cells.
  • a secondary buffer is allocated and persistently mapped, such that values can be read back from it through a direct pointer access. At the end of each occlusion pass, the values from the primary buffer are copied into this secondary buffer. This subtle enhancement allows the primary buffer to remain only in GPU memory which significantly improves the performance of the write operations.
  • Multi-Draw Indirect: modern GPUs are optimized for executing work in parallel, so the best performance is achieved when processing large batches.
  • the system batches the rendering of the cell bounding volumes to perform occlusion tests. It defines the batches in such a way that the associated cell for any given volume can be uniquely identified in both the vertex and fragment shader.
  • FIG. 5 illustrates a Multi Draw Elements Indirect (MDEI) buffer 502, index VBO 504, and vertex VBO 506 in accordance with disclosed embodiments.
  • the MDEI buffer is parallel to the cells and is initialized such that each DrawElementsIndirect command contains the FirstIndex and BaseVertex for the geometric information of the corresponding cell in the VBO, and the BaseInstance is set to the corresponding cell index.
  • This setup allows all cells, or a sub-tree of cells, to be rendered in a single draw, if so desired, and, more importantly, allows the fragment shader to readily identify the associated cell's bounding volume for a given fragment.
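The command-buffer setup just described can be sketched as follows. The field names follow OpenGL's DrawElementsIndirectCommand layout; the `cell_geometry` input (per-cell index count and buffer offsets) is a hypothetical structure used only for illustration.

```python
# Sketch of filling the MDEI command buffer, one command per cell.
def build_mdei_commands(cell_geometry):
    commands = []
    for cell_index, geo in enumerate(cell_geometry):
        commands.append({
            "count": geo["index_count"],       # indices in the bounding volume
            "instanceCount": 1,
            "firstIndex": geo["first_index"],  # offset into the index VBO
            "baseVertex": geo["base_vertex"],  # offset into the vertex VBO
            "baseInstance": cell_index,        # lets shaders recover the cell
        })
    return commands
```

Because `baseInstance` carries the cell index, both the vertex and fragment shader can uniquely identify which cell a given bounding volume belongs to within a single batched draw.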
  • Disclosed embodiments have the potential to significantly increase the rendering performance of MMV systems. Early tests have shown a 2-3x increase in the frame rate of several large data sets. The approach also improves the accuracy of occlusion tests and the responsiveness to changes in the visibility state of the spatial hierarchy cells. The batch query contributes significantly to the increase in performance. Various embodiments eliminate the use of expensive GL occlusion queries and reduce the number of occlusion tests and result retrievals from the number of cells on the visibility front to one (or to the number of visible batches in the case of a large spatial hierarchy).
  • the batch query over cells significantly contributes to performance, accuracy, and responsiveness.
  • Various embodiments update the pixel value for all cells every frame and improve the level of detail (LOD) selection based upon cell pixel value.
  • Various embodiments allow cell-visibility state changes to be reflected almost immediately in the SH.
  • the single draw call for all queries significantly improves performance.
  • Various embodiments reduce CPU overhead and increase parallelism on the GPU.
  • the multiple batch sets for large spatial hierarchies significantly improves performance and scalability.
  • Various embodiments reduce the number of bounding volumes to be rendered in order to determine visibility of all cells.
  • Figure 6 illustrates a process in accordance with disclosed embodiments that can be performed, for example, by one or more data processing systems 100, referred to generically as "the system” below.
  • This figure provides an overview flowchart of a disclosed massive model rendering process in accordance with disclosed embodiments, including a GPU occlusion query with batch query of spatial hierarchy.
  • FIG. 6 illustrates one example of a high-level process
  • Figs. 7, 8, and 9 illustrate subprocesses that can be used as part of various steps in the process of Fig. 6.
  • the system initializes a rendering process for a 3D model (605).
  • This process can include receiving the 3D model and otherwise initializing the data structures and GPU for performing a rendering process as described herein.
  • the 3D model can have multiple parts or assemblies, and when rendered as a solid model, only some portions of the 3D model should be displayed for any given "viewpoint" while other portions are occluded (such as portions facing the backside of the model as compared to the viewpoint or portions "behind" other portions of the model).
  • This process can include generating a query representation for each cell in a spatial hierarchy.
  • Figure 7 illustrates an example of subprocesses that can be performed as part of step 605.
  • the system can receive the 3D model (705). "Receiving," as used herein, can include loading from storage, receiving from another device or process, receiving via an interaction with a user, and otherwise. Receiving the 3D model can be implemented by receiving a product structure of the 3D model, as described above with respect to Fig. 2.
  • the system can generate one or more buffers for displaying the 3D model (710). In various embodiments, this can include a buffer that contains the number of visible fragments for each cell in the spatial hierarchy, and can include a texture buffer that is persistently mapped by the system to a pointer that can be offset by cell index in order to retrieve the results for a particular cell.
  • Each cell represents a geometric bounding volume that encompasses some portion of the 3D model, and preferably every portion of the 3D model is included in some cell.
  • the cell data is stored in a spatial hierarchy that represents the spatial location of each cell and its respective portions of the 3D model. In this way, the spatial hierarchy can identify the spatial/geometric location of any part, assembly, subassembly, or other portion of the 3D model according to its cell.
  • the cells can be processed in spatial-hierarchy groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells.
  • the system can map the buffers into memory (715).
  • the system can generate depth textures and FBO (720).
  • the frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed.
  • the system can use the GPU to generate a depth texture with a same pixel format as a source depth buffer.
  • the system can use the GPU to generate a frame buffer object and bind a depth texture as a render target.
  • the system can generate query groups (725).
  • the occlusion queries can be processed in query groups corresponding to regions of the spatial-hierarchy.
  • the system executes a rendering stage on the 3D model (610).
  • the rendering stage can include renderlist generation that uses visibility determination to create a list of all occurrences, and their associated state, that contribute to the current view of the 3D model.
  • Figure 8 illustrates an example of subprocesses that can be performed as part of step 610.
  • the system can render opaque geometry of the 3D model from an opaque renderlist (805).
  • the system can generate an opaque renderlist based upon the current visibility that identifies each portion of the model that is visible and opaque.
  • the system can iterate through the cells in the spatial hierarchy and, for any cell that is marked as visible, check to see if there are associated occurrences. If there are occurrences and they are loaded, they are inserted into the opaque renderlist, and then the opaque geometry of the opaque renderlist is rendered.
  • the system can use the GPU to blit a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
  • the system can capture the depth of a plurality of pixels resulting from the rendering of the opaque renderlist (810) and store it in a depth buffer.
  • the system can render transparent geometry of the 3D geometric model from a transparent renderlist (815). As part of this process, the system can generate a transparent renderlist based upon the current visibility that identifies each portion of the model that is visible but transparent. Further, transparent occurrences can be ignored, as they are not likely to occlude other occurrences.
  • the system can capture the depth of a plurality of pixels resulting from the rendering of the opaque and transparent renderlist (810) and store it in a depth buffer.
  • the system executes a strategy stage on the 3D model (615).
  • the strategy stage includes at least executing one or more occlusion queries over cells of a spatial hierarchy in order to determine if each cell is culled or visible from a current viewpoint.
  • the system projects contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
  • the strategy stage can include one or more of an obtain results substage, and an update rendering substage, and can include one or more of a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage.
  • Figure 9 illustrates an example of subprocesses that can be performed as part of step 615.
  • the system can update the renderlist (905).
  • the visibility value for all occurrences is initially set to 0.
  • the spatial hierarchy tree is then traversed such that the pixel value for all visible cells can be propagated to their contained occurrences. If an occurrence is referenced by multiple visible cells, the largest pixel value is used. Once traversal is complete, all occurrences with a pixel value greater than a preset threshold are harvested into a render list, with the level of detail selection for a particular occurrence being based upon its pixel value.
  • the resulting render list can then be sorted based upon material properties so as to minimize the amount of state changes that must occur whenever it is rendered.
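The propagation-and-harvest step above can be sketched as below. The data structures (`cell_occurrences` as a per-cell list of occurrence ids, a `lod_for` callback mapping pixel value to a LOD index) are assumptions for illustration, not the disclosed data layout.

```python
# Sketch of the renderlist update: propagate each visible cell's pixel
# count to its contained occurrences, keeping the largest value, then
# harvest occurrences above a threshold with a LOD chosen per pixel value.
def update_renderlist(cell_occurrences, pixel_counts, visible, threshold, lod_for):
    pixels = {}  # occurrence id -> largest pixel value seen
    for cell_index, occurrences in enumerate(cell_occurrences):
        if not visible[cell_index]:
            continue
        for occ in occurrences:
            pixels[occ] = max(pixels.get(occ, 0), pixel_counts[cell_index])
    # harvest occurrences whose pixel value exceeds the preset threshold
    return [(occ, lod_for(p)) for occ, p in pixels.items() if p > threshold]
```

The returned list could then be sorted by material properties, as described, to minimize state changes during rendering.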
  • the system can render depth (910).
  • the renderlist from the previous frame can be rendered, slightly offset back, into the depth buffer in order to initialize it for executing GPU-based occlusion queries.
  • the bound state during rendering is limited to only that state that can potentially influence the depth buffer results. Further, transparent occurrences are ignored, as they are not likely to occlude other occurrences.
  • the system can render the renderlist or reproject the depth (915).
  • the renderlist render process renders the list of all visible occurrences as efficiently as possible.
  • reprojecting the depth includes projecting contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
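The per-pixel reprojection just described can be sketched as below. In the disclosed system this runs in shaders on the GPU; the two callbacks here stand in for the view/projection math and are assumptions: `unproject_prev` maps a (pixel, depth) pair in the previous view to a world-space point, and `project_curr` maps a world-space point to a (pixel, depth) pair in the current view.

```python
# Sketch of reprojecting a previous frame's depth buffer into the
# current view, one sample per source pixel.
def reproject_depth(depth_prev, unproject_prev, project_curr, w, h):
    reprojected = {}
    for y in range(h):
        for x in range(w):
            world = unproject_prev((x, y), depth_prev[y][x])
            (cx, cy), cdepth = project_curr(world)
            if 0 <= cx < w and 0 <= cy < h:
                # keep the nearest depth, as the hardware depth test would
                reprojected[(cx, cy)] = min(reprojected.get((cx, cy), 1.0), cdepth)
    return reprojected
```

Note that this point-sample sketch can leave gaps when the view changes; the sub-pixel mesh described later in the disclosure is what makes the GPU version gap-free.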
  • the system can execute occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint (935).
  • Returning to Fig. 6, the system displays the 3D model according to the rendering stage and the strategy stage (620).
  • Disclosed embodiments include systems and methods for doing GPU based depth reprojection for accelerating depth buffer generation.
  • Various embodiments operate by capturing the depth buffer at one view point and dynamically reprojecting the depth buffer at another view point without the data ever having to leave the GPU.
  • the depth buffer is reprojected using a sub-pixel mesh that allows the depth buffer to be automatically up- or down-sampled as necessary and naturally prevents gaps from forming as part of the reprojection.
  • FIG. 10 illustrates an example of a sub-pixel mesh 1000 in accordance with disclosed embodiments.
  • the sub-pixel mesh is used to ensure the reprojected depth buffer does not produce gaps and is conservative in nature. It contains a grid of points 1010 that is 1 pixel wider in each direction than the number of pixels in the originating render context.
  • the texture coordinate of the inner points correspond to the pixel coordinates of the originating render context.
  • the exterior points duplicate the texture coordinates of their neighboring interior points.
  • a quad or tristripset is formed between each set of neighboring points thus preventing gaps from forming as the points are reprojected from one viewpoint to another.
  • the exterior points provide additional guards when translating or rotating a pixel that was previously at the edge of the window such that off-screen geometry is treated as though it is at the same depth value as the edge pixel.
  • This system is differentiated from prior systems in many ways, including, but not limited to, that it is designed to operate purely on the GPU, without ever needing to transfer data back to the CPU. In doing so, it is able to achieve significant performance gains and to significantly reduce the amount of lag between when a depth buffer is captured and when it can be used.
  • This system is also differentiated from prior systems in that it requires only a single pass to complete the reprojection. Prior approaches require multiple passes for down-sampling the depth buffer and filling in gaps that occur as part of the reprojection. The use of a sub-pixel mesh naturally prevents gaps from occurring and allows the depth buffer to be dynamically down-sampled if so desired.
  • the system differentiates from the other systems in other ways as well.
  • disclosed embodiments require only a single pass for generating the occlusion depth buffer.
  • disclosed embodiments can execute the depth reprojection at full resolution.
  • disclosed embodiments do not generate or rely on generation of a depth hierarchy.
  • reprojecting the depth buffer from one frame into another in order to exploit frame to frame coherency has been shown as an effective alternative to the traditional approach of rendering the previous frames render list to populate the depth buffer.
  • this approach was able to make use of advanced shaders to reproject the depth buffer from one view to another without the data ever leaving the GPU.
  • Such a process can completely eliminate the measured cost of producing the required depth buffer for occlusion culling.
  • Other embodiments improve upon the performance gains by utilizing a pixel blit when the view is detected as unchanged, or by down-sampling the captured depth buffer into a smaller viewport at reprojection time.
  • texture writes provide a viable alternative to GL occlusion queries. Whereas traditional occlusion queries are limited to querying a single entity at a time, texture writes can be used to query several entities at once. Texture writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. Other embodiments can split the spatial hierarchy into smaller query sets, such that higher-level sets can be used to filter whether lower-level sets even need to be queried, or the set of cells being queried can be adjusted based upon the current visibility front in an effort to reduce the overhead on larger spatial hierarchies.
  • Various embodiments address this problem using a non-iterative GPU culling algorithm by rendering the opaque render list into the depth buffer. This approach can be used for generating a depth buffer whenever it is needed, for things such as early-z culling so as to eliminate overdraw.
  • Disclosed embodiments include a Depth Buffer that stores the Z coordinates of each pixel of a rendered frame. Disclosed embodiments can include occlusion culling to avoid rendering of objects that cannot be seen by the current camera. Disclosed embodiments include reprojection to change the coordinate system of data from one system to another.
  • the process can be performed completely on the GPU, significantly improving performance. It eliminates the performance penalty of having to copy the depth buffer back to the CPU that is common in other algorithms. It reduces the amount of lag between when the depth buffer was produced and when it can be used for reprojection. It allows for the execution of the reprojection, and any down-sampling, in a single pass that produces a conservative depth buffer without gaps. Disclosed techniques can be applied to other rendering algorithms that rely on a prepopulated depth buffer.
  • Disclosed embodiments include a method for MMV performed by a GPU of a data processing system.
  • a method includes executing a rendering stage on a 3D geometric model.
  • the method includes executing a strategy stage on the 3D geometric model.
  • the method includes displaying the 3D geometric model according to the rendering stage and strategy stage.
  • the rendering stage includes rendering opaque geometry of the 3D geometric model from an opaque renderlist, capturing opaque depth of a plurality of pixels in the opaque renderlist, rendering transparent geometry of the 3D geometric model from a transparent renderlist, and capturing transparent depth of a plurality of pixels in the transparent renderlist.
  • the strategy stage includes an obtain results substage, an update culling substage, and an update rendering substage.
  • the strategy stage includes a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage.
  • the GPU generates a depth texture with a same pixel format as a source depth buffer.
  • the GPU generates a frame buffer object and binds a depth texture as a render target.
  • the GPU blits a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
  • machine usable/readable or computer usable/readable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs).


Abstract

Massive model visualization methods performed by a graphics processing unit (GPU) (128) of a data processing system (100), and corresponding systems, are disclosed. One method includes executing a rendering stage (610) on a three-dimensional (3D) geometric model (226). The method includes executing a strategy stage (615) on the 3D geometric model (226), the strategy stage including projecting (915) the contents of a depth buffer for a current view of the 3D geometric model (226) from the contents of the depth buffer for a previous view of the 3D geometric model (226). The method includes displaying (620) the 3D geometric model (226) according to the rendering stage (610) and the strategy stage (615).
PCT/US2016/050671 2016-03-21 2016-09-08 GPU-based depth reprojection system for accelerating generation of a depth buffer WO2017164924A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662311075P 2016-03-21 2016-03-21
US62/311,075 2016-03-21

Publications (1)

Publication Number Publication Date
WO2017164924A1 true WO2017164924A1 (fr) 2017-09-28

Family

ID=59899646

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/050671 WO2017164924A1 (fr) 2016-03-21 2016-09-08 GPU-based depth reprojection system for accelerating generation of a depth buffer

Country Status (1)

Country Link
WO (1) WO2017164924A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354067A (zh) * 2020-03-02 2020-06-30 成都偶邦智能科技有限公司 Multi-model same-screen rendering method based on the Unity3D engine
WO2020209962A1 (fr) * 2019-04-09 2020-10-15 Microsoft Technology Licensing, Llc Hybrid rendering
US11496985B2 2018-08-13 2022-11-08 Zte Corporation Method for determining time difference of arrival, and communication device and system
WO2023224757A1 (fr) * 2022-05-19 2023-11-23 Microsoft Technology Licensing, Llc Potentially occluded rasterization
CN117557740A (zh) * 2024-01-10 2024-02-13 四川见山科技有限责任公司 Method and apparatus for switching segmentation levels of a three-dimensional model, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030043148A1 (en) * 2001-09-06 2003-03-06 Lin-Tien Mei Method for accelerated triangle occlusion culling
US20080079719A1 (en) * 2006-09-29 2008-04-03 Samsung Electronics Co., Ltd. Method, medium, and system rendering 3D graphic objects
US7508390B1 (en) * 2004-08-17 2009-03-24 Nvidia Corporation Method and system for implementing real time soft shadows using penumbra maps and occluder maps
EP2348407A1 (fr) * 2009-12-22 2011-07-27 Intel Corporation Compilation pour une unité de mesurage programmable
US20140306958A1 (en) * 2013-04-12 2014-10-16 Dynamic Digital Depth Research Pty Ltd Stereoscopic rendering system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030043148A1 (en) * 2001-09-06 2003-03-06 Lin-Tien Mei Method for accelerated triangle occlusion culling
US7508390B1 (en) * 2004-08-17 2009-03-24 Nvidia Corporation Method and system for implementing real time soft shadows using penumbra maps and occluder maps
US20080079719A1 (en) * 2006-09-29 2008-04-03 Samsung Electronics Co., Ltd. Method, medium, and system rendering 3D graphic objects
EP2348407A1 (fr) * 2009-12-22 2011-07-27 Intel Corporation Compilation pour une unité de mesurage programmable
US20140306958A1 (en) * 2013-04-12 2014-10-16 Dynamic Digital Depth Research Pty Ltd Stereoscopic rendering system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11496985B2 (en) 2018-08-13 2022-11-08 Zte Corporation Method for determining time difference of arrival, and communication device and system
WO2020209962A1 (fr) * 2019-04-09 2020-10-15 Microsoft Technology Licensing, Llc Hybrid rendering
US11170579B2 2019-04-09 2021-11-09 Microsoft Technology Licensing, Llc Hybrid rendering
CN111354067A (zh) * 2020-03-02 2020-06-30 成都偶邦智能科技有限公司 Multi-model same-screen rendering method based on the Unity3D engine
CN111354067B (zh) * 2020-03-02 2023-08-22 成都偶邦智能科技有限公司 Multi-model same-screen rendering method based on the Unity3D engine
WO2023224757A1 (fr) * 2022-05-19 2023-11-23 Microsoft Technology Licensing, Llc Potentially occluded rasterization
CN117557740A (zh) * 2024-01-10 2024-02-13 四川见山科技有限责任公司 Method and apparatus for switching segmentation levels of a three-dimensional model, electronic device, and storage medium
CN117557740B (zh) * 2024-01-10 2024-04-09 四川见山科技有限责任公司 Method and apparatus for switching segmentation levels of a three-dimensional model, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
US11138782B2 (en) Systems and methods for rendering optical distortion effects
US11069124B2 (en) Systems and methods for reducing rendering latency
US8760450B2 (en) Real-time mesh simplification using the graphics processing unit
US10032308B2 (en) Culling objects from a 3-D graphics pipeline using hierarchical Z buffers
CN113781625B (zh) 适用于光线追踪的基于硬件的技术
WO2017164924A1 (fr) GPU-based depth reprojection system for accelerating generation of a depth buffer
Greß et al. GPU‐based collision detection for deformable parameterized surfaces
US10699467B2 (en) Computer-graphics based on hierarchical ray casting
US10553012B2 (en) Systems and methods for rendering foveated effects
Liu et al. Octree rasterization: Accelerating high-quality out-of-core GPU volume rendering
Sintorn et al. Compact precomputed voxelized shadows
Pidhorskyi et al. syGlass: Interactive exploration of multidimensional images using virtual reality head-mounted displays
Schütz et al. Software rasterization of 2 billion points in real time
Vasilakis et al. Depth-fighting aware methods for multifragment rendering
JP2017199354A (ja) 3dシーンのグローバル・イルミネーションの描画
Vasilakis et al. k+-buffer: Fragment synchronized k-buffer
Mattausch et al. CHC+ RT: Coherent hierarchical culling for ray tracing
Papaioannou et al. Real-time volume-based ambient occlusion
Lee et al. Hierarchical raster occlusion culling
JP2008305347A (ja) 干渉判定情報の生成方法及び装置
Xue et al. Efficient GPU out-of-core visualization of large-scale CAD models with voxel representations
Eisemann et al. Visibility sampling on gpu and applications
Vollmer et al. Hierarchical spatial aggregation for level-of-detail visualization of 3D thematic data
Müller et al. Optimised molecular graphics on the hololens
WO2017164923A1 (fr) GPU batch occlusion query with spatial update

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16895714

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16895714

Country of ref document: EP

Kind code of ref document: A1