WO2017164924A1 - System for GPU based depth reprojection for accelerating depth buffer generation - Google Patents

System for GPU based depth reprojection for accelerating depth buffer generation

Info

Publication number
WO2017164924A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
geometric model
rendering
buffer
gpu
Application number
PCT/US2016/050671
Other languages
French (fr)
Inventor
Jeremy S. Bennett
Michael B. Carter
Original Assignee
Siemens Product Lifecycle Management Software Inc.
Application filed by Siemens Product Lifecycle Management Software Inc. filed Critical Siemens Product Lifecycle Management Software Inc.
Publication of WO2017164924A1 publication Critical patent/WO2017164924A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus

Definitions

  • the present disclosure is directed, in general, to computer-aided design, visualization, and manufacturing systems, product lifecycle management ("PLM") systems, and similar systems, that manage data for products and other items (collectively, "Product Data Management" systems or PDM systems).
  • PDM systems manage PLM and other data. Improved systems are desirable.
  • Various disclosed embodiments include systems and methods for massive model visualization performed by a graphics processing unit (GPU) of a data processing system.
  • a method includes executing a rendering stage on a three-dimensional (3D) geometric model.
  • the method includes executing a strategy stage on the 3D geometric model, including projecting contents of a depth buffer for a current view of the 3D geometric model from contents of the depth buffer from a previous view of the 3D geometric model.
  • the method includes displaying the 3D geometric model according to the rendering stage and strategy stage.
  • Figure 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented
  • Figure 2 illustrates an example of components that can be included in a massive model visualization system in accordance with disclosed embodiments
  • Figures 3 and 4 demonstrate how a spatial hierarchy can be mapped in accordance with disclosed embodiments
  • Figure 5 illustrates a Multi Draw Elements Indirect buffer, index vertex buffer object, and a vertex buffer object in accordance with disclosed embodiments
  • Figures 6-9 illustrate processes in accordance with disclosed embodiments.
  • Figure 10 illustrates an example of a sub-pixel mesh in accordance with disclosed embodiments.
  • FIGURES 1 through 10, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged device. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
  • MMV: Massive Model Visualization
  • VGR: Visibility-guided rendering
  • Disclosed embodiments include a system for reprojecting the depth buffer from one frame into another frame without ever having to transfer data from the GPU, in order to significantly reduce the amount of time it takes to populate the depth buffer.
  • Disclosed embodiments also provide the benefit of providing an efficient means by which a depth buffer might be established for other rendering algorithms that require a pre-populated depth buffer.
  • FIG. 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented, for example as a PDM system particularly configured by software or otherwise to perform the processes as described herein, and in particular as each one of a plurality of interconnected and communicating systems as described herein.
  • the data processing system depicted includes a processor 102 connected to a level two cache/bridge 104, which is connected in turn to a local system bus 106.
  • Local system bus 106 may be, for example, a peripheral component interconnect (PCI) architecture bus.
  • Also connected to the local system bus in the depicted example are a main memory 108 and a graphics adapter 110, which may be connected to display 111.
  • Processor 102 or graphics adapter 110 can include a graphics processing unit 128.
  • Peripherals, such as a local area network (LAN) / wide area network / wireless (e.g., WiFi) adapter 112, may also be connected to local system bus 106.
  • Expansion bus interface 114 connects local system bus 106 to input/output (I/O) bus 116.
  • I/O bus 116 is connected to keyboard/mouse adapter 118, disk controller 120, and I/O adapter 122.
  • Disk controller 120 can be connected to a storage 126, which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices.
  • Also connected to I/O bus 116 in the example shown is audio adapter 124, to which speakers (not shown) may be connected for playing sounds.
  • Keyboard/mouse adapter 118 provides a connection for a pointing device (not shown), such as a mouse, trackball, trackpointer, touchscreen, etc.
  • a data processing system in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface.
  • the operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application.
  • a cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
  • One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Washington, may be employed if suitably modified.
  • the operating system is modified or created in accordance with the present disclosure as described.
  • LAN/ WAN/Wireless adapter 112 can be connected to a network 130 (not a part of data processing system 100), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet.
  • Data processing system 100 can communicate over network 130 with server system 140, which is also not part of data processing system 100, but can be implemented, for example, as a separate data processing system 100.
  • LMV: Large Model Visualization
  • MMV: Massive Model Visualization
  • a 1080p high-definition screen has a resolution of 1920 by 1080, or just over 2 million pixels. If one were to render a relatively large model of 200 million triangles, at most only about 1 percent of those triangles (roughly 2 million of 200 million) could possibly contribute to the final image.
  • MMV technologies aim to create a system that is bound by screen space rather than data size, which is not the case with systems based purely on LMV technologies.
  • the product structure 204 is subdivided by a partitioner 206 into a spatial hierarchy 210 and a geometric cache 212 in a data cache 208.
  • the data cache 208 can contain anything from occurrences to polygons or voxels, depending upon the level of subdivision that is deemed necessary.
  • a strategy stage 218 is executed over the spatial hierarchy 210 in order to construct visibility data 228 of all data that is expected to contribute to the current frame.
  • This information is then fed into the renderer 220 that generates the final image, the loader 222 that ensures any required data is resident in the geometric cache, and the reaper 224 that ensures any data that has not been used recently is removed from the geometric cache.
  • the operations within render 216 can be executed in parallel.
  • the strategy 218 can generate the visibility data 228 for the next frame while the renderer 220 is still rendering the current frame, and the loader 222 and reaper 224 can run in a constant cycle, executing data loads and unloads as deemed necessary.
  • Renderer 220 produces the 3D model 226 for display as viewed from a specified viewpoint.
  • each of the components can play an important part when it comes to handling extremely large datasets.
  • the spatial hierarchy 210 generated by the partitioner 206 must provide enough spatial coherence between the cells for the strategy to be able to efficiently cull large batches of cells.
  • the partitioner 206 must also ensure that the data contained within the cells is sufficiently coarse so as to minimize the amount of noncontributing geometry being used, as one of the biggest challenges with extremely large datasets is that they contain vastly more geometric information than can possibly be contained in main memory.
  • the render 216 components have to work together to manage the amount of data that is resident at any given point in time.
  • the loader 222 is responsible for loading data and needs to be agile enough to ensure data is available as quickly as possible when it is marked as needed.
  • Predictive algorithms can be used by the loader 222 to try and prefetch data that is likely to become visible so as to minimize any potential lag.
  • the reaper 224 is responsible for detecting and unloading data when it is no longer necessary and determining the best candidates for unloading if memory should approach the maximum threshold.
  • the strategy 218's primary responsibility is to construct a list of visible occurrences.
  • the visibility determination process can be designed around GPU-based occlusion queries and use other culling techniques, such as view frustum and screen coverage, as a way of pruning the list of entities for which a query needs to be executed.
  • the strategy 218's secondary responsibility is to prune the list of visible data such that it meets the desired thresholds for both frame rate and memory footprint.
  • the visibility determination processes used to enable massive model visualization are not without their faults, as they inherently introduce disocclusion artifacts.
  • Disocclusion artifacts occur whenever a visible shape is not rendered for one or more frames while it is visible, causing a visible popping effect when the shape is finally rendered. This behavior is often the result of the visibility determination algorithm's inability to keep up with the visibility state changes that occur within a tree as the camera is moved through the scene. This behavior can also occur if the loader should fail to load the data before it is needed.
  • GPU-based occlusion tests have been shown to be an effective tool for improving rendering performance in both industry and games.
  • Disclosed embodiments include novel improvements to a GPU based occlusion strategy for improving performance and reducing disocclusion artifacts.
  • GPU Based Depth Buffer Reprojection: GPU-based occlusion tests require that a depth buffer be prepopulated with the depth values of potential occluders. Most approaches accomplish this by rendering either a potential-occluder list or the existing render list into the depth buffer. On large models this can involve rendering millions of triangles, far more than there may be pixels on the screen, at significant cost.
  • a commonly used principle in MMV techniques is frame-to-frame coherence, the notion that the visibility state of occurrences will not change significantly between frames. From this it can be extrapolated that the depth buffer used for occlusion culling will likewise not change significantly.
  • Disclosed embodiments show how the depth buffer from a previous frame can be reprojected into the current viewpoint through the use of a sub-pixel mesh and a vertex shader to generate an approximation of the current depth buffer at nearly no cost.
  • Disclosed embodiments can perform a batch query with spatial update, which is a significant improvement over previous approaches. It demonstrates how buffer-write based occlusion culling can be applied to a spatial hierarchy without sacrificing the inherent benefits of previous front-based approaches. The ability to query all cells in a single draw call allows for increased parallelism to be achieved on the GPU while still maintaining the ability to limit the scope of data loads and visibility changes within the hierarchy.
  • Occlusion queries provide a means by which the GPU can be used to determine if a given set of primitives contributes to the final image, and therefore frequently serve as the primary visibility test in massive model visibility determination algorithms.
  • GPU-based buffer write has been shown to be a viable alternative to GPU occlusion queries as it allows the visibility of all entities of interest to be obtained with a single draw call in order to significantly increase parallelism on the GPU.
  • Disclosed embodiments show how this can be effectively combined with a spatial hierarchy in order to increase its scalability to arbitrarily large data sets.
  • the spatial hierarchy is represented as a tree structure that stores the spatial hierarchy data for each cell.
  • a disclosed spatial hierarchy is based on a bounding volume hierarchy over occurrences. Each cell within the tree contains its bounding volume information, occurrence(s), and children cell(s). The same occurrence can appear in multiple cells and occurrences contained within a cell can be dynamically determined at run time.
  • the spatial hierarchy can be partitioned using any number of different algorithms, such as implemented: Median Cut, Octree-Hilbert, and Outside-In.
  • the bounding volume over occurrence allows the visibility state of a given cell to be directly translated to the visibility state of the occurrence. It also allows the occurrences that are contained within a given cell to be dynamically configured as cells become visible. Both of these features are useful for integrating directly against a PDM and enabling visibility guided interaction.
  • the spatial hierarchy supports having the same occurrence in multiple cells. This allows for a better subdivision while still allowing for cell visibility to be traced back to a specific occurrence.
  • a query representation is dynamically generated for each cell at run time. This allows for the representation to be matched to the visibility determination algorithm being used. For example, the system can dynamically generate a set of triangles representing the bounding volume using OpenGL occlusion queries.
  • the renderlist render process renders the list of all visible occurrences as efficiently as possible.
  • the data structure was designed to utilize modern GPU functionality while minimizing the potential L2 cache impact.
  • the current implementation is based around rendering unified vertex buffer objects, with multiple shapes in the same VBO, and with state information passed into the shader through uniform buffer objects.
  • the depth values could be read back to the host in order to generate a traditional texture depth mesh, which in turn could be rendered from the current viewpoint in order to populate the depth buffer.
  • the cost of reading the depth buffer back is far too expensive for this to be practical, as taking a performance hit to read the depth buffer back immediately would defeat the purpose, and delaying such that the depth buffer is from 2-3 frames prior has a greater potential to introduce artifacts.
  • the depth buffer could be treated as a point cloud that is easily transformed into the new viewpoint as part of a vertex shader.
  • a frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed by blitting the buffer from the main frame buffer.
  • the sub-pixel mesh described above is rendered using a vertex shader to dynamically transform all the vertices from the previous viewpoint to the current viewpoint, using the values from the depth texture as the initial depth offset of the vertices. If, during the fragment shader, a fragment is detected as having had an initial depth value at the depth buffer maximum, it is discarded. This ensures depth values are only propagated for those pixels produced by rendered geometry.
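  • For illustration, the following is a minimal GLSL sketch of that reprojection pass, carried as C++ string literals; every identifier (uPrevDepth, uInvPrevViewProj, and so on) is an assumption for the sketch, not a name from this disclosure.

```cpp
// Vertex shader: move each sub-pixel mesh point from the previous view to the
// current one, seeding its depth from the captured depth texture.
const char* kReprojectVS = R"(#version 450
layout(location = 0) in vec2 aTexCoord;   // mesh point, previous-frame pixel coords in [0,1]
uniform sampler2D uPrevDepth;             // depth texture blitted from the previous frame
uniform mat4 uInvPrevViewProj;            // inverse of the previous view-projection
uniform mat4 uCurrViewProj;               // current view-projection
out float vPrevDepth;
void main() {
    float d = textureLod(uPrevDepth, aTexCoord, 0.0).r;  // initial depth offset
    vPrevDepth = d;
    vec4 ndc   = vec4(aTexCoord * 2.0 - 1.0, d * 2.0 - 1.0, 1.0);
    vec4 world = uInvPrevViewProj * ndc;                 // unproject from the previous view
    gl_Position = uCurrViewProj * (world / world.w);     // reproject into the current view
})";

// Fragment shader: discard fragments whose source pixel was at the depth-buffer
// maximum, so only depths produced by rendered geometry are propagated.
const char* kReprojectFS = R"(#version 450
in float vPrevDepth;
void main() {
    if (vPrevDepth >= 1.0) discard;   // no geometry covered this pixel last frame
    // no color output: the pass writes only the depth buffer
})";
```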
  • Some approaches utilize OpenGL occlusion queries to decide the visibility state of cells within a spatial hierarchy.
  • One basic algorithm is to traverse a spatial hierarchy in a screen-depth-first order and execute an individual occlusion query for each cell whose visibility state is in question.
  • These approaches result in the alternative representations of each cell being individually rendered along this front, as well as multiple state transfers in order to read back the results from the queries.
  • Modern GPUs run optimally when processing large batches of data in parallel. In terms of rendering, this means pushing as many triangles as possible in a single draw call, which runs counter to the way that traditional occlusion queries are executed.
  • Disclosed embodiments demonstrate how the visibility determination process utilizes buffer writes instead of occlusion queries.
  • the basic approach includes allocating an integer buffer object for storing occlusion results and executing a render operation that renders the shape as a single batch and populates the buffer with the visibility state of individual cells.
  • the cell alternative representations can be combined into a single draw call that ensures all the representations are rendered as a single batch while still maintaining a means by which fragments can be traced back to the originating cells.
  • the alternative representation can be rendered, for example, using glDrawRangeElements with triangles, or glDrawRangeElements with GL primitive restarts and TriStripSets.
  • Triangle-based rendering does not provide an inherent means to trace the resulting fragments back to the originating cell, so disclosed embodiments can add an additional per-primitive attribute that contains the cell's ID and that can be forwarded from the vertex shader into the fragment shader.
  • Encoding each alternative representation as a single TriStripSet allows each cell to be uniquely identified within the fragment shader by using the primitive ID. Both methods can be integrated into the current visibility determination process in lieu of GL occlusion queries.
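  • The sketch below illustrates the per-primitive attribute variant under stated assumptions: an integer cell-ID vertex attribute (uploaded with glVertexAttribIPointer and identical for every vertex of a bounding volume) forwarded flat to a fragment shader that atomically increments a shader storage buffer. Binding points and names are illustrative.

```cpp
// Vertex shader: forward the cell ID carried by each bounding-volume vertex.
const char* kVisibilityVS = R"(#version 450
layout(location = 0) in vec3 aPosition;
layout(location = 1) in int  aCellId;     // per-primitive cell ID (same on all vertices)
uniform mat4 uViewProj;
flat out int vCellId;
void main() {
    vCellId = aCellId;
    gl_Position = uViewProj * vec4(aPosition, 1.0);
})";

// Fragment shader: count fragments that survive the depth test, per cell.
const char* kVisibilityFS = R"(#version 450
layout(early_fragment_tests) in;          // force the depth test before the shader runs
flat in int vCellId;
layout(std430, binding = 0) buffer Visibility { uint hitCount[]; };  // zeroed each pass
void main() {
    atomicAdd(hitCount[vCellId], 1u);     // this fragment passed the depth test
})";
```

  • In the TriStripSet variant, the fragment shader would instead derive the cell from gl_PrimitiveID, with no extra attribute required.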
  • disclosed processes can map the buffer into memory to provide access to the pixel values for all cells in a single operation.
  • all values are initially set to 0.
  • a tree traversal is then used to propagate the values from the buffer to the render data associated with each cell. Traversal along a given path is terminated if a cell is not considered visible, the cell has not been configured, or the geometric data associated with the cell has not been loaded.
  • This embodiment queries the visibility state of all cells in the spatial hierarchy eliminating the need to post propagate the visibility state of children cells to their parents as commonly found in approaches based upon GL occlusion queries.
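  • A minimal C++ sketch of such a traversal, assuming an illustrative Cell layout (the disclosure does not prescribe one):

```cpp
#include <cstdint>
#include <vector>

struct Cell {
    std::vector<uint32_t> children;  // indices of child cells
    bool configured = false;         // occurrences resolved for this cell
    bool loaded = false;             // geometric data resident in the cache
    uint32_t pixelValue = 0;         // render data updated from the query buffer
};

// Propagate pixel counts from the mapped query buffer into the tree, pruning a
// path when a cell is culled, unconfigured, or its geometry is not yet loaded.
void propagate(std::vector<Cell>& cells, const uint32_t* mappedCounts, uint32_t idx) {
    Cell& c = cells[idx];
    if (mappedCounts[idx] == 0) return;      // not visible: terminate this path
    c.pixelValue = mappedCounts[idx];
    if (!c.configured || !c.loaded) return;  // stop before unconfigured/unloaded data
    for (uint32_t child : c.children)
        propagate(cells, mappedCounts, child);
}
```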
  • Disclosed embodiments include novel improvements that are effective in not only improving the performance of a PDM or CAD system, but also in reducing some of the undesirable artifacts that often occur when using culling techniques.
  • buffer writes provide a viable alternative to traditional GL occlusion queries. Whereas traditional occlusion queries are limited to querying only a single entity at a time, buffer writes can be used to query several entities in one go. Through this approach it was shown that buffer writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. In order to ensure that only the truly visible geometry is loaded, the traversal of the spatial hierarchy for update is stopped whenever an unloaded cell is encountered. Further, the query set associated with the spatial hierarchy can be split into smaller sets such that higher-level sets can be used to filter on whether lower-level sets even need to be queried.
  • the spatial hierarchy visibility front refers to the point at which cells transition from visible to culled along a given path of traversal.
  • occlusion query solutions can be placed into three primary categories: CPU Based culling, GPU Occlusion Queries, and GPU Texture Write.
  • CPU Based Culling: Systems in this category rely on a CPU-based algorithm as the primary source of occlusion culling. There has been a recent resurgence of this approach in the game industry, as the GPU is often viewed as a scarce resource best left for more important tasks such as rendering. Prime examples of this are the Frostbite 2 and CryEngine 3 game engines. Both of these approaches use a software rasterizer on a sub-thread to execute screen-space culling of objects based upon their AABBs or OBBs.
  • One problem with these approaches is that they assume increases in the number of available CPU cores will help them perform as well as, if not better than, hardware that is optimized for handling this very problem and whose performance gains far outstrip the CPU's.
  • Another problem with these approaches is they do not take into account that GPUs and their APIs are fast approaching the point of executing the entire culling and render list generation process on the GPU.
  • GPU Occlusion Queries: Systems in this category rely on GPU occlusion queries as the primary source of occlusion culling.
  • GPU Gems 2 showed that it was possible to implement an algorithm that interleaves the rendering of visible occurrences with GL occlusion queries over a spatial hierarchy in order to minimize GPU stalls when retrieving occlusion results. This solution was adapted and used with success to increase render performance.
  • the disclosed MMV solution utilizes an iterative solution in which previously issued occlusion queries are retrieved and new occlusion queries are executed on the visibility front in the spatial hierarchy every odd frame in order to avoid GPU stalls.
  • the primary problem with using GL occlusion queries is that their parallelism is limited, as entities must be queried one at a time.
  • GPU Buffer Write: Disclosed embodiments include systems and methods for executing batch occlusion queries on the GPU over the cells of a spatial hierarchy.
  • the GPU realizes the most parallelism when the number of triangles rendered or queried by a single draw call is large. This is because the GPU is typically not allowed to overlap computation between successive draw calls.
  • the system operates by generating a single vertex buffer object (VBO) that contains the bounding volumes for all cells within a spatial hierarchy (SH).
  • a multi draw elements indirect (MDEI) buffer is then setup such that the bounding volume of each cell is uniquely referenced.
  • the depth buffer is populated with the potential occluders by either rendering the previous renderlist or reprojecting the depth buffer from the previous frame.
  • a single draw call is then executed using the MDEI buffer and a carefully crafted fragment shader that atomically increments the buffer-object pixel value associated with the unique ID of a bounding volume whenever one of its fragments passes the depth test.
  • the resulting buffer is copied into another buffer that has been persistently mapped to a pointer on the host. This pointer is then indexed parallel to the cells in the SH in order to retrieve the current pixel value for each cell. These values are used for generating the render list, loading data, and selecting LODs.
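  • A hedged C++/OpenGL sketch of that per-frame batch query, assuming a GL 4.5 context with direct-state-access entry points already loaded; all object names are illustrative.

```cpp
// One batch occlusion pass: clear counters, draw every bounding volume with a
// single indirect call, then copy the results to the persistently mapped buffer.
void executeBatchQuery(GLuint queryProgram, GLuint bvVao, GLuint mdeiBuf,
                       GLuint visibilityBuf, GLuint readbackBuf, GLsizei cellCount) {
    const GLuint zero = 0;
    glClearNamedBufferData(visibilityBuf, GL_R32UI, GL_RED_INTEGER,
                           GL_UNSIGNED_INT, &zero);           // reset all per-cell counters
    glUseProgram(queryProgram);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, visibilityBuf);
    glBindVertexArray(bvVao);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, mdeiBuf);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);      // no color writes
    glDepthMask(GL_FALSE);                                    // test against, but keep, depth
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                nullptr, cellCount, 0);       // all cells in one draw call
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    glCopyNamedBufferSubData(visibilityBuf, readbackBuf, 0, 0,
                             cellCount * sizeof(GLuint));     // GPU-side copy to mapped buffer
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    // The host reads readbackBuf through a pointer obtained once via
    // glMapBufferRange(..., GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT), waiting on a
    // glFenceSync/glClientWaitSync pair rather than stalling the whole pipeline.
}
```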
  • the cell list may be split into multiple segments representing different sub regions of the spatial hierarchy.
  • This system is different from occlusion-query based solutions in that it is capable of executing all queries and retrieving all results with a single call. Furthermore, the system can query any subset of the cells in the spatial hierarchy.
  • This solution is different from existing texture-write based approaches in that disclosed embodiments are designed to operate on the cells of a spatial hierarchy, whereas other techniques are designed around occurrences. This allows better scalability to be achieved, as the spatial hierarchy allows reduction of the number of occurrences that may be accidentally loaded and allows significant reduction in the number of bounding volumes that need to be rendered, by breaking the spatial hierarchy into multiple sub-regions and using the results from higher regions to determine if a sub-region needs to be checked. Additionally, the spatial hierarchy can be leveraged when updating the visibility state so as to minimize the impact when first entering a region in which the current depth information is unknown.
  • the massive model rendering process can be split into two main pipeline stages: rendering and strategy.
  • the render stage is responsible for generating the on-screen image through multi-stage rendering of a render list.
  • the strategy stage is responsible for generating a render list of visible geometry to be used in the render stage.
  • the render stage can be broken into four primary sub-stages.
  • the first stage is the rendering of opaque geometry into both the color and depth buffers.
  • the second stage is the blitting of the current depth buffer into a depth texture for use in the spatial strategy.
  • the third stage is rendering transparent geometry into both the color and depth buffers.
  • the fourth and final stage of rendering is again the blitting of the depth buffer into a depth texture for potential use in the spatial strategy.
  • the strategy stage can be broken into four sub-stages: obtain results, update renderlist, render depth, and execute query. The strategy is responsible for executing occlusion queries.
  • the render depth stage is responsible for populating the depth buffer on the GPU with the depth values for potential occluders. This is accomplished by either rendering the previous render list or by utilizing techniques as described in the provisional patent application incorporated herein.
  • the execute query stage is responsible for executing occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint.
  • occlusion queries are executed by rendering the bounding volumes of all cells in the spatial hierarchy in a single draw call and utilizing a fragment shader to atomically increment the pixel value associated with a cell each time one of the fragments produced by its bounding volume is not culled by the depth buffer. This operation results in a buffer that tallies the number of visible fragments for each cell in the spatial hierarchy.
  • the cells may be queried in groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells. When this occurs, the results from the previous frame for higher-level groups can be used to determine if a lower-level group should be queried.
  • the obtain results stage is responsible for retrieving the results from the queries. It iterates through the cells in the spatial hierarchy and retrieves the visibility results for each cell from an index-parallel buffer object that has been persistently mapped on the CPU. Iteration of child cells may be stopped in order to ensure any occurrences associated with a visible parent are loaded and rendered prior to potentially marking the child as visible. This optimization helps to limit the number of cells that are accidentally marked as visible, and that therefore configure or load their occurrences, in regions of space in which the current depth buffer is unknown.
  • the buffer object used for capturing the per-cell pixel value is made available to the CPU in a form that does not block the GPU. This data structure is designed to be indexed parallel to the cells in the spatial hierarchy.
  • the update renderlist stage is responsible for generating a renderlist based upon the current visibility results to be used in the primary render stage. It iterates through the cells in the spatial hierarchy and, for any cell that is marked as visible, checks to see if there are associated occurrences. If there are occurrences and they are loaded, they are inserted into the renderlist. If there are occurrences but they are not loaded, a request for loading may occur here.
  • Figures 3 and 4 demonstrate how a spatial hierarchy 300/400 can be mapped for a batch query of the entire spatial hierarchy or, in the case of an extremely large hierarchy, into multiple sets for batch query.
  • the spatial hierarchy is implemented as a vector of cells in which each cell can be uniquely identified by index.
  • the structure of the hierarchy is established through each cell containing a parent index, a child index, and the number of children.
  • the cells are defined in a depth first order that guarantees that children cells have a larger index and are grouped such that all cells within a sub-tree have contiguous indices as shown in Fig. 3.
  • a breadth-first ordering as illustrated in Fig. 4 results in multi-query groups where not all cells in a sub-tree have contiguous indices, as illustrated by the sub-trees of node two, which would include cells 4, 5, 8, 9, 10 and 11.
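  • A minimal sketch of that vector-of-cells layout, with illustrative field names:

```cpp
#include <cstdint>
#include <vector>

// In depth-first order (Fig. 3) every sub-tree occupies a contiguous index
// range, so a whole sub-tree maps to one contiguous query segment.
struct HierarchyCell {
    uint32_t parentIndex;  // index of the parent cell
    uint32_t childIndex;   // index of the first child; depth-first order guarantees
                           // this is larger than the cell's own index
    uint32_t childCount;   // number of children
    // bounding volume and occurrence references omitted from the sketch
};

using SpatialHierarchy = std::vector<HierarchyCell>;  // each cell identified by index
```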
  • a buffer object containing integer values is allocated in parallel to the cells of the spatial hierarchy.
  • during occlusion tests, if a fragment associated with the bounding volume of a cell passes the depth test, the value contained within the associated element is incremented, resulting in the buffer containing the pixel hit count for all tested cells.
  • a secondary buffer is allocated and persistently mapped, such that values can be read back from it through a direct pointer access. At the end of each occlusion pass, the values from the primary buffer are copied into this secondary buffer. This subtle enhancement allows the primary buffer to remain only in GPU memory which significantly improves the performance of the write operations.
  • Multi-Draw Indirect: Modern GPUs are optimized for executing work in parallel. The best performance is therefore achieved when processing large batches.
  • the system batches the rendering of the cell bounding volumes to perform occlusion tests. It defines the batches in such a way that the associated cell for any given volume can be uniquely identified in both the vertex and fragment shader.
  • FIG. 5 illustrates a Multi Draw Elements Indirect (MDEI) buffer 502, index VBO 504, and vertex VBO 506 in accordance with disclosed embodiments.
  • the MDEI buffer is parallel to the cells and is initialized such that each DrawElementsIndirectBuffer record contains the FirstIndex and BaseVertex for the geometric information of the corresponding cell in the VBO, and the BaseInstance is set to the corresponding cell index.
  • This setup allows all cells or a sub-tree of cells to be rendered in a single draw, if so desired, and, more importantly, allows the fragment shader to readily identify the associated cells' bounding volume for a given fragment.
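  • For illustration, the standard OpenGL indirect-draw record and a hypothetical routine that fills it per cell as just described:

```cpp
#include <cstdint>
#include <vector>

// Field layout mandated by OpenGL 4.3+ for glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    uint32_t count;         // index count of the cell's bounding volume
    uint32_t instanceCount; // 1: draw each bounding volume once
    uint32_t firstIndex;    // FirstIndex into the shared index VBO
    uint32_t baseVertex;    // BaseVertex into the shared vertex VBO
    uint32_t baseInstance;  // BaseInstance = cell index, recoverable in the shaders
};

// Build one record per cell; the per-cell geometry offsets are assumed inputs.
std::vector<DrawElementsIndirectCommand>
buildMdeiBuffer(const std::vector<uint32_t>& indexCount,
                const std::vector<uint32_t>& firstIndex,
                const std::vector<uint32_t>& baseVertex) {
    std::vector<DrawElementsIndirectCommand> cmds(indexCount.size());
    for (uint32_t cell = 0; cell < cmds.size(); ++cell)
        cmds[cell] = { indexCount[cell], 1u, firstIndex[cell], baseVertex[cell], cell };
    return cmds;
}
```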
  • Disclosed embodiments have the potential to significantly increase the rendering performance of MMV systems. Early tests have shown a 2-3x increase in the frame rate of several large data sets. It also improves upon the accuracy of occlusion tests and the responsiveness to changes in the visibility state of the spatial hierarchy cells. The batch query contributes significantly to the increase in performance. Various embodiments eliminate the use of expensive GL Occlusion queries and reduce the number of occlusion tests / result retrievals from the number of cells on the front to 1 or the number of visible batches in the case of a large SH.
  • the batch query over cells significantly contributes to performance, accuracy, and responsiveness.
  • Various embodiments update the pixel value for all cells every frame and improve the level of detail (LOD) selection based upon cell pixel value.
  • Various embodiments allow cell-visibility state changes to be reflected almost immediately in the SH.
  • the single draw call for all queries significantly improves performance.
  • Various embodiments reduce CPU overhead and increase parallelism on the GPU.
  • the multiple batch sets for large spatial hierarchies significantly improves performance and scalability.
  • Various embodiments reduce the number of bounding volumes to be rendered in order to determine visibility of all cells.
  • Figure 6 illustrates a process in accordance with disclosed embodiments that can be performed, for example, by one or more data processing systems 100, referred to generically as "the system” below.
  • This figure provides an overview flowchart of a disclosed massive model rendering process in accordance with disclosed embodiments, including a GPU occlusion query with batch query of spatial hierarchy.
  • FIG. 6 illustrates one example of a high-level process
  • Figs. 7, 8, and 9 illustrate subprocesses that can be used as part of various steps in the process of Fig. 6.
  • the system initializes a rendering process for a 3D model (605).
  • This process can include receiving the 3D model and otherwise initializing the data structures and GPU for performing a rendering process as described herein.
  • the 3D model can have multiple parts or assemblies, and when rendered as a solid model, only some portions of the 3D model should be displayed for any given "viewpoint" while other portions are occluded (such as portions facing the backside of the model as compared to the viewpoint or portions "behind" other portions of the model).
  • This process can include generating a query representation for each cell in a spatial hierarchy.
  • Figure 7 illustrates an example of subprocesses that can be performed as part of step 605.
  • the system can receive the 3D model (705). "Receiving," as used herein, can include loading from storage, receiving from another device or process, receiving via an interaction with a user, and otherwise. Receiving the 3D model can be implemented by receiving a product structure of the 3D model, as described above with respect to Fig. 2.
  • the system can generate one or more buffers for displaying the 3D model (710). In various embodiments, this can include a buffer that contains the number of visible fragments for each cell in the spatial hierarchy, and can include a texture buffer that is persistently mapped by the system to a pointer that can be offset by cell index in order to retrieve the results for a particular cell.
  • Each cell represents a geometric bounding volume that encompasses some portion of the 3D model, and preferably every portion of the 3D model is included in some cell.
  • the cell data is stored in a spatial hierarchy that represents the spatial location of each cell and its respective portions of the 3D model. In this way, the spatial hierarchy can identify the spatial/geometric location of any part, assembly, subassembly, or other portion of the 3D model according to its cell.
  • the cells can be processed in spatial-hierarchy groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells.
  • the system can map the buffers into memory (715).
  • the system can generate depth textures and FBO (720).
  • the frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed.
  • the system can use the GPU to generate a depth texture with a same pixel format as a source depth buffer.
  • the system can use the GPU to generate a frame buffer object and bind a depth texture as a render target.
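  • A sketch of this depth-texture and FBO setup, followed by the blit used to capture the main depth buffer; the calls are standard OpenGL, the names illustrative.

```cpp
// Create a depth texture matching the source depth buffer's format and bind it
// to a new FBO as the depth attachment.
GLuint createDepthCaptureFbo(GLuint& depthTex, GLsizei width, GLsizei height) {
    glGenTextures(1, &depthTex);
    glBindTexture(GL_TEXTURE_2D, depthTex);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_DEPTH_COMPONENT24, width, height); // match source format
    GLuint fbo = 0;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, depthTex, 0);  // depth texture as render target
    return fbo;
}

// After all visible opaque geometry has rendered, blit the current depth buffer
// from the main framebuffer into the capture FBO (depth blits must use NEAREST).
void captureDepth(GLuint fbo, GLsizei width, GLsizei height) {
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_DEPTH_BUFFER_BIT, GL_NEAREST);
}
```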
  • the system can generate query groups (725).
  • the occlusion queries can be processed in query groups corresponding to regions of the spatial-hierarchy.
  • the system executes a rendering stage on the 3D model (610).
  • the rendering stage can include renderlist generation that uses visibility determination to create a list of all occurrences and their associated state that contribute to the current view of the 3D model.
  • Figure 8 illustrates an example of subprocesses that can be performed as part of step 610.
  • the system can render opaque geometry of the 3D model from an opaque renderlist (805).
  • the system can generate an opaque renderlist based upon the current visibility that identifies each portion of the model that is visible and opaque.
  • the system can iterate through the cells in the spatial hierarchy and, for any cell that is marked as visible, check to see if there are associated occurrences. If there are occurrences and they are loaded, they are inserted into the opaque renderlist, and then the opaque geometry of the opaque renderlist is rendered.
  • the system can use the GPU to blit a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
  • the system can capture the depth of a plurality of pixels resulting from the rendering of the opaque renderlist (810) and store it in a depth buffer.
  • the system can render transparent geometry of the 3D geometric model from a transparent renderlist (815). As part of this process, the system can generate a transparent renderlist based upon the current visibility that identifies each portion of the model that is visible but transparent. Further, transparent occurrences can be ignored, as they are not likely to occlude other occurrences.
  • the system can capture the depth of a plurality of pixels resulting from the rendering of the opaque and transparent renderlist (810) and store it in a depth buffer.
  • the system executes a strategy stage on the 3D model (615).
  • the strategy stage includes at least executing one or more occlusion queries over cells of a spatial hierarchy in order to determine if each cell is culled or visible from a current viewpoint.
  • the system projects contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
  • the strategy stage can include one or more of an obtain results substage, and an update rendering substage, and can include one or more of a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage.
  • Figure 9 illustrates an example of subprocesses that can be performed as part of step 615.
  • the system can update the renderlist (905).
  • the visibility value for all occurrences is initially set to 0.
  • the spatial hierarchy tree is then traversed such that the pixel values for all visible cells can be propagated to their contained occurrences. If an occurrence is referenced by multiple visible cells, the largest pixel value is used. Once traversal is complete, all occurrences with a pixel value greater than a preset threshold are harvested into a render list, with the level-of-detail selection for a particular occurrence being based upon its pixel value.
  • the resulting render list can then be sorted based upon material properties so as to minimize the number of state changes that must occur whenever it is rendered.
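  • A hedged sketch of that harvest-and-sort step; the Occurrence type and the numeric LOD cut-offs are illustrative assumptions, not values from this disclosure.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Occurrence {
    uint32_t pixelValue = 0;  // largest pixel value over all visible cells referencing it
    uint32_t materialId = 0;  // used to sort the render list and minimize state changes
    int lod = 0;              // 0 = finest
};

void harvestRenderList(std::vector<Occurrence*>& renderList,
                       std::vector<Occurrence*>& occurrences, uint32_t threshold) {
    renderList.clear();
    for (Occurrence* occ : occurrences)
        if (occ->pixelValue > threshold) {                   // below threshold: skip
            occ->lod = occ->pixelValue > 10000 ? 0           // pick LOD by screen coverage
                     : occ->pixelValue > 100   ? 1 : 2;      // (cut-offs are hypothetical)
            renderList.push_back(occ);
        }
    std::sort(renderList.begin(), renderList.end(),          // group by material so state
              [](const Occurrence* a, const Occurrence* b) { // changes are minimized
                  return a->materialId < b->materialId;
              });
}
```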
  • the system can render depth (910).
  • the renderlist from the previous frame can be rendered, slightly offset back, into the depth buffer in order to initialize it for executing GPU-based occlusion queries.
  • the bound state during rendering is limited to only that state that can potentially influence the depth buffer results. Further, transparent occurrences are ignored, as they are not likely to occlude other occurrences.
  • the system can render the renderlist or reproject the depth (915).
  • the renderlist render process renders the list of all visible occurrences as efficiently as possible.
  • reprojecting the depth includes projecting contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
  • the system can execute occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint (935).
  • Returning to Fig. 6, the system displays the 3D model according to the rendering stage and the strategy stage (620).
  • Disclosed embodiments include systems and methods for GPU-based depth reprojection for accelerating depth buffer generation.
  • Various embodiments operate by capturing the depth buffer at one viewpoint and dynamically reprojecting the depth buffer at another viewpoint without the data ever having to leave the GPU.
  • the depth buffer is reprojected using a sub-pixel mesh that allows the depth buffer to be automatically up- or down-sampled as necessary and naturally prevents gaps from forming as part of the reprojection.
  • FIG. 10 illustrates an example of a sub-pixel mesh 1000 in accordance with disclosed embodiments.
  • the sub-pixel mesh is used to ensure the reprojected depth buffer does not produce gaps and is conservative in nature. It contains a grid of points 1010 that is 1 pixel wider in each direction than the number of pixels in the originating render context.
  • the texture coordinate of the inner points correspond to the pixel coordinates of the originating render context.
  • the exterior points duplicate the texture coordinates of their neighboring interior points.
  • a quad or tristripset is formed between each set of neighboring points, thus preventing gaps from forming as the points are reprojected from one viewpoint to another.
  • the exterior points provide additional guards when translating or rotating a pixel that was previously at the edge of the window such that off-screen geometry is treated as though it is at the same depth value as the edge pixel.
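  • A minimal sketch of constructing such a grid for a W-by-H render context; the function name and layout are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Build the (W+2) x (H+2) grid of texture coordinates: interior points map to
// pixel centers, and the exterior ring duplicates its neighboring edge pixels
// to act as the off-screen guard described above.
std::vector<float> buildSubPixelMesh(int w, int h) {
    std::vector<float> uv;
    uv.reserve(static_cast<size_t>(2) * (w + 2) * (h + 2));
    for (int y = -1; y <= h; ++y) {
        int cy = std::clamp(y, 0, h - 1);       // exterior rows copy the edge row
        for (int x = -1; x <= w; ++x) {
            int cx = std::clamp(x, 0, w - 1);   // exterior columns copy the edge column
            uv.push_back((cx + 0.5f) / w);      // pixel-center texture coordinate
            uv.push_back((cy + 0.5f) / h);
        }
    }
    return uv;  // neighboring points are then stitched into quads or tristrips
}
```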
  • This system differentiates from prior systems in many ways, including, but not limited to, that it is designed to operate purely on the GPU with no need to ever transfer data back to the CPU. In doing so, it is able to achieve significant performance gains and significantly reduce the amount of lag between when a depth buffer is captured and when it can possibly be used.
  • This system also differentiates from prior systems in the fact that it requires only a single pass to complete the reprojection. Prior approaches require multiple passes for down-sampling the depth buffer and filling in gaps that occur as part of the reprojection. The use of a sub-pixel mesh naturally prevents gaps from occurring and allows the depth buffer to be dynamically down-sampled if so desired.
  • the system differentiates from the other systems in other ways as well.
  • disclosed embodiments require only a single pass for generating the occlusion depth buffer.
  • disclosed embodiments can execute the depth reprojection at full resolution.
  • disclosed embodiments do not generate or rely on generation of a depth hierarchy.
  • reprojecting the depth buffer from one frame into another in order to exploit frame to frame coherency has been shown as an effective alternative to the traditional approach of rendering the previous frames render list to populate the depth buffer.
  • this approach was able to make use of advanced shaders to reproject the depth buffer from one view to another without the data ever leaving the GPU.
  • Such a process can completely eliminate the measured cost of producing the required depth buffer for occlusion culling.
  • Other embodiments improve upon the performance gains by utilizing a pixel blit when the view is detected as unchanged, or by down-sampling the captured depth buffer into a smaller viewport at reprojection time.
  • texture writes provide a viable alternative to GL occlusion queries. Whereas traditional occlusion queries are limited to querying only a single entity at a time, texture writes can be used to query several entities in one go. Texture writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. Other embodiments can split the spatial hierarchy into smaller query sets such that higher-level sets can be used to filter on whether lower-level sets even need to be queried, or the set of cells being queried can be adjusted based upon the current visibility front in an effort to reduce the overhead on larger spatial hierarchies.
  • Various embodiments address this problem using a non-iterative GPU culling algorithm by rendering the opaque render list into the depth buffer. This approach can be used for generating a depth buffer whenever it is needed for things such as early-z culling, so as to eliminate overdraw.
  • Disclosed embodiments include a Depth Buffer that stores the Z coordinates of each pixel of a rendered frame. Disclosed embodiments can include occlusion culling to avoid rendering of objects that cannot be seen by the current camera. Disclosed embodiments include reprojection to change the coordinate system of data from one system to another.
  • the process can be performed completely on the GPU, significantly improving performance. It eliminates the performance penalty of having to copy the depth buffer back to the CPU that is common in other algorithms. It reduces the amount of lag between when the depth buffer was produced and when it can be used for reprojection. It allows for the execution of the reprojection and any down-sampling in a single pass that produces a conservative depth buffer without gaps. Disclosed techniques can be applied to other rendering algorithms that rely on a prepopulated depth buffer.
  • Disclosed embodiments include a method for MMV performed by a GPU of a data processing system.
  • a method includes executing a rendering stage on a 3D geometric model.
  • the method includes executing a strategy stage on the 3D geometric model.
  • the method includes displaying the 3D geometric model according to the rendering stage and strategy stage.
  • the rendering stage includes rendering opaque geometry of the 3D geometric model from an opaque renderlist, capturing opaque depth of a plurality of pixels in the opaque renderlist, rendering transparent geometry of the 3D geometric model from a transparent renderlist, and capturing transparent depth of a plurality of pixels in the transparent renderlist.
  • the strategy stage includes an obtain results substage, an updated culling substage, and an update rendering substage.
  • the strategy stage includes a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage.
  • the GPU generates a depth texture with a same pixel format as a source depth buffer.
  • the GPU generates a frame buffer object and binds a depth texture as a render target.
  • the GPU blits a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
  • machine usable/readable or computer usable/readable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs).

Abstract

Methods for massive model visualization performed by a graphics processing unit (GPU) (128) of a data processing system (100), and corresponding systems. A method includes executing a rendering stage (610) on a three-dimensional (3D) geometric model (226). The method includes executing a strategy stage (615) on the 3D geometric model (226), including projecting (915) contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226). The method includes displaying (620) the 3D geometric model (226) according to the rendering stage (610) and strategy stage (615).

Description

SYSTEM FOR GPU BASED DEPTH REPROJECTION FOR ACCELERATING
DEPTH BUFFER GENERATION
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of the filing date of United States Provisional Patent Application 62/311,075, filed March 21, 2016, which is hereby incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure is directed, in general, to computer-aided design, visualization, and manufacturing systems, product lifecycle management ("PLM") systems, and similar systems, that manage data for products and other items (collectively, "Product Data Management" systems or PDM systems).
BACKGROUND OF THE DISCLOSURE
[0003] PDM systems manage PLM and other data. Improved systems are desirable.
SUMMARY OF THE DISCLOSURE
[0004] Various disclosed embodiments include systems and methods for massive model visualization performed by a graphics processing unit (GPU) of a data processing system. A method includes executing a rendering stage on a three-dimensional (3D) geometric model. The method includes executing a strategy stage on the 3D geometric model, including projecting contents of a depth buffer for a current view of the 3D geometric model from contents of the depth buffer from a previous view of the 3D geometric model. The method includes displaying the 3D geometric model according to the rendering stage and strategy stage.
[0005] The foregoing has outlined rather broadly the features and technical advantages of the present disclosure so that those skilled in the art may better understand the detailed description that follows. Additional features and advantages of the disclosure will be described hereinafter that form the subject of the claims. Those skilled in the art will appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. Those skilled in the art will also realize that such equivalent constructions do not depart from the spirit and scope of the disclosure in its broadest form.
[0006] Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words or phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation, whether such a device is implemented in hardware, firmware, software or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art will understand that such definitions apply in many, if not most, instances to prior as well as future uses of such defined words and phrases. While some terms may include a wide variety of embodiments, the appended claims may expressly limit these terms to specific embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects, and in which:
[0008] Figure 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented;
[0009] Figure 2 illustrates an example of components that can be included in a massive model visualization system in accordance with disclosed embodiments;
[0010] Figures 3 and 4 demonstrate how a spatial hierarchy can be mapped in accordance with disclosed embodiments;
[0011] Figure 5 illustrates a Multi Draw Elements Indirect buffer, index vertex buffer object, and a vertex buffer object in accordance with disclosed embodiments;
[0012] Figures 6-9 illustrate processes in accordance with disclosed embodiments; and
[0013] Figure 10 illustrates an example of a sub-pixel mesh in accordance with disclosed embodiments.
DETAILED DESCRIPTION
[0014] FIGURES 1 through 10, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged device. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
[0015] The ever-increasing complexity of company designs is leading to a data explosion that is far outstripping the growth in computing processing power. The traditional large model visualization approaches used for rendering these data sets are quickly becoming insufficient, leading to a greater adoption of the new massive model visualization approaches that were designed from the ground up to handle these arbitrarily-sized data sets. Most of these new approaches utilize GPU occlusion queries as a means by which the amount of data that needs to be loaded and rendered is limited to only that which can potentially contribute to the final image. By doing so, these approaches introduce disocclusion artifacts that are often perceived as reducing the quality of the resulting visualization as one maneuvers a camera throughout the scene. Disclosed embodiments illustrate that atomic texture writes and multi-draw indirect can be used not only to increase the performance of an existing system based upon occlusion queries but also to reduce the amount of perceived disocclusion artifacts, as validated through a user study.
[0016] Massive Model Visualization (MMV) systems are able to render models with millions of parts by identifying the (typically small) subset of part occurrences that is actually needed to produce a correct image. Visibility-guided rendering (VGR) algorithms traverse a pre-computed spatial structure in order to determine which occurrences are potentially visible from a given eye point in an efficient manner.
[0017] Disclosed embodiments include a system for reprojecting the depth buffer from one frame into another frame without ever having to transfer data from the GPU, in order to significantly reduce the amount of time it takes to populate the depth buffer.
[0018] Disclosed embodiments also provide the benefit of an efficient means by which a depth buffer might be established for other rendering algorithms that require a pre-populated depth buffer.
[0019] Figure 1 illustrates a block diagram of a data processing system in which an embodiment can be implemented, for example as a PDM system particularly configured by software or otherwise to perform the processes as described herein, and in particular as each one of a plurality of interconnected and communicating systems as described herein. The data processing system depicted includes a processor 102 connected to a level two cache/bridge 104, which is connected in turn to a local system bus 106. Local system bus 106 may be, for example, a peripheral component interconnect (PCI) architecture bus. Also connected to local system bus 106 in the depicted example are a main memory 108 and a graphics adapter 110. The graphics adapter 110 may be connected to display 111. Processor 102 or graphics adapter 110 can include a graphics processing unit 128.
[0020] Other peripherals, such as local area network (LAN) / Wide Area Network / Wireless (e.g. WiFi) adapter 112, may also be connected to local system bus 106. Expansion bus interface 114 connects local system bus 106 to input/output (I/O) bus 116. I/O bus 116 is connected to keyboard/mouse adapter 118, disk controller 120, and I/O adapter 122. Disk controller 120 can be connected to a storage 126, which can be any suitable machine usable or machine readable storage medium, including but not limited to nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), magnetic tape storage, and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs), and other known optical, electrical, or magnetic storage devices.
[0021] Also connected to I/O bus 116 in the example shown is audio adapter 124, to which speakers (not shown) may be connected for playing sounds. Keyboard/mouse adapter 118 provides a connection for a pointing device (not shown), such as a mouse, trackball, trackpointer, touchscreen, etc.
[0022] Those of ordinary skill in the art will appreciate that the hardware depicted in Figure 1 may vary for particular implementations. For example, other peripheral devices, such as an optical disk drive and the like, also may be used in addition to or in place of the hardware depicted. The depicted example is provided for the purpose of explanation only and is not meant to imply architectural limitations with respect to the present disclosure.
[0023] A data processing system in accordance with an embodiment of the present disclosure includes an operating system employing a graphical user interface. The operating system permits multiple display windows to be presented in the graphical user interface simultaneously, with each display window providing an interface to a different application or to a different instance of the same application. A cursor in the graphical user interface may be manipulated by a user through the pointing device. The position of the cursor may be changed and/or an event, such as clicking a mouse button, generated to actuate a desired response.
[0024] One of various commercial operating systems, such as a version of Microsoft Windows™, a product of Microsoft Corporation located in Redmond, Wash, may be employed if suitably modified. The operating system is modified or created in accordance with the present disclosure as described.
[0025] LAN/ WAN/Wireless adapter 112 can be connected to a network 130 (not a part of data processing system 100), which can be any public or private data processing system network or combination of networks, as known to those of skill in the art, including the Internet. Data processing system 100 can communicate over network 130 with server system 140, which is also not part of data processing system 100, but can be implemented, for example, as a separate data processing system 100.
[0026] The digital era has brought about the adoption of computer systems throughout the entire product development lifecycle. Companies are now more capable than ever of breaking new ground with each successive generation of their products, resulting in ever more complex designs. This in turn is leading to a data explosion that is far outpacing the increases in the processing power of the computer systems being used to build them. For example, the current generation of CAD and visualization software is not capable of rendering the current generation of airplanes and ships in their entirety.
[0027] The current generation of software is designed around Large Model Visualization (LMV) technologies. These technologies work by traversing a product structure, usually represented as a scene graph, and using various techniques such as view frustum culling, size culling, occlusion culling, and level of detail representation in order to limit the number of polygons that are rendered when viewing the scene. The problem with this approach is that it is linear in nature, and therefore the computational cost grows at the same rate as the data size increases.
[0028] "Massive Model Visualization" (MMV) is the term used to encompass new technologies that aim at being able to handle this problem. The key principle of MMV is that the number of polygons that can potentially contribute to a rendered image from a given viewpoint is limited by the total number of pixels available in the image and not by the total number of available polygons. A 1080p high definition screen has a resolution of 1920 by 1080 or just over 2 million pixels. If one was to render a relatively large model of 200 million triangles at most only 2 percent of those triangles could possibly contribute to the final image. MMV technologies are about creating a system that is bound by screen space and not data size which is not the case with systems based purely on LMV technologies.
[0029] There are several components that can be included in a MMV system to make this possible, as shown in Figure 2. As part of a preprocess operation 202, the product structure 204, such as in the form of a scene graph, is subdivided by a partitioner 206 into a spatial hierarchy 210 and a geometric cache 212 in a data cache 208. The data cache 208 can contain anything from occurrences, polygons, or voxels, depending upon the level of subdivision that is deemed necessary. During render 216, a strategy stage 218 is executed over the spatial hierarchy 210 in order to construct visibility data 228 of all data that is expected to contribute to the current frame. This information is then fed into the renderer 220 that generates the final image, the loader 222 that ensures any required data is resident in the geometric cache, and the reaper 224 that ensures any data that has not been used recently is removed from the geometric cache. The operations within render 216 can be executed in parallel. The strategy 218 can generate the visibility data 228 for the next frame while the renderer 220 is still rendering the current frame, and the loader 222 and reaper 224 can run in a constant cycle executing data loads and unloads as deemed necessary. Renderer 220 produces the 3D model 226 for display as viewed from a specified viewpoint.
[0030] In order for a MMV system to be successful, each of the components can play an important part when it comes to handling extremely large datasets. The spatial hierarchy 210 generated by the partitioner 206 must provide enough spatial coherence between the cells for the strategy to be able to efficiently cull large batches of cells. The partitioner 206 must also ensure that the data contained within the cells is sufficiently coarse so as to minimize the amount of noncontributing geometry being used, as one of the biggest challenges with extremely large datasets is that they contain vastly more geometric information than can possibly be contained in main memory. The render 216 components have to work together to manage the amount of data that is resident at any given point in time. The loader 222 is responsible for loading data and needs to be agile enough to ensure data is available as quickly as possible when it is marked as needed. Predictive algorithms can be used by the loader 222 to try to prefetch data that is likely to become visible so as to minimize any potential lag. The reaper 224 is responsible for detecting and unloading data when it is no longer necessary and for determining the best candidates for unloading should memory approach the maximum threshold. The strategy 218's primary responsibility is to construct a list of visible occurrences. The visibility determination process can be designed around GPU-based occlusion queries and use other culling techniques, such as view frustum and screen coverage, as a way of pruning the list of entities for which a query needs to be executed. The strategy 218's secondary responsibility is to prune the list of visible data such that it meets the desired thresholds for both frame rate and memory footprint.
[0031] The visibility determination processes used to enable massive model visualization are not without their faults, as they inherently introduce disocclusion artifacts. Disocclusion artifacts occur whenever a visible shape is not rendered for one or more frames, causing a visible popping effect when the shape is finally rendered. This behavior is often the result of the visibility determination algorithm's inability to keep up with the visibility state changes that occur within a tree as the camera is moved through the scene. This behavior can also occur if the loader should fail to load the data before it is needed.
[0032] GPU-based occlusion tests have been shown to be an effective tool for improving rendering performance in both industry and games. Disclosed embodiments include novel improvements to a GPU based occlusion strategy for improving performance and reducing disocclusion artifacts.
[0033] GPU Based Depth Buffer Reprojection: GPU based occlusion tests require that a depth buffer be prepopulated with the depth values of potential occluders. Most approaches accomplish this by rendering either a potential occluder list or the existing render list into the depth buffer. On large models this can involve rendering millions of triangles, far more than there may be pixels on the screen, at significant cost. A commonly used principle with MMV techniques is frame-to-frame coherence, or the notion that the visibility state of occurrences will not change significantly between frames. From this it can be extrapolated that the depth buffer used for occlusion culling will likewise not change significantly between frames. Disclosed embodiments show how the depth buffer from a previous frame can be reprojected into the current viewpoint through the use of a sub-pixel mesh and a vertex shader to generate an approximation of the current depth buffer at near to no cost.
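By way of illustration only, such a reprojection vertex shader might be sketched as follows, expressed as a GLSL source string in a C++ host program. All identifier names, and the choice of reconstructing positions through an inverse view-projection matrix, are illustrative assumptions rather than the required implementation; each mesh vertex is assumed to carry the texture coordinate of its source depth texel.

    // Hypothetical GLSL vertex shader for depth reprojection: sample the
    // previous frame's depth, unproject to world space, and reproject
    // into the current view.
    static const char* kReprojectVS = R"(
        #version 430
        layout(location = 0) in vec2 aPixelUV;  // texcoord of source texel
        uniform sampler2D uPrevDepth;           // depth captured last frame
        uniform mat4 uPrevViewProjInverse;      // clip-to-world, previous view
        uniform mat4 uCurrViewProj;             // world-to-clip, current view
        out float vPrevDepth;
        void main() {
            float d = textureLod(uPrevDepth, aPixelUV, 0.0).r;  // [0,1]
            vPrevDepth = d;
            vec4 prevClip = vec4(aPixelUV * 2.0 - 1.0,  // NDC x,y
                                 d * 2.0 - 1.0, 1.0);   // NDC z
            vec4 world = uPrevViewProjInverse * prevClip;
            world /= world.w;                           // unproject
            gl_Position = uCurrViewProj * world;        // reproject
        }
    )";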
[0034] Disclosed embodiments can perform a batch query with spatial update, which is a significant improvement over previous approaches. It demonstrates how buffer write based occlusion culling can be applied to a spatial hierarchy without sacrificing the inherent benefits of previous front-based approaches. The ability to query all cells in a single draw call allows for increased parallelism to be achieved on the GPU while still maintaining the ability to limit the scope of data loads and visibility changes within the hierarchy.
[0035] Occlusion queries provide a means by which the GPU can be used to determine if a given set of primitives contributes to the final image and therefore frequently serve as the primary visibility test in massive model visibility determination algorithms.
[0036] GPU-based buffer write has been shown to be a viable alternative to GPU occlusion queries as it allows the visibility of all entities of interest to be obtained with a single draw call in order to significantly increase parallelism on the GPU. Disclosed embodiments show how this can be effectively combined with a spatial hierarchy in order to increase its scalability to arbitrarily large data sets.
[0037] In various embodiments, the spatial hierarchy is represented as a tree structure that stores the spatial hierarchy data for each cell. A disclosed spatial hierarchy is based on a bounding volume hierarchy over occurrences. Each cell within the tree contains its bounding volume information, occurrence(s), and children cell(s). The same occurrence can appear in multiple cells, and the occurrences contained within a cell can be dynamically determined at run time. The spatial hierarchy can be partitioned using any number of different algorithms, such as Median Cut, Octree-Hilbert, and Outside-In.
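By way of illustration only, a cell record of such a hierarchy might be sketched in C++ as follows; the field names and layout are hypothetical, not the required structure.

    #include <vector>

    // One cell of the bounding volume hierarchy over occurrences.
    struct Cell {
        float boundsMin[3], boundsMax[3];  // bounding volume
        int parent;                        // parent cell index (-1 = root)
        std::vector<int> children;         // child cell indices
        std::vector<int> occurrences;      // occurrence IDs; the same ID
                                           // may appear in multiple cells
    };
    std::vector<Cell> spatialHierarchy;    // cells addressed by index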
[0038] The bounding volume over occurrence allows the visibility state of a given cell to be directly translated to the visibility state of the occurrence. It also allows the occurrences that are contained within a given cell to be dynamically configured as cells become visible. Both of these features are useful for integrating directly against a PDM and enabling visibility guided interaction.
[0039] The spatial hierarchy supports having the same occurrence in multiple cells. This allows for a better subdivision while still allowing for cell visibility to be traced back to a specific occurrence.
[0040] A query representation is dynamically generated for each cell at run time. This allows the representation to be matched to the visibility determination algorithm being used. For example, the system can dynamically generate a set of triangles representing the bounding volume for use with OpenGL occlusion queries.
[0041] The renderlist render process renders the list of all visible occurrences as efficiently as possible. The data structure was designed to utilize modern GPU functionality while minimizing the potential L2 cache impact. The current implementation is based around rendering unified vertex buffer objects (multiple shapes in the same VBO) with state information passed into the shader through uniform buffer objects.
[0042] There are several options for reprojecting the depth values from one view point into another to create an approximation of the current depth buffer. The depth values could be read back to the host in order to generate a traditional textured depth mesh, which in turn could be rendered from the current view point in order to populate the depth buffer. The cost of reading the depth buffer back is far too expensive for this to be practical: taking a performance hit to read the depth buffer back immediately would defeat the purpose, and delaying such that the depth buffer is from 2-3 frames prior has a greater potential to introduce artifacts. Alternatively, the depth buffer could be treated as a point cloud that is easily transformed into the new view point as part of a vertex shader. While this has the potential to be extremely fast, it would cause holes in the calculated depth buffer, as deeper points travel farther as the model is rotated. While conservative from a visibility perspective, this has the potential to cause a significant amount of hidden geometry to be loaded as it is briefly marked as visible. An alternative to the point cloud is to render quads that span between pixels in the depth buffer so as to produce a depth buffer that errs on the side of caution when it comes to hidden objects becoming visible. In a worst case scenario, this could require an additional strategy iteration before some falsely-hidden occurrences become visible.
[0043] To support reprojecting of the depth buffer from a previous frame, a frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed by blitting the buffer from the main frame buffer. During the render depth process, the sub-pixel mesh described above is rendered using a vertex shader to dynamically transform all the vertices from the previous view point to the current view point using the values from the depth texture as the initial depth offset of the vertices. If, during the fragment shader, a fragment is detected as having had an initial depth value at the depth buffer maximum, it is discarded. This ensures depth values are only propagated for those pixels caused by rendered geometry.
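By way of illustration only, the depth capture might be sketched as follows in C++ using standard OpenGL calls (the texture format, loader header, and helper name are assumptions), together with a companion fragment shader that performs the discard described above.

    #include <glad/glad.h>  // any GL 4.3+ loader; an assumption

    // Create a depth texture and FBO, then blit the main frame buffer's
    // depth into it once all visible opaque geometry has rendered.
    void captureDepth(GLuint& fbo, GLuint& depthTex, int width, int height) {
        glGenTextures(1, &depthTex);
        glBindTexture(GL_TEXTURE_2D, depthTex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, width, height,
                     0, GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                               GL_TEXTURE_2D, depthTex, 0);
        glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);  // main frame buffer
        glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                          GL_DEPTH_BUFFER_BIT, GL_NEAREST);
    }

    // Hypothetical fragment shader: fragments whose source depth sat at
    // the buffer maximum held no rendered geometry and are discarded.
    static const char* kReprojectFS = R"(
        #version 430
        in float vPrevDepth;  // from the reprojection vertex shader
        void main() {
            if (vPrevDepth >= 1.0)
                discard;
        }
    )";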
[0044] Some approaches utilize OpenGL occlusion queries to decide the visibility state of cells within a spatial hierarchy. One basic algorithm is to traverse a spatial hierarchy in a screen depth first order and execute an individual occlusion query for each cell whose visibility state is in question. These approaches result in the alternative representations of each cell being individually rendered along this front, as well as multiple state transfers in order to read back the results from the queries. Modern GPUs run optimally when processing large batches of data in parallel. In terms of rendering, this means pushing as many triangles as possible in a single draw call, which runs counter to the way that traditional occlusion queries are executed.
[0045] Disclosed embodiments demonstrate how the visibility determination process utilizes buffer writes instead of occlusion queries. The basic approach includes allocating an integer buffer object for storing occlusion results and executing a render operation that renders the shapes as a single batch and populates the buffer with the visibility state of individual cells.
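By way of illustration only, such a fragment shader might be sketched as follows; the binding point, the forwarded cell ID, and the use of early fragment tests are illustrative assumptions. During this pass, color and depth writes would typically be disabled (for example via glColorMask and glDepthMask) so that only the counters are updated.

    // Hypothetical GLSL fragment shader performing the buffer write.
    static const char* kBufferWriteFS = R"(
        #version 430
        layout(early_fragment_tests) in;  // depth test runs first, so only
                                          // surviving fragments execute this
        flat in int vCellId;              // originating cell, from the
                                          // vertex stage
        layout(std430, binding = 0) buffer Visibility {
            uint pixelCount[];            // one counter per cell
        };
        void main() {
            atomicAdd(pixelCount[vCellId], 1u);  // tally a visible pixel
        }
    )";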
[0046] In order for the occlusion queries to be executed in parallel, the cell alternative representations can be combined into a single draw call that ensures all the representations are rendered as a single batch while still maintaining a means by which fragments can be traced back to the originating cells. In order to ensure a single batch, the alternative representations can be rendered, for example, using glDrawRangeElements with triangles, or glDrawRangeElements with GL primitive restarts and TriStripSets. Triangle-based rendering does not provide an inherent means to trace the resulting fragments back to the originating cell, so disclosed embodiments can add an additional per-primitive attribute that contains the cell's ID, which can be forwarded from the vertex shader into the fragment shader. Encoding each alternative representation as a single TriStripSet allows each cell to be uniquely identified within the fragment shader by using the primitive ID. Both methods can be integrated into the current visibility determination process in lieu of GL occlusion queries.
[0047] Instead of reading back individual occlusion queries, disclosed processes can map the buffer into memory to provide access to the pixel values for all cells in a single operation. When updating the visibility results in the render data, all values are initially set to 0. A tree traversal is then used to propagate the values from the buffer to the render data associated with each cell. Traversal along a given path is terminated if a cell is not considered visible, the cell has not been configured, or the geometric data associated with the cell has not been loaded. This embodiment queries the visibility state of all cells in the spatial hierarchy, eliminating the need to post-propagate the visibility state of children cells to their parents as is commonly found in approaches based upon GL occlusion queries.
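By way of illustration only, this propagation traversal might be sketched in C++ as follows; the per-cell fields are hypothetical.

    #include <cstdint>
    #include <vector>

    struct CellData {
        uint32_t pixelValue = 0;       // visible-pixel count from the GPU
        bool configured = false;       // cell's occurrences configured?
        bool loaded = false;           // cell's geometry resident?
        std::vector<int> children;     // child cell indices
    };

    // Copy mapped query results into the render data, stopping descent
    // along any path whose cell is culled, unconfigured, or unloaded.
    void propagate(int idx, const uint32_t* mapped,
                   std::vector<CellData>& cells) {
        CellData& c = cells[idx];
        c.pixelValue = mapped[idx];
        if (c.pixelValue == 0 || !c.configured || !c.loaded)
            return;  // terminate traversal along this path
        for (int child : c.children)
            propagate(child, mapped, cells);
    }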
[0048] Disclosed embodiments include novel improvements that are effective in not only improving the performance of a PDM or CAD system, but also in reducing some of the undesirable artifacts that often occur when using culling techniques.
[0049] Reprojecting the depth buffer from one frame into another in order to exploit frame-to-frame coherency has been shown to be an effective alternative to the traditional approach of rendering the previous frame's render list to populate the depth buffer. Whereas similar techniques require that the depth buffer be read back to the CPU for processing, this approach is able to make use of advanced shaders to re-project the depth buffer from one view to another without the data ever leaving the GPU. On several data sets it was shown that this new algorithm practically eliminates the measured cost of producing the required depth buffer for occlusion culling. Other embodiments can improve upon the performance gains by utilizing a blit pixel approach when the view is detected as the same, or by downsampling the captured depth buffer into a smaller viewport at reprojection time.
[0050] Disclosed embodiments show that buffer writes provide a viable alternative to traditional GL occlusion queries. Whereas traditional occlusion queries are limited to querying only a single entity at a time, buffer writes can be used to query several entities in one go. Through this approach it was shown that buffer writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. In order to ensure that only the truly visible geometry is loaded, the traversal of the spatial hierarchy for update is stopped whenever an unloaded cell is encountered. Further, the query set associated with the spatial hierarchy can be split into smaller sets such that higher-level sets can be used to filter on whether lower-level sets even need to be queried. This allows the number of cells that need to be queried to be limited, which is important on extremely large spatial hierarchies. Various embodiments can also limit cell visibility state changes to only those cells within a certain distance of the spatial hierarchy visibility front in order to further minimize disocclusion artifacts and potentially improve data load behavior. The spatial hierarchy visibility front refers to the point at which cells transition from visible to culled along a given path of traversal.
[0051] According to various disclosed embodiments, occlusion query solutions can be placed into three primary categories: CPU Based culling, GPU Occlusion Queries, and GPU Texture Write.
[0052] CPU-based culling: Systems in this category rely on a CPU-based algorithm as the primary source of occlusion culling. There has been a recent resurgence of this approach in the game industry, as the GPU is often viewed as a scarce resource best left for more important tasks such as rendering. Prime examples of this are the Frostbite 2 and CryEngine 3 game engines. Both of these approaches use a software rasterizer on a sub-thread to execute screen-space culling of objects based upon their AABBs or OBBs. One problem with these approaches is that they assume the increases in the number of available CPU cores will help them to perform as well as, if not better than, hardware optimized for handling this very problem and whose performance gains far outstrip the CPU. Another problem is they do not take into account that GPUs and their APIs are fast approaching the point of executing the entire culling and render list generation process on the GPU.
[0053] GPU Occlusion Queries: Systems in this category rely on GPU occlusion queries as the primary source of occlusion culling. GPU Gems 2 showed that it was possible to implement an algorithm that interleaves the rendering of visible occurrences with GL occlusion queries over a spatial hierarchy in order to minimize GPU stalls when retrieving occlusion results. This solution was adapted and used with success to increase render performance. The disclosed MMV solution utilizes an iterative approach in which previously issued occlusion queries are retrieved and new occlusion queries are executed on the visibility front in the spatial hierarchy every odd frame in order to avoid GPU stalls. The primary problem with using GL occlusion queries is that their parallelism is limited, as entities must be queried one at a time.
[0054] GPU Buffer Write: Disclosed embodiments include systems and methods for executing batch occlusion queries on the GPU over the cells of a spatial hierarchy. The GPU realizes the most parallelism when the number of triangles rendered or queried by a single draw call is large. This is because the GPU is typically not allowed to overlap computation between successive draw calls. The system operates by generating a single vertex buffer object (VBO) that contains the bounding volumes for all cells within a spatial hierarchy (SH). A multi draw elements indirect (MDEI) buffer is then set up such that the bounding volume of each cell is uniquely referenced. During the strategy, the depth buffer is populated with the potential occluders by either rendering the previous renderlist or reprojecting the depth buffer from the previous frame. A single draw call is then executed using the MDEI buffer and a carefully crafted fragment shader that atomically increments the pixel value associated with the unique ID of a bounding volume in a buffer object whenever one of its fragments passes the depth test. The resulting buffer is copied into another buffer that has been persistently mapped into a pointer on the host. This pointer is then indexed parallel to the cells in the SH in order to retrieve the current pixel value for each cell. These values are used for generating the render list, loading data, and selecting LODs. For larger spatial hierarchies, the cell list may be split into multiple segments representing different sub-regions of the spatial hierarchy. The results for cells that are higher in the spatial tree can then be used to determine if a particular sub-region needs to be queried, significantly reducing the number of bounding volumes that may need to be rendered in a given frame.
[0055] This system is different from CPU-side approaches as it utilizes parallelism inherent to the GPU to accelerate the occlusion tests.
[0056] This system is different from occlusion-query based solutions as it is capable of executing all queries and retrieving all results with a single call. Furthermore, the system can query any subset of the cells in the spatial hierarchy.
[0057] This solution is different from existing texture write based approaches as disclosed embodiments are designed to operate on the cells of a spatial hierarchy, whereas other techniques are designed around occurrences. This allows better scalability to be achieved, as the spatial hierarchy allows a reduction in the number of occurrences that may be accidentally loaded and allows a significant reduction in the number of bounding volumes that need to be rendered, by breaking the spatial hierarchy into multiple sub-regions and using the results from higher regions to determine if a sub-region needs to be checked. Additionally, the spatial hierarchy can be leveraged when updating the visibility state so as to minimize the impact when first entering a region in which the current depth information is unknown.
[0058] The massive model rendering process can be split into two main pipeline stages: rendering and strategy. The render stage is responsible for generating the on-screen image through multi-stage rendering of a render list. The strategy stage is responsible for generating a render list of visible geometry to be used in the render stage.
[0059] According to disclosed embodiments, the render stage can be broken into four primary sub-stages. The first stage is the rendering of opaque geometry into both the color and depth buffers. The second stage is the blitting of the current depth buffer into a depth texture for use in the spatial strategy. The third stage is the rendering of transparent geometry into both the color and depth buffers. The fourth and final stage of rendering is again the blitting of the depth buffer into a depth texture for potential use in the spatial strategy.
[0060] According to disclosed embodiments, the strategy stage can be broken into four sub-stages: obtain results, update renderlist, render depth, and execute query. The strategy is responsible for executing occlusion queries.
[0061] The render depth stage is responsible for populating the depth buffer on the GPU with the depth values for potential occluders. This is accomplished by either rendering the previous render list or by utilizing techniques as described in the provisional patent application incorporated herein.
[0062] The execute query stage is responsible for executing occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint. According to disclosed embodiments, occlusion queries are executed by rendering the bounding volumes of all cells in the spatial hierarchy in a single draw call and utilizing a fragment shader to atomically increment the pixel associated with a cell each time one of the fragments produced by its bounding volume is not culled by the depth buffer. This operation results in a buffer that tallies the number of visible fragments for each cell in the spatial hierarchy. For large spatial hierarchies, the cells may be queried in groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells. When this occurs, the results from the previous frame for higher-level groups can be used to determine if a lower-level group should be queried.
[0063] The obtain results stage is responsible for retrieving the results from the queries. It iterates through the cells of the spatial hierarchy and retrieves the visibility results for each cell from an index-parallel buffer object that has been persistently mapped on the CPU. Iteration of child cells may be stopped in order to ensure any occurrences associated with a visible parent are loaded and rendered prior to potentially marking the child as visible. This optimization helps to limit the number of cells that are accidentally marked as visible, and therefore have their occurrences configured or loaded, in regions of space in which the current depth buffer is unknown.
[0064] According to various embodiments, the buffer object used for capturing the per-cell pixel value is made available to the CPU in a form that does not block the GPU. This data structure is designed to be indexed parallel to the cells in the spatial hierarchy.
[0065] The update renderlist stage is responsible for generating a renderlist, based upon the current visibility results, to be used in the primary render stage. It iterates through the cells of the spatial hierarchy and, for any cell that is marked as visible, checks to see if there are associated occurrences. If there are occurrences and they are loaded, they are inserted into the renderlist. If there are occurrences but they are not loaded, a request for loading may occur here.
[0066] Figures 3 and 4 demonstrate how a spatial hierarchy 300/400 can be mapped for batch query of the entire spatial hierarchy or, in the case of an extremely large hierarchy, into multiple sets for batch query. The spatial hierarchy is implemented as a vector of cells in which each cell can be uniquely identified by index. The structure of the hierarchy is established through each cell containing a parent index, a child index, and the number of children. In order to facilitate several algorithmic optimizations, the cells are defined in a depth-first order that guarantees that children cells have a larger index than their parents and are grouped such that all cells within a sub-tree have contiguous indices, as shown in Fig. 3. By contrast, a breadth-first ordering as illustrated in Fig. 4 results in multi-query groups where not all cells in a sub-tree have contiguous indices, as illustrated by the sub-tree of node two, which would include cells 4, 5, 8, 9, 10 and 11.
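By way of illustration only, a depth-first flattening that yields this layout might be sketched in C++ as follows; the type and function names are hypothetical.

    #include <vector>

    struct TreeNode { std::vector<TreeNode> children; };
    struct FlatCell {
        int parent;                 // parent index (-1 for the root)
        std::vector<int> children;  // child indices (always larger)
        int subtreeSize;            // cells in the contiguous sub-tree range
    };

    // Emit cells depth-first so the sub-tree rooted at index i occupies
    // the contiguous index range [i, i + subtreeSize).
    int flatten(const TreeNode& n, int parent, std::vector<FlatCell>& out) {
        const int idx = (int)out.size();
        out.push_back({parent, {}, 1});
        for (const TreeNode& c : n.children) {
            int childIdx = flatten(c, idx, out);
            out[idx].children.push_back(childIdx);
        }
        out[idx].subtreeSize = (int)out.size() - idx;
        return idx;
    }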
[0067] A buffer object containing integer values is allocated parallel to the cells of the spatial hierarchy. When executing occlusion tests, if a fragment associated with the bounding volume of a cell passes the depth test, the value contained within the associated element is incremented, resulting in the buffer containing the pixel hit count for all tested cells. A secondary buffer is allocated and persistently mapped, such that values can be read back from it through direct pointer access. At the end of each occlusion pass, the values from the primary buffer are copied into this secondary buffer. This subtle enhancement allows the primary buffer to remain only in GPU memory, which significantly improves the performance of the write operations.
[0068] Multi-Draw Indirect: Modern GPUs are optimized for executing work in parallel. The best performance is therefore achieved when processing large batches. From a render perspective this means packing as much geometry as possible into each draw call. In order to help better facilitate this, several new draw calls have been added to the OpenGL specification, with one of particular interest, formalized as part of OpenGL 4.3, being glMultiDrawElementsIndirect.
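By way of illustration only, the double-buffered result readback described in paragraph [0067] might be sketched in C++ as follows (GL 4.4 buffer storage is assumed for persistent mapping; fence synchronization is omitted).

    #include <glad/glad.h>  // any GL 4.4+ loader; an assumption
    #include <cstdint>

    // Create the GPU-resident primary buffer and a persistently mapped
    // readback buffer; returns a pointer the CPU can read directly.
    const uint32_t* createResultBuffers(GLsizeiptr cellCount,
                                        GLuint& primary, GLuint& readback) {
        GLsizeiptr size = cellCount * sizeof(GLuint);
        glGenBuffers(1, &primary);
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, primary);
        glBufferData(GL_SHADER_STORAGE_BUFFER, size, nullptr,
                     GL_DYNAMIC_COPY);
        glGenBuffers(1, &readback);
        glBindBuffer(GL_COPY_WRITE_BUFFER, readback);
        GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT |
                           GL_MAP_COHERENT_BIT;
        glBufferStorage(GL_COPY_WRITE_BUFFER, size, nullptr, flags);
        return (const uint32_t*)glMapBufferRange(GL_COPY_WRITE_BUFFER,
                                                 0, size, flags);
    }

    // At the end of each occlusion pass, copy the counters across:
    //   glBindBuffer(GL_COPY_READ_BUFFER, primary);
    //   glBindBuffer(GL_COPY_WRITE_BUFFER, readback);
    //   glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
    //                       0, 0, cellCount * sizeof(GLuint));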
[0069] The system batches the rendering of the cell bounding volumes to perform occlusion tests. It defines the batches in such a way that the associated cell for any given volume can be uniquely identified in both the vertex and fragment shader.
[0070] The geometric information for rendering the bounding volumes of all cells is stored sequentially by the system in a single vertex buffer object (VBO) pair. Figure 5 illustrates a Multi Draw Elements Indirect (MDEI) buffer 502, index VBO 504, and vertex VBO 506 in accordance with disclosed embodiments. The MDEI buffer is parallel to the number of cells and is initialized such that each DrawElementsIndirectBuffer contains the FirstIndex and BaseVertex for the geometric information of the corresponding cell in the VBO and the BaseInstance is set to the corresponding cell index. This setup allows all cells or a sub-tree of cells to be rendered in a single draw call, if so desired, and, more importantly, allows the fragment shader to readily identify the associated cell's bounding volume for a given fragment.
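By way of illustration only, populating such an MDEI buffer might be sketched in C++ as follows. The record layout is the standard OpenGL indirect-draw command; the helper name and per-cell sizing are assumptions. In GLSL, the cell index carried in baseInstance can be read as gl_BaseInstance (OpenGL 4.6) or gl_BaseInstanceARB (ARB_shader_draw_parameters).

    #include <glad/glad.h>  // any GL 4.3+ loader; an assumption
    #include <vector>

    // Standard OpenGL indirect-draw record; baseInstance is repurposed
    // to carry the cell index, as described above.
    struct DrawElementsIndirectCommand {
        GLuint count, instanceCount, firstIndex, baseVertex, baseInstance;
    };

    GLuint buildMdeiBuffer(GLuint cellCount, GLuint indicesPerBox,
                           GLuint vertsPerBox) {
        std::vector<DrawElementsIndirectCommand> cmds(cellCount);
        for (GLuint i = 0; i < cellCount; ++i)
            cmds[i] = { indicesPerBox, 1,   // one instance per cell
                        i * indicesPerBox,  // offset into the index VBO
                        i * vertsPerBox,    // offset into the vertex VBO
                        i };                // baseInstance = cell index
        GLuint mdei;
        glGenBuffers(1, &mdei);
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, mdei);
        glBufferData(GL_DRAW_INDIRECT_BUFFER,
                     cmds.size() * sizeof(cmds[0]), cmds.data(),
                     GL_STATIC_DRAW);
        return mdei;
    }
    // A single call then queries all cells (or any contiguous sub-tree):
    //   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
    //                               nullptr, cellCount, 0);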
[0071] Disclosed embodiments have the potential to significantly increase the rendering performance of MMV systems. Early tests have shown a 2-3x increase in the frame rate on several large data sets. They also improve upon the accuracy of occlusion tests and the responsiveness to changes in the visibility state of the spatial hierarchy cells. The batch query contributes significantly to the increase in performance. Various embodiments eliminate the use of expensive GL occlusion queries and reduce the number of occlusion tests and result retrievals from the number of cells on the front down to one, or to the number of visible batches in the case of a large SH.
[0072] The batch query over cells significantly contributes to performance, accuracy, and responsiveness. Various embodiments update the pixel value for all cells every frame and improve the level of detail (LOD) selection based upon cell pixel value. Various embodiments allow cell-visibility state changes to be reflected almost immediately in the SH. The single draw call for all queries significantly improves performance. Various embodiments reduce CPU overhead and increase parallelism on the GPU. The use of multiple batch sets for large spatial hierarchies significantly improves performance and scalability. Various embodiments reduce the number of bounding volumes that must be rendered in order to determine the visibility of all cells.
[0073] Figure 6 illustrates a process in accordance with disclosed embodiments that can be performed, for example, by one or more data processing systems 100, referred to generically as "the system" below. This figure provides an overview flowchart of a disclosed massive model rendering process in accordance with disclosed embodiments, including a GPU occlusion query with batch query of spatial hierarchy.
[0074] Fig. 6 illustrates one example of a high-level process, while Figs. 7, 8, and 9 illustrate subprocesses that can be used as part of various steps in the process of Fig. 6.
[0075] The system initializes a rendering process for a 3D model (605). This process can include receiving the 3D model and otherwise initializing the data structures and GPU for performing a rendering process as described herein. The 3D model can have multiple parts or assemblies, and when rendered as a solid model, only some portions of the 3D model should be displayed for any given "viewpoint" while other portions are occluded (such as portions facing the backside of the model as compared to the viewpoint or portions "behind" other portions of the model). This process can include generating a query representation for each cell in a spatial hierarchy.
[0076] Figure 7 illustrates an example of subprocesses that can be performed as part of step 605.
[0077] The system can receive the 3D model (705). "Receiving," as used herein, can include loading from storage, receiving from another device or process, receiving via an interaction with a user, and otherwise. Receiving the 3D model can be implemented by receiving a product structure of the 3D model, as described above with respect to Fig. 2.
[0078] The system can generate one or more buffers for displaying the 3D model (710). In various embodiments, this can include a buffer that contains the number of visible fragments for each cell in the spatial hierarchy, and can include a texture buffer that is persistently mapped by the system to a pointer that can be offset by cell index in order to retrieve the results for a particular cell.
[0079] Each cell represents a geometric bounding volume that encompasses some portion of the 3D model, and preferably every portion of the 3D model is included in some cell. The cell data is stored in a spatial hierarchy that represents the spatial location of each cell and its respective portions of the 3D model. In this way, the spatial hierarchy can identify the spatial/geometric location of any part, assembly, subassembly, or other portion of the 3D model according to its cell. The cells can be processed in spatial-hierarchy groups so as to reduce the number of bounding volumes that need to be rendered in order to determine the visibility of all cells.
[0080] The system can map the buffers into memory (715).
[0081] The system can generate depth textures and FBO (720). The frame buffer object (FBO) with a depth texture render target can be used to capture the state of the depth buffer after the rendering of all visible opaque geometry has completed. The system can use the GPU to generate a depth texture with a same pixel format as a source depth buffer. The system can use the GPU to generate a frame buffer object and bind a depth texture as a render target.
[0082] The system can generate query groups (725). For large models or large spatial hierarchies, the occlusion queries can be processed in query groups corresponding to regions of the spatial-hierarchy.
[0083] Returning to Fig. 6, the system executes a rendering stage on the 3D model (610). The rendering stage can include renderlist generation that uses visibility determination to create a list of all occurrences, and their associated state, that contribute to the current view of the 3D model.
[0084] Figure 8 illustrates an example of subprocesses that can be performed as part of step 610.
[0085] The system can render opaque geometry of the 3D model from an opaque renderlist (805). As part of this process, the system can generate an opaque renderlist based upon the current visibility that identifies each portion of the model that is visible and opaque. The system can iterate through the cells in the spatial hierarchy and, for any cell that is marked as visible, check to see if there are associated occurrences. If there are occurrences and they are loaded, they are inserted into the opaque renderlist, and then the opaque geometry of the opaque renderlist is rendered. The system can use the GPU to blit a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
[0086] The system can capture the depth of a plurality of pixels resulting from the rendering of the opaque renderlist (810) and store it in a depth buffer.
[0087] The system can render transparent geometry of the 3D geometric model from a transparent renderlist (815). As part of this process, the system can generate a transparent renderlist based upon the current visibility that identifies each portion of the model that is visible but transparent. Further, transparent occurrences can be ignored, as they are not likely to occlude other occurrences.
[0088] The system can capture the depth of a plurality of pixels resulting from the rendering of the opaque and transparent renderlist (810) and store it in a depth buffer.
[0089] Returning to Fig. 6, the system executes a strategy stage on the 3D model (615). In various embodiments, the strategy stage includes at least executing one or more occlusion queries over cells of a spatial hierarchy in order to determine if each cell is culled or visible from a current viewpoint. As part of this step, and as described in more detail herein, the system projects contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
[0090] The strategy stage can include one or more of an obtain results substage and an update rendering substage, and can include one or more of a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage.
[0091] Figure 9 illustrates an example of subprocesses that can be performed as part of step 615.
[0092] The system can update the renderlist (905). In various embodiments, to update the renderlist, the visibility value for all occurrences is initially set to 0. The spatial hierarchy tree is then traversed such that the pixel values for all visible cells can be propagated to their contained occurrences. If an occurrence is referenced by multiple visible cells, the largest pixel value is used. Once traversal is complete, all occurrences with a pixel value greater than a preset threshold are harvested into a render list, with the level of detail selection for a particular occurrence being based upon its pixel value. The resulting render list can then be sorted based upon material properties so as to minimize the amount of state changes that must occur whenever it is rendered.
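By way of illustration only, this harvest might be sketched in C++ as follows; the types, threshold, and material handling are hypothetical, and the LOD selection from the pixel value is omitted.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Occurrence { uint32_t pixelValue = 0; int material = 0; };
    struct VisibleCell { uint32_t pixelValue; std::vector<int> occurrenceIds; };

    // Propagate cell pixel values to occurrences (keeping the largest),
    // harvest those above the threshold, and sort by material to
    // minimize state changes during rendering.
    std::vector<int> updateRenderList(const std::vector<VisibleCell>& cells,
                                      std::vector<Occurrence>& occurrences,
                                      uint32_t pixelThreshold) {
        for (auto& o : occurrences)
            o.pixelValue = 0;  // reset visibility values
        for (const auto& c : cells)
            for (int id : c.occurrenceIds)
                occurrences[id].pixelValue =
                    std::max(occurrences[id].pixelValue, c.pixelValue);
        std::vector<int> renderList;
        for (int i = 0; i < (int)occurrences.size(); ++i)
            if (occurrences[i].pixelValue > pixelThreshold)
                renderList.push_back(i);
        std::sort(renderList.begin(), renderList.end(), [&](int a, int b) {
            return occurrences[a].material < occurrences[b].material;
        });
        return renderList;
    }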
[0093] The system can render depth (910). During the render depth process, the renderlist from the previous frame can be rendered, slightly offset back, into the depth buffer in order to initialize it for executing GPU-based occlusion queries. The bound state during rendering is limited to only that state which can potentially influence the depth buffer results. Further, transparent occurrences are ignored, as they are not likely to occlude other occurrences.
[0094] The system can render the renderlist or reproject the depth (915). The renderlist render process renders the list of all visible occurrences as efficiently as possible. As described herein, reprojecting the depth includes projecting contents of a depth buffer for a current view of the 3D model from contents of the depth buffer from a previous view of the 3D model.
[0095] The system can execute occlusion queries over the cells of a spatial hierarchy in order to determine if they are culled or visible from the current viewpoint (935).
[0096] Returning to Fig. 6, the system displays the 3D model according to the rendering stage and the strategy stage (620).
[0097] Disclosed embodiments include systems and methods for performing GPU-based depth reprojection for accelerating depth buffer generation. Various embodiments operate by capturing the depth buffer at one view point and dynamically reprojecting the depth buffer at another view point without the data ever having to leave the GPU. The depth buffer is reprojected using a sub-pixel mesh that allows the depth buffer to be automatically up- or down-sampled as necessary and naturally prevents gaps from forming as part of the reprojection.
[0098] Figure 10 illustrates an example of a sub-pixel mesh 1000 in accordance with disclosed embodiments. The sub-pixel mesh is used to ensure the reprojected depth buffer does not produce gaps and is conservative in nature. It contains a grid of points 1010 that is 1 pixel wider in each direction than the number of pixels in the originating render context. The texture coordinates of the inner points correspond to the pixel coordinates of the originating render context. The exterior points duplicate the texture coordinates of their neighboring interior points. A quad or tristripset is formed between each set of neighboring points, thus preventing gaps from forming as the points are reprojected from one viewpoint to another. The exterior points provide an additional guard when translating or rotating a pixel that was previously at the edge of the window, such that off-screen geometry is treated as though it is at the same depth value as the edge pixel.
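By way of illustration only, generating such a grid of points might be sketched in C++ as follows; the clamping used to duplicate edge texture coordinates for the exterior ring is an illustrative choice.

    #include <algorithm>
    #include <vector>

    struct Vec2 { float u, v; };

    // Build a (w+2) x (h+2) grid: interior points map to pixel centers of
    // the originating render context; the exterior ring duplicates the
    // texture coordinates of its neighboring interior points.
    std::vector<Vec2> buildSubPixelMesh(int w, int h) {
        std::vector<Vec2> pts;
        pts.reserve((size_t)(w + 2) * (h + 2));
        for (int y = -1; y <= h; ++y) {
            for (int x = -1; x <= w; ++x) {
                float cx = (float)std::min(std::max(x, 0), w - 1);
                float cy = (float)std::min(std::max(y, 0), h - 1);
                pts.push_back({(cx + 0.5f) / w, (cy + 0.5f) / h});
            }
        }
        return pts;  // quads/tristrips are then formed between neighbors
    }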
[0099] This system differentiates from prior systems in many ways, including, but not limited to, that it is designed to operate purely on the GPU with no need to ever transfer data back to the CPU. In doing so, it is able to achieve significant performance gains and significantly reduce the amount of lag between when a depth buffer is captured and when it can possibly be used. This system also differentiates from prior systems in that it requires only a single pass to complete the reprojection. Prior approaches require multiple passes for downsampling the depth buffer and filling in gaps that occur as part of the reprojection. The use of a sub-pixel mesh naturally prevents gaps from occurring and allows the depth buffer to be dynamically downsampled if so desired.
[0100] The system differentiates from other systems in other ways as well. For example, disclosed embodiments require only a single pass for generating the occlusion depth buffer. As another example, disclosed embodiments can execute the depth reprojection at full resolution. As another example, disclosed embodiments do not generate or rely on generation of a depth hierarchy.
[0101] According to disclosed embodiments, reprojecting the depth buffer from one frame into another in order to exploit frame-to-frame coherency has been shown to be an effective alternative to the traditional approach of rendering the previous frame's render list to populate the depth buffer. Whereas other techniques require that the depth buffer be read back to the CPU for processing, this approach is able to make use of advanced shaders to reproject the depth buffer from one view to another without the data ever leaving the GPU. Such a process can completely eliminate the measured cost of producing the required depth buffer for occlusion culling. Other embodiments improve upon the performance gains by utilizing a blit pixel approach when the view is detected as the same, or by downsampling the captured depth buffer into a smaller viewport at reprojection time.
[0102] Disclosed embodiments show that texture writes provide a viable alternative to GL occlusion queries. Whereas traditional occlusion queries are limited to querying only a single entity at a time, texture writes can be used to query several entities in one go. Texture writes can be effectively used to query the visibility state of an entire spatial hierarchy in the time it would normally take to query only a handful of its cells. Other embodiments can split the spatial hierarchy into smaller query sets such that higher-level sets can be used to filter on whether lower-level sets even need to be queried, or the set of cells being queried can be adjusted based upon the current visibility front in an effort to reduce the overhead on larger spatial hierarchies. Other embodiments can also limit cell visibility state changes to only those cells within a certain distance of the front in order to further minimize disocclusion artifacts and potentially improve data load behavior.
[0103] Reprojection issues can be addressed using an iterative GPU culling algorithm that interleaves visible geometry rendering with the execution of occlusion queries. The depth buffer is naturally established when rendering visible geometry.
[0104] Various embodiments address this problem using a non-iterative GPU culling algorithm by rendering the opaque render list into the depth buffer. This approach can be used for generating a depth buffer whenever it is needed for purposes such as early-z culling so as to eliminate overdraw.
[0105] Disclosed embodiments include a Depth Buffer that stores the Z coordinates of each pixel of a rendered frame. Disclosed embodiments can include occlusion culling to avoid rendering of objects that cannot be seen by the current camera. Disclosed embodiments include reprojection to change the coordinate system of data from one system to another.
[0106] According to various embodiments, the process can be performed completely on the GPU, significantly improving performance. It eliminates the performance penalty of having to copy the depth buffer back to the CPU that is common in other algorithms. It reduces the amount of lag between when the depth buffer was produced and when it can be used for reprojection. It allows for the execution of the reprojection and any downsampling in a single pass that produces a conservative depth buffer without gaps. Disclosed techniques can be applied to other rendering algorithms that rely on a prepopulated depth buffer.
[0107] Disclosed embodiments include a method for MMV performed by a GPU of a data processing system. A method includes executing a rendering stage on a 3D geometric model. The method includes executing a strategy stage on the 3D geometric model. The method includes displaying the 3D geometric model according to the rendering stage and strategy stage. In various embodiments, the rendering stage includes rendering opaque geometry of the 3D geometric model from an opaque renderlist, capturing opaque depth of a plurality of pixels in the opaque renderlist, rendering transparent geometry of the 3D geometric model from a transparent renderlist, and capturing transparent depth of a plurality of pixels in the transparent renderlist. In various embodiments, the strategy stage includes an obtain results substage, an update culling substage, and an update rendering substage. In various embodiments, the strategy stage includes a render depth substage, a reproject depth substage, a render renderlist substage, and an execute query substage. In various embodiments, the GPU generates a depth texture with a same pixel format as a source depth buffer. In various embodiments, the GPU generates a frame buffer object and binds a depth texture as a render target. In various embodiments, the GPU blits a current depth buffer into a frame buffer object after all opaque geometry has finished rendering.
[0108] Of course, those of skill in the art will recognize that, unless specifically indicated or required by the sequence of operations, certain steps in the processes described above may be omitted, performed concurrently or sequentially, or performed in a different order.
[0109] Those skilled in the art will recognize that, for simplicity and clarity, the full structure and operation of all data processing systems suitable for use with the present disclosure is not being depicted or described herein. Instead, only so much of a data processing system as is unique to the present disclosure or necessary for an understanding of the present disclosure is depicted and described. The remainder of the construction and operation of data processing system 100 may conform to any of the various current implementations and practices known in the art.
[0110] The systems and methods disclosed herein can be combined or modified with features of other systems and methods as discussed in the following documents, each of which is hereby incorporated by reference:
• A. Dietrich, E. Gobbetti, D. Manocha, F. Marton, R. Pajarola, P. Slusallek and S.-e. Yoon, "Interactive Massive Model Rendering," in ACM SIGGRAPH ASIA 2008 Courses, Singapore, 2008.
• D. Kasik, "Visibility-guided rendering to accelerate 3D graphics hardware performance," in ACM SIGGRAPH 2007 Courses, San Diego, CA, ACM, 2007.
• D. Aliaga, J. Cohen, A. Wilson, E. Baker, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, R. Bastos, M. Whitton, F. Brooks and D. Manocha, "MMR: an interactive massive model rendering system using geometric and image-based acceleration," in Proc. 1999 Symp. Interactive 3D Graph., Atlanta, GA, 1999, pp. 199-206.
• D. Bartz, D. Staneker, W. Straßer, B. Cripe, T. Gaskins, K. Orton, M. Carter, A. Johannsen and J. Trom, "Jupiter: A Toolkit for Interactive Large Model Visualization," in Proc. IEEE 2001 Symp. Parallel and Large-data Visualization and Graph., Piscataway, NJ, IEEE Press, 2001, pp. 129-134.
• B. Bruderlin, M. Heyer and S. Pfutzner, "Interviews3D: A Platform for Interactive Handling of Massive Data Sets," IEEE Comput. Graph. Appl., vol. 27, no. 6, pp. 48-59, 2007.
• D. J. MacDonald and K. S. Booth, "Heuristics for ray tracing using space subdivision," Vis. Comput., vol. 6, no. 3, pp. 153-165, 1990.
• V. Havran, "Analysis of cache sensitive representations for binary space partitioning trees," Informatica, vol. 29, no. 3, pp. 203-210, 1999.
• E. Gobbetti and F. Marton, "Far Voxels: A Multiresolution Framework for Interactive Rendering of Huge Complex 3D Models on Commodity Graphics Platforms," ACM Trans. Graph., vol. 24, no. 3, pp. 878-885, 2005.
• J. Bittner, M. Wimmer, H. Piringer and W. Purgathofer, "Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful," Comput. Graph. Forum, vol. 23, no. 3, pp. 615-624, 2004.
• J. Anderson, "Parallel Graphics in Frostbite - Current & Future," in ACM SIGGRAPH 2009 Courses, New Orleans, 2009.
• A. Kaplanyan, "CryEngine 3: Reaching the speed of light," in ACM SIGGRAPH Courses, Los Angeles, 2010.
• N. Kasyan, N. Schulz and T. Sousa, "Secrets of CryENGINE 3 Graphics Technology," in ACM SIGGRAPH Courses, Vancouver, 2011.
• M. Tavenrath and C. Kubisch, "Advanced Scenegraph Rendering Pipeline," in GPU Technology Conf., San Jose, 2013.
• U. Haar and S. Aaltonen, "GPU-Driven Rendering Pipelines," in ACM SIGGRAPH Courses, Los Angeles, 2015.
• O. Mattausch, "Visibility Computations for Real-Time Rendering in General," Ph.D. dissertation, Institute of Comput. Graph. and Algorithms, Vienna Univ. of Technology, Vienna, 2010.
• R. W. F. Xiong, "A Stratified Rendering Algorithm for Virtual Walkthroughs of Large Environments," M.S. thesis, Elect. and Comput. Sci., Massachusetts Institute of Technology, Cambridge, MA, 1996.
• J. Shade, D. Lischinski, D. Salesin, T. DeRose and J. Snyder, "Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments," in Proc. 23rd Ann. Conf. Comput. Graph. and Interactive Techniques, New York, NY, ACM, 1996, pp. 75-82.
• P. W. C. Maciel and P. Shirley, "Visual Navigation of Large Environments Using Textured Clusters," in Proc. 1995 Symp. Interactive 3D Graph., New York, NY, ACM, 1995, pp. 95-102.
• D. Aliaga, J. Cohen, A. Wilson, E. Baker, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, R. Bastos, M. Whitton, F. Brooks and D. Manocha, "A Framework for Real-time Walkthroughs of Massive Models," Tech. Rep. UNC TR# 98-013, Comput. Sci. Dept., Univ. of North Carolina at Chapel Hill, 1998.
• S. Jeschke and M. Wimmer, "Textured Depth Meshes for Real-Time Rendering of Arbitrary Scenes," in Proc. 13th Eurographics Workshop on Rendering, Aire-la-Ville, Switzerland, Eurographics Association, 2002, pp. 181-190.
• A. Wilson and D. Manocha, "Simplifying Complex Environments Using Incremental Textured Depth Meshes," in ACM Trans. Graph., New York, NY, ACM, 2003, pp. 678-688.
• R. Pajarola, M. Sainz and Y. Meng, "Depth-Mesh Objects: Fast Depth-Image Meshing and Warping," Tech. Rep. UCI-ICS-03-02, The School of Inform. and Comput. Sci., Univ. of California, Irvine, 2003.
• AMD Developer Relations, "GCN Performance Tweets," Advanced Micro Devices, Sunnyvale, 2013.
• H. Zhang, D. Manocha, T. Hudson and K. E. Hoff III, "Visibility Culling using Hierarchical Occlusion Maps," Proc. 24th Annu. Conf. Comput. Graph. and Interactive Techniques, pp. 77-88, 1997.
• 3Dinteractive, "Products - 3Dinteractive GmbH," 2009. [Online]. Available: http://www.3dinteractive.de/products/products.html. [Accessed 22 04 2009].
• The Khronos Group, "OpenGL," 2013. [Online]. Available: http://www.opengl.org/registry/. [Accessed 6 12 2013].
• C. T. Silva and W. T. Correa, "Method for out-of-core rendering of large 3D models," United States Patent 6,933,946, 23 August 2005.
• C. Riccio and S. Lilley, "Introducing the Programmable Vertex Pulling Rendering Pipeline," in GPU Pro 4, New York, A K Peters/CRC Press, 2013, pp. 21-38.
• T. Akenine-Moller, E. Haines and N. Hoffman, Real-Time Rendering, Natick, MA: A K Peters, 2008.
• U. Assarsson and T. Moller, "Optimized View Frustum Culling Algorithms," Tech. Rep. 99-3, Chalmers Univ. of Technology, Sweden, 1999.
• B. Chamberlain, T. DeRose, D. Lischinski, D. Salesin and J. Snyder, "Fast rendering of complex environments using a spatial hierarchy," in Proc. Conf. Graph. Interface '96, Toronto, Ontario, Canada, Canadian Information Processing Society, 1996, pp. 132-141.
• U. Assarsson and T. Moller, "Optimized View Frustum Culling Algorithms for Bounding Boxes," J. of Graph. Tools, vol. 5, no. 1, pp. 9-22, September 2000.
• D. Bartz, M. Meißner and T. Hüttner, "OpenGL assisted occlusion culling for large polygonal models," Comput. and Graph., vol. 23, no. 3, pp. 667-669, 1999.
• W. T. Correa, J. T. Klosowski and C. T. Silva, "Visibility-Based Prefetching for Interactive Out-of-Core Rendering," in Proc. 2003 IEEE Symp. Parallel and Large-Data Visualization and Graph., Washington, DC, IEEE Computer Society, 2003, p. 2.
• M. Meißner, D. Bartz, T. Hüttner and G. E. Müller, "Generation of Subdivision Hierarchies for Efficient Occlusion Culling of Large Polygonal Models," Tech. Rep. WSI-99-13, Dept. of Comput. Sci., Univ. of Tübingen, 1999.
• J. Goldsmith and J. Salmon, "Automatic creation of object hierarchies for ray tracing," IEEE Comput. Graph. and Applications, vol. 7, no. 5, pp. 14-20, 1987.
• E. Gobbetti, D. Kasik and S.-e. Yoon, "Technical Strategies for Massive Model Visualization," in Proc. ACM Solid and Physical Modeling Symp., New York, NY, ACM Press, 2008, pp. 405-415.
• J. Clark, "Hierarchical geometric models for visible surface algorithms," Commun. ACM, vol. 19, no. 10, pp. 547-554, 1976.
• S.-e. Yoon, E. Gobbetti, D. Kasik and D. Manocha, "Real-time Massive Model Rendering," in Synthesis Lectures on Comput. Graph. and Animation, vol. 2, Morgan and Claypool, 2008.
• T. Hudson, D. Manocha, J. Cohen, M. Lin, K. Hoff and H. Zhang, "Accelerated Occlusion Culling using Shadow Frusta," in Proc. 13th Annual Symp. Computational Geometry, New York, NY, ACM, 1997, pp. 1-10.
• N. K. Govindaraju, A. Sud, S.-E. Yoon and D. Manocha, "Interactive visibility culling in complex environments with occlusion-switches," in Proc. 2003 Symp. Interactive 3D Graph., Monterey, CA, 2003.
• L. Darsa, B. Costa and A. Varshney, "Navigating Static Environments Using Image-Space Simplification and Morphing," in Proc. 1997 Symp. Interactive 3D Graph., New York, NY, ACM, 1997, pp. 25-34.
• K. Weaver, "Design and evaluation of a perceptually adaptive rendering system for immersive virtual reality environments," M.S. thesis, Human Comput. Interaction, Iowa State Univ., Ames, IA, 2007.
• The Walkthru Group, "The Walkthru Project," Univ. North Carolina Chapel Hill, 20 March 2001. [Online]. Available: http://www.cs.unc.edu/~walk/. [Accessed 24 February 2009].
• United States Patent 6,215,496.
• United States Patent 9,076,265.
• United States Patent 6,609,474.
• United States Patent 9,053,254.
• United States Patent 6,727,899.
• United States Patent 6,933,946.
• Concurrently-filed, commonly assigned United States/PCT Patent Application for "GPU Batch Occlusion Query With Spatial Update."
[0111] It is important to note that while the disclosure includes a description in the context of a fully functional system, those skilled in the art will appreciate that at least portions of the mechanism of the present disclosure are capable of being distributed in the form of instructions contained within a machine-usable, computer-usable, or computer-readable medium in any of a variety of forms, and that the present disclosure applies equally regardless of the particular type of instruction or signal bearing medium or storage medium utilized to actually carry out the distribution. Examples of machine usable/readable or computer usable/readable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), and user-recordable type mediums such as floppy disks, hard disk drives and compact disk read only memories (CD-ROMs) or digital versatile disks (DVDs).
[0112] Although an exemplary embodiment of the present disclosure has been described in detail, those skilled in the art will understand that various changes, substitutions, variations, and improvements disclosed herein may be made without departing from the spirit and scope of the disclosure in its broadest form.
[0113] None of the description in the present application should be read as implying that any particular element, step, or function is an essential element which must be included in the claim scope: the scope of patented subject matter is defined only by the allowed claims. Moreover, none of these claims are intended to invoke 35 U.S.C. §112(f) unless the exact words "means for" are followed by a participle. The use of terms such as (but not limited to) "mechanism," "module," "device," "unit," "component," "element," "member," "apparatus," "machine," "system," "processor," or "controller," within a claim is understood and intended to refer to structures known to those skilled in the relevant art, as further modified or enhanced by the features of the claims themselves, and is not intended to invoke 35 U.S.C. §112(f).

Claims

WHAT IS CLAIMED IS:
1. A method for massive model visualization, the method performed by a graphics processing unit (GPU) (128) of a data processing system (100) and comprising:
executing a rendering stage (610) on a three-dimensional (3D) geometric model (226);
executing a strategy stage (615) on the 3D geometric model (226), including projecting (915) contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226); and
displaying (620) the 3D geometric model (226) according to the rendering stage (610) and strategy stage (615).
2. The method of claim 1, wherein the rendering stage (610) includes:
rendering opaque geometry (805) of the 3D geometric model (226) from an opaque renderlist;
capturing opaque depth (810) of a plurality of pixels in the opaque renderlist;
rendering transparent geometry (815) of the 3D geometric model (226) from a transparent renderlist; and
capturing transparent depth (820) of a plurality of pixels in the transparent renderlist.
3. The method of claim 1, wherein the strategy stage (615) includes at least one of a render depth substage (910), a reproject depth substage (915), and an execute query substage (935).
4. The method of claim 1, wherein projecting (915) the contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226) is performed by the GPU (128), without utilizing another processor of the data processing system (100).
5. The method of claim 1, wherein the GPU (128) generates a depth texture with a same pixel format as a source depth buffer.
6. The method of claim 1, wherein the GPU (128) generates a frame buffer object, binds a depth texture as a render target, and blits a current depth buffer into the frame buffer object after all opaque geometry has finished rendering.
7. The method of claim 1, wherein projecting (915) contents of a depth buffer is performed using a sub-pixel mesh (1000).
8. A data processing system (100) comprising:
a processor (102);
a graphics processing unit (GPU) (128); and
an accessible memory (108), the data processing system (100) particularly configured to
execute a rendering stage (610) on a three-dimensional (3D) geometric model (226);
execute a strategy stage (615) on the 3D geometric model (226), including projecting (915) contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226); and
display (620) the 3D geometric model (226) according to the rendering stage (610) and strategy stage (615).
9. The data processing system of claim 8, wherein the rendering stage (610) includes:
rendering opaque geometry (805) of the 3D geometric model (226) from an opaque renderlist;
capturing opaque depth (810) of a plurality of pixels in the opaque renderlist;
rendering transparent geometry (815) of the 3D geometric model (226) from a transparent renderlist; and
capturing transparent depth (820) of a plurality of pixels in the transparent renderlist.
10. The data processing system of claim 8, wherein the strategy stage (615) includes at least one of a render depth substage (910), a reproject depth substage (915), and an execute query substage (935).
11. The data processing system of claim 8, wherein projecting (915) the contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226) is performed by the GPU (128), without utilizing the processor (102).
12. The data processing system of claim 8, wherein the GPU (128) generates a depth texture with a same pixel format as a source depth buffer.
13. The data processing system of claim 8, wherein the GPU (128) generates a frame buffer object, binds a depth texture as a render target, and blits a current depth buffer into the frame buffer object after all opaque geometry has finished rendering.
14. The data processing system of claim 8, wherein projecting (915) contents of a depth buffer is performed using a sub-pixel mesh (1000).
15. A non-transitory computer-readable medium encoded with executable instructions that, when executed, cause a graphics processing unit (GPU) of a data processing system to:
execute a rendering stage on a three-dimensional (3D) geometric model;
execute a strategy stage on the 3D geometric model, including projecting contents of a depth buffer for a current view of the 3D geometric model from contents of the depth buffer from a previous view of the 3D geometric model; and
display the 3D geometric model according to the rendering stage and strategy stage.
16. The non-transitory computer-readable medium of claim 15, wherein the rendering stage (610) includes:
rendering opaque geometry (805) of the 3D geometric model (226) from an opaque renderlist;
capturing opaque depth (810) of a plurality of pixels in the opaque renderlist;
rendering transparent geometry (815) of the 3D geometric model (226) from a transparent renderlist; and
capturing transparent depth (820) of a plurality of pixels in the transparent renderlist.
17. The non-transitory computer-readable medium of claim 15, wherein the strategy stage (615) includes at least one of a render depth substage (910), a reproject depth substage (915), and an execute query substage (935).
18. The non-transitory computer-readable medium of claim 17, wherein projecting (915) the contents of a depth buffer for a current view of the 3D geometric model (226) from contents of the depth buffer from a previous view of the 3D geometric model (226) is performed by the GPU (128), without utilizing another processor of the data processing system (100).
19. The non-transitory computer-readable medium of claim 15, wherein the GPU (128) generates a depth texture with a same pixel format as a source depth buffer.
20. The non-transitory computer-readable medium of claim 15, wherein projecting (915) contents of a depth buffer is performed using a sub-pixel mesh (1000).
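To make the reprojection recited in claim 1 (and mirrored in claims 8 and 15) concrete, the following C++ sketch unprojects one sample of the previous view's depth buffer and projects it into the current view. It is illustrative only and not part of the claims: the function name reprojectDepthSample, the use of the glm mathematics library, and the standard OpenGL depth-range convention are assumptions, and the disclosed system performs the equivalent computation on the GPU across the vertices of a sub-pixel mesh (1000).

```cpp
#include <glm/glm.hpp>

// Reproject one depth sample from the previous view into the current view.
// prevDepth : depth-buffer value in [0,1] (standard OpenGL convention assumed)
// prevNdcXy : the sample's position in normalized device coordinates (NDC)
// Returns the sample's NDC position in the current view; its z component is
// the depth that would be written into the projected depth buffer.
glm::vec3 reprojectDepthSample(float prevDepth,
                               const glm::vec2& prevNdcXy,
                               const glm::mat4& prevViewProjInverse,
                               const glm::mat4& currViewProj)
{
    // Rebuild the previous view's clip-space position and unproject to world space.
    glm::vec4 prevClip(prevNdcXy.x, prevNdcXy.y, prevDepth * 2.0f - 1.0f, 1.0f);
    glm::vec4 world = prevViewProjInverse * prevClip;
    world /= world.w;

    // Project the recovered world-space point with the current view's matrices.
    glm::vec4 currClip = currViewProj * world;
    return glm::vec3(currClip) / currClip.w;
}
```

Because only matrix products and a perspective divide are involved, this projection can run entirely on the GPU, which is the property that lets claims 4 and 11 recite reprojection without utilizing another processor of the data processing system.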
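Similarly, claims 6, 13, and 19 recite generating a depth texture with the same pixel format as the source depth buffer, binding it to a frame buffer object, and blitting the current depth buffer into it once all opaque geometry has finished rendering. A minimal OpenGL sketch of that sequence follows; it is likewise illustrative only, the helper name captureOpaqueDepth is hypothetical, and GL_DEPTH_COMPONENT32F stands in for whatever format the source depth buffer actually uses.

```cpp
#include <GL/glew.h>  // any OpenGL 3.0+ function loader works; GLEW is assumed here

// Capture the depth of the just-rendered opaque geometry into a texture.
GLuint captureOpaqueDepth(GLuint srcFbo, int width, int height)
{
    // Depth texture whose pixel format matches the source depth buffer
    // (GL_DEPTH_COMPONENT32F is an assumption; query the real format in practice).
    GLuint depthTex = 0;
    glGenTextures(1, &depthTex);
    glBindTexture(GL_TEXTURE_2D, depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, width, height,
                 0, GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);

    // Frame buffer object with the depth texture bound as its render target.
    GLuint dstFbo = 0;
    glGenFramebuffers(1, &dstFbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, dstFbo);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, depthTex, 0);

    // Blit the current depth buffer into the frame buffer object; depth blits
    // must use GL_NEAREST filtering.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, srcFbo);
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_DEPTH_BUFFER_BIT, GL_NEAREST);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &dstFbo);
    return depthTex;
}
```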
PCT/US2016/050671 2016-03-21 2016-09-08 System for gpu based depth reprojection for accelerating depth buffer generation WO2017164924A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662311075P 2016-03-21 2016-03-21
US62/311,075 2016-03-21

Publications (1)

Publication Number Publication Date
WO2017164924A1 2017-09-28

Family

ID=59899646

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/050671 WO2017164924A1 (en) 2016-03-21 2016-09-08 System for gpu based depth reprojection for accelerating depth buffer generation

Country Status (1)

Country Link
WO (1) WO2017164924A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030043148A1 (en) * 2001-09-06 2003-03-06 Lin-Tien Mei Method for accelerated triangle occlusion culling
US7508390B1 (en) * 2004-08-17 2009-03-24 Nvidia Corporation Method and system for implementing real time soft shadows using penumbra maps and occluder maps
US20080079719A1 (en) * 2006-09-29 2008-04-03 Samsung Electronics Co., Ltd. Method, medium, and system rendering 3D graphic objects
EP2348407A1 (en) * 2009-12-22 2011-07-27 Intel Corporation Compiling for programmable culling unit
US20140306958A1 (en) * 2013-04-12 2014-10-16 Dynamic Digital Depth Research Pty Ltd Stereoscopic rendering system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11496985B2 (en) 2018-08-13 2022-11-08 Zte Corporation Method for determining time difference of arrival, and communication device and system
WO2020209962A1 (en) * 2019-04-09 2020-10-15 Microsoft Technology Licensing, Llc Hybrid rendering
US11170579B2 (en) 2019-04-09 2021-11-09 Microsoft Technology Licensing, Llc Hybrid rendering
CN111354067A (en) * 2020-03-02 2020-06-30 成都偶邦智能科技有限公司 Multi-model same-screen rendering method based on Unity3D engine
CN111354067B (en) * 2020-03-02 2023-08-22 成都偶邦智能科技有限公司 Multi-model same-screen rendering method based on Unity3D engine
WO2023224757A1 (en) * 2022-05-19 2023-11-23 Microsoft Technology Licensing, Llc Potentially occluded rasterization
CN117557740A (en) * 2024-01-10 2024-02-13 四川见山科技有限责任公司 Three-dimensional model segmentation level switching method and device, electronic equipment and storage medium
CN117557740B (en) * 2024-01-10 2024-04-09 四川见山科技有限责任公司 Three-dimensional model segmentation level switching method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11138782B2 (en) Systems and methods for rendering optical distortion effects
US11069124B2 (en) Systems and methods for reducing rendering latency
US8760450B2 (en) Real-time mesh simplification using the graphics processing unit
CN113781625B (en) Hardware-based techniques for ray tracing
US10032308B2 (en) Culling objects from a 3-D graphics pipeline using hierarchical Z buffers
WO2017164924A1 (en) System for gpu based depth reprojection for accelerating depth buffer generation
Greß et al. GPU‐based collision detection for deformable parameterized surfaces
US10699467B2 (en) Computer-graphics based on hierarchical ray casting
US10553012B2 (en) Systems and methods for rendering foveated effects
Liu et al. Octree rasterization: Accelerating high-quality out-of-core GPU volume rendering
Sintorn et al. Compact precomputed voxelized shadows
Pidhorskyi et al. syGlass: Interactive exploration of multidimensional images using virtual reality head-mounted displays
Vasilakis et al. Depth-fighting aware methods for multifragment rendering
Schütz et al. Software rasterization of 2 billion points in real time
JP2017199354A (en) Rendering global illumination of 3d scene
Vasilakis et al. k+-buffer: Fragment synchronized k-buffer
Mattausch et al. CHC+ RT: Coherent hierarchical culling for ray tracing
Papaioannou et al. Real-time volume-based ambient occlusion
Lee et al. Hierarchical raster occlusion culling
JP2008305347A (en) Method and device for generating interference discrimination information
Xue et al. Efficient GPU out-of-core visualization of large-scale CAD models with voxel representations
Eisemann et al. Visibility sampling on gpu and applications
Vollmer et al. Hierarchical spatial aggregation for level-of-detail visualization of 3D thematic data
WO2017164923A1 (en) Gpu batch occlusion query with spatial update
Müller et al. Optimised molecular graphics on the hololens

Legal Events

Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16895714; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 16895714; Country of ref document: EP; Kind code of ref document: A1)