CN113628316B - Techniques for anisotropic texture filtering using ray cones
- Publication number: CN113628316B (application CN202110470873.9A)
- Authority: CN (China)
- Prior art keywords: texture, axis, ellipse, gradients, calculating
- Legal status: Active
Classifications
- G06T 15/04 — 3D [Three Dimensional] image rendering: texture mapping
- G06T 15/06 — 3D [Three Dimensional] image rendering: ray-tracing
- G06T 15/55 — 3D [Three Dimensional] image rendering: lighting effects; radiosity
Abstract
Techniques for anisotropic texture filtering using ray cones are disclosed. One embodiment of a method for computing a texture color includes tracing a ray cone through a graphics scene, determining at least one axis of an ellipse formed by an intersection of the ray cone with a plane associated with geometry within the graphics scene at a hit point, computing one or more gradients along the at least one axis of the ellipse, and computing the texture color based on the one or more gradients and a texture.
Description
Cross Reference to Related Applications
The present application claims the priority benefit of United States provisional patent application serial No. 63/030162, entitled "Texture Filtering Technologies for Ray Tracing," filed on May 26, 2020, which claims the priority benefit of United States provisional patent application serial No. 63/022033, entitled "Texture Filtering Technologies for Ray Tracing," filed on May 8, 2020. The subject matter of these related applications is hereby incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate generally to computer science and computer graphics, and more particularly, to techniques for anisotropic texture filtering using ray cones.
Background
In three-dimensional (3D) computer graphics, ray tracing is a popular technique for rendering images (e.g., the frames of a movie or video game). Ray tracing techniques trace the paths of rays and simulate the effects of the rays interacting with virtual objects within a virtual scene. Ray cone tracing techniques are similar to ray tracing techniques, except that ray cone tracing techniques trace cones, rather than just rays, through the scene. Ray cone tracing techniques can solve various sampling and aliasing problems that plague ray tracing techniques. In addition, ray cone tracing techniques are less computationally expensive than some other ray tracing techniques (e.g., differential ray tracing and covariance tracing).
Anisotropic filtering is a technique that can be implemented to enhance the quality of surfaces that are rendered using texture mapping techniques when those surfaces are at oblique viewing angles relative to the virtual camera. In general, images rendered using anisotropic filtering exhibit fewer aliasing artifacts, less blurring, and more detail at extreme viewing angles than images rendered without anisotropic filtering.
Currently, there are no ray cone tracing techniques that implement anisotropic filtering. As a result, images rendered using ray cone tracing typically include aliasing artifacts, blurring, and reduced detail, especially at extreme viewing angles.
As the foregoing illustrates, what is needed in the art are more effective techniques for rendering graphics scenes using ray cone tracing.
Disclosure of Invention
One embodiment of the present disclosure sets forth a computer-implemented method for rendering one or more images. The method includes tracing one or more ray cones through a graphics scene. The method further includes performing one or more anisotropic texture filtering operations based on the one or more ray cones to compute a texture color. In addition, the method includes rendering one or more graphical images based on the texture color.
Another embodiment of the present disclosure sets forth a computer-implemented method for computing a texture color. The method includes tracing a ray cone through a graphics scene. The method further includes determining at least one axis of an ellipse formed by an intersection of the ray cone with a plane associated with geometry within the graphics scene at a hit point. The method further includes computing one or more gradients along the at least one axis of the ellipse. In addition, the method includes computing the texture color based on the one or more gradients and a texture.
Other embodiments of the disclosure include, but are not limited to, one or more computer-readable media comprising instructions for performing one or more aspects of the disclosed technology and one or more computing systems for performing one or more aspects of the disclosed technology.
At least one technical advantage of the disclosed techniques relative to the prior art is that the disclosed techniques implement anisotropic filtering using ray cone tracing, which produces images that have fewer aliasing artifacts, less blurring, and more detail at extreme viewing angles relative to images rendered using conventional ray cone tracing. In addition, the disclosed techniques use ray cone tracing, which is less computationally expensive than some other ray tracing techniques that can be used to implement anisotropic filtering, such as differential ray tracing. These technical advantages represent one or more technological improvements over prior art approaches.
Drawings
A more particular description of the inventive concepts briefly summarized above may be had by reference to the embodiments, some of which are illustrated in the appended drawings, in order to obtain a detailed understanding of the above-described features of the embodiments. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this inventive concept and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
FIG. 1 is a block diagram illustrating a computer system configured to implement one or more aspects of the present embodiments;
FIG. 2 is a block diagram of a parallel processing unit included in the parallel processing subsystem of FIG. 1, in accordance with various embodiments;
FIG. 3 is a block diagram of a general processing cluster included in the parallel processing unit of FIG. 2, in accordance with various embodiments;
FIG. 4 is a block diagram illustrating an exemplary cloud computing system in accordance with various embodiments;
FIG. 5 illustrates an exemplary ray cone being traced through a virtual three-dimensional scene, in accordance with various embodiments;
FIG. 6 illustrates a method for performing anisotropic texture filtering using ray cones, in accordance with various embodiments;
FIG. 7 illustrates a side view of a cylindrical approximation of a ray cone in accordance with various embodiments;
FIG. 8 illustrates a method for determining gradients of texture coordinates along the axes of an ellipse, in accordance with various embodiments;
FIG. 9A illustrates an exemplary image rendered using ray cone tracing without anisotropic filtering, according to the prior art;
FIG. 9B illustrates an example image rendered using ray cone tracing implementing anisotropic filtering in accordance with various embodiments; and
FIG. 10 is a flowchart of method steps for computing a pixel color using a ray cone tracing technique that implements anisotropic filtering, in accordance with various embodiments.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the various embodiments. It will be apparent, however, to one skilled in the art, that the present inventive concept may be practiced without one or more of these specific details.
General overview
Embodiments of the present disclosure provide improved ray cone tracing techniques that implement anisotropic filtering. The improved ray cone tracing techniques have many practical applications, including video games, production rendering for movies, architecture and design applications, and any other application in which images can be rendered using ray cone tracing. In the improved ray cone tracing techniques, when a ray cone that is traced through a virtual three-dimensional (3D) scene hits the surface of geometry within the scene, the ray cone is approximated as a cylinder to determine the axes of an ellipse formed by the intersection of the ray cone with a plane associated with the surface. Gradients of texture coordinates along the axes of the ellipse are then computed and input, along with a texture associated with the surface, into a texture unit of a graphics processing unit (GPU) that performs anisotropic texture filtering. The texture unit outputs an anisotropically filtered texture color, which can then be used to determine the color of a pixel in a rendered image.
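At a high level, the per-hit computation just described can be outlined in shader code as follows. This is an illustrative sketch only, written in HLSL because the disclosure later invokes HLSL's SampleGrad; all names and signatures are assumptions, and the helper is sketched in detail later in this disclosure:

```hlsl
// Illustrative outline of shading a ray cone hit with anisotropic texture
// filtering. All names and signatures are assumptions; the helper is
// sketched later in this disclosure.
void computeAnisotropicTextureGradients(float3 P, float3 d, float3 n, float r,
                                        float3 P0, float3 P1, float3 P2,
                                        float2 T0, float2 T1, float2 T2,
                                        float2 texUV,
                                        out float2 grad1, out float2 grad2);

float4 shadeHitAnisotropic(
    Texture2D tex, SamplerState samp,
    float3 P, float3 d, float3 n, float r,         // hit point, ray direction, normal, cone radius
    float3 P0, float3 P1, float3 P2,               // triangle vertex positions
    float2 T0, float2 T1, float2 T2, float2 texUV) // per-vertex and interpolated texture coordinates
{
    // Approximate the cone as a cylinder, intersect it with the triangle
    // plane, and turn the resulting ellipse axes into texture gradients.
    float2 grad1, grad2;
    computeAnisotropicTextureGradients(P, d, n, r, P0, P1, P2,
                                       T0, T1, T2, texUV, grad1, grad2);

    // The hardware texture unit performs the anisotropic filtering.
    return tex.SampleGrad(samp, texUV, grad1, grad2);
}
```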
The ray cone tracing techniques of the present disclosure have many practical applications. For example, ray cone tracing techniques can be used to efficiently render the images and/or frames of a video game. As a particular example, the ray cone tracing techniques could be performed by a cloud-based graphics processing platform (e.g., a cloud-based gaming platform) that executes video games and streams videos of game sessions to client devices. The disclosed ray cone tracing techniques are more computationally efficient than differential ray tracing techniques that implement anisotropic filtering. In addition, the rendered images and/or frames can appear more realistic than images and/or frames rendered using certain other rendering techniques, such as conventional ray cone tracing techniques and rasterization-based techniques.
As another example, ray cone tracing techniques can be used in production-quality rendering of movies. The production of animated movies, as well as of computer-generated imagery (CGI) and special effects within live-action movies, typically requires high-quality rendering of the frames of those movies. The disclosed ray cone tracing techniques can be used to render the frames of a movie more efficiently and/or correctly than certain other techniques, such as differential ray tracing techniques and conventional ray cone tracing techniques.
As yet another example, the disclosed ray cone tracing techniques can be used to render the designs of architectural structures and other objects. Architecture and design applications oftentimes provide renderings to show how particular designs would look in real life. The disclosed ray cone tracing techniques can be used to render images of designs more efficiently and/or correctly than certain other techniques, such as differential ray tracing techniques and conventional ray cone tracing techniques.
The above examples are in no way intended to be limiting. As persons skilled in the art will appreciate, as a general matter, the ray cone tracing techniques described herein can be implemented in any application in which conventional ray tracing and/or ray cone tracing techniques are currently employed.
System overview
Fig. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the present embodiments. As will be appreciated by those of skill in the art, the computer system 100 may be any type of technically feasible computer system, including, but not limited to, a server machine, a server platform, a desktop machine, a laptop computer, a handheld/mobile device, or a wearable device. In some embodiments, computer system 100 is a server machine operating in a data center or cloud computing environment that provides scalable computing resources as a service over a network.
In various embodiments, computer system 100 includes, but is not limited to, a Central Processing Unit (CPU) 102 and a system memory 104 coupled to a parallel processing subsystem 112 via a memory bridge 105 and a communication path 113. Memory bridge 105 is also coupled to an I/O (input/output) bridge 107 via communication path 106, and I/O bridge 107 is in turn coupled to switch 116.
In one embodiment, I/O bridge 107 is configured to receive user input information from an optional input device 108, such as a keyboard or mouse, and forward the input information to CPU 102 for processing via communication path 106 and memory bridge 105. In some embodiments, computer system 100 may be a server machine in a cloud computing environment. In such embodiments, the computer system 100 may not have the input device 108. Instead, computer system 100 may receive equivalent input information by receiving commands in the form of messages sent over a network and received via network adapter 118. In one embodiment, switch 116 is configured to provide connectivity between I/O bridge 107 and other components of computer system 100 (e.g., network adapter 118 and various add-on cards 120 and 121).
In one embodiment, I/O bridge 107 is coupled to system disk 114, which system disk 114 may be configured to store content, applications, and data for use by CPU 102 and parallel processing subsystem 112. In one embodiment, the system disk 114 provides non-volatile storage for applications and data and may include fixed or removable hard disk drives, flash memory devices, and CD-ROM (compact disc read-only memory), DVD-ROM (digital versatile disc ROM), Blu-ray, HD DVD (high-definition DVD), or other magnetic, optical, or solid-state storage devices. In various embodiments, other components, such as universal serial bus or other port connections, optical disc drives, digital versatile disc drives, film recording devices, and the like, may be connected to I/O bridge 107 as well.
In various embodiments, memory bridge 105 may be a Northbridge chip, and I/O bridge 107 may be a Southbridge chip. In addition, communication paths 106 and 113, as well as other communication paths within computer system 100, may be implemented using any technically suitable protocols, including, but not limited to, AGP (Accelerated Graphics Port), HyperTransport, or any other bus or point-to-point communication protocol known in the art.
In some embodiments, parallel processing subsystem 112 includes a graphics subsystem that communicates pixels to an optional display device 110, which may be any conventional cathode ray tube, liquid crystal display, light emitting diode display, or the like. In such embodiments, parallel processing subsystem 112 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. As described in more detail below in connection with fig. 2-3, such circuitry may be incorporated across one or more Parallel Processing Units (PPUs) (also referred to herein as parallel processors) included in parallel processing subsystem 112. In other embodiments, parallel processing subsystem 112 includes circuitry optimized for general purpose and/or computational processing. Again, such circuitry may be incorporated into one or more PPUs included in parallel processing subsystem 112 that are configured to perform such general-purpose and/or computational operations. In other embodiments, one or more PPUs included in parallel processing subsystem 112 may be configured to perform graphics processing, general purpose processing, and computational processing operations. The system memory 104 includes at least one device driver configured to manage processing operations of one or more PPUs within the parallel processing subsystem 112. In addition, system memory 104 includes rendering application 130. Rendering application 130 may be any technically feasible application that renders virtual 3D scenes using the ray cone shading techniques disclosed herein. For example, rendering application 130 may be a gaming application or a rendering application used in movie production. Although described primarily herein with respect to rendering application 130, the techniques disclosed herein may also be implemented, in whole or in part, in other software and/or hardware (e.g., parallel processing subsystem 112).
In various embodiments, parallel processing subsystem 112 may be integrated with one or more other elements of FIG. 1 to form a single system. For example, parallel processing subsystem 112 may be integrated with CPU102 and other connection circuitry on a single chip to form a system on a chip (SoC).
In one embodiment, CPU 102 is the main processor of computer system 100, which controls and coordinates the operation of other system components. In one embodiment, CPU 102 issues commands that control the operation of the PPUs. In some embodiments, communication path 113 is a PCI Express link, as is known in the art, in which dedicated lanes are allocated to each PPU. Other communication paths may also be used. The PPU advantageously implements a highly parallel processing architecture. The PPU may be equipped with any amount of local parallel processing memory (PP memory).
It will be appreciated that the system shown herein is illustrative and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of parallel processing subsystems 112, may be modified as desired. For example, in some embodiments, system memory 104 could be connected to CPU 102 directly rather than through memory bridge 105, and other devices would communicate with system memory 104 via memory bridge 105 and CPU 102. In other embodiments, parallel processing subsystem 112 may be connected to I/O bridge 107 or directly to CPU 102, rather than to memory bridge 105. In still other embodiments, I/O bridge 107 and memory bridge 105 may be integrated into a single chip instead of existing as one or more discrete devices. In some embodiments, one or more of the components shown in FIG. 1 may not be present. For example, switch 116 could be eliminated, and network adapter 118 and add-in cards 120, 121 would connect directly to I/O bridge 107. Lastly, in some embodiments, one or more of the components shown in FIG. 1 may be implemented as virtualized resources in a virtual computing environment, such as a cloud computing environment. In particular, in some embodiments, parallel processing subsystem 112 may be implemented as a virtualized parallel processing subsystem. For example, parallel processing subsystem 112 could be implemented as a virtual graphics processing unit (GPU) that renders graphics on a virtual machine (VM) executing on a server machine whose GPU and other physical resources are shared across multiple VMs.
FIG. 2 is a block diagram of a parallel processing unit (PPU) 202 included in the parallel processing subsystem 112 of FIG. 1, in accordance with various embodiments. Although FIG. 2 depicts one PPU 202, as indicated above, parallel processing subsystem 112 may include any number of PPUs 202. As shown, PPU 202 is coupled to a local parallel processing (PP) memory 204. PPU 202 and PP memory 204 may be implemented using one or more integrated circuit devices, such as programmable processors, application specific integrated circuits (ASICs), or memory devices, or in any other technically feasible fashion.
In some embodiments, PPU 202 includes a GPU that may be configured to implement a graphics rendering pipeline to perform various operations related to generating pixel data based on graphics data supplied by CPU 102 and/or system memory 104. When processing graphics data, PP memory 204 can be used as graphics memory that stores one or more conventional frame buffers and, if needed, one or more other render targets as well. Among other things, PP memory 204 may be used to store and update pixel data and deliver final pixel data or display frames to an optional display device 110 for display. In some embodiments, PPU 202 may also be configured for general-purpose processing and compute operations. In some embodiments, computer system 100 may be a server machine in a cloud computing environment. In such embodiments, computer system 100 may not have a display device 110. Instead, computer system 100 may generate equivalent output information by transmitting commands in the form of messages over a network via the network adapter 118.
In some embodiments, CPU 102 is the main processor of computer system 100, which controls and coordinates the operation of other system components. In one embodiment, CPU 102 issues commands that control the operation of PPU 202. In some embodiments, CPU 102 writes a command stream for PPU202 to a data structure (not explicitly shown in fig. 1 or 2) that may be located in system memory 104, PP memory 204, or another storage location accessible to both CPU 102 and PPU 202. Pointers to the data structure are written to a command queue (also referred to herein as a push buffer) to initiate processing of the command stream in the data structure. In one embodiment, PPU202 reads the command stream from the command queue and then executes the command asynchronously with respect to the operation of CPU 102. In embodiments that generate multiple push buffers, an application may assign execution priority to each push buffer through a device driver to control scheduling of different push buffers.
In one embodiment, PPU202 includes an I/O (input/output) unit 205 that communicates with the rest of computer system 100 via communication path 113 and memory bridge 105. In one embodiment, I/O unit 205 generates packets (or other signals) for transmission on communication path 113 and also receives all incoming packets (or other signals) from communication path 113, directing the incoming packets to the appropriate components of PPU 202. For example, commands related to processing tasks may be directed to the host interface 206, while commands related to memory operations (e.g., read from the PP memory 204 or write to the PP memory 204) may be directed to the crossbar unit 210. In one embodiment, the host interface 206 reads each command queue and sends the stored command stream in the command queue to the front end 212.
As described above in connection with FIG. 1, the connection of PPU 202 to the rest of computer system 100 may vary. In some embodiments, parallel processing subsystem 112, including at least one PPU 202, is implemented as an add-on card that may be inserted into an expansion slot of computer system 100. In other embodiments, PPU 202 may be integrated on a single chip with a bus bridge (e.g., memory bridge 105 or I/O bridge 107). Again, in other embodiments, some or all of the elements of PPU 202 may be included with CPU 102 in a single integrated circuit or system-on-chip (SoC).
In one embodiment, the front end 212 transmits processing tasks received from the host interface 206 to a work distribution unit (not shown) within task/work unit 207. In one embodiment, the work distribution unit receives pointers to processing tasks that are encoded as task metadata (TMD) and stored in memory. The pointers to TMDs are included within a command stream that is stored as a command queue and received by the front end 212 from the host interface 206. Processing tasks that may be encoded as TMDs include indices associated with the data to be processed, as well as state parameters and commands that define how the data is to be processed. For example, the state parameters and commands could define the program to be executed on the data. Also for example, the TMD could specify the number and configuration of a set of CTAs (cooperative thread arrays). Typically, each TMD corresponds to one task. The task/work unit 207 receives tasks from the front end 212 and ensures that GPCs 208 are configured to a valid state before the processing task specified by each TMD is initiated. A priority may be assigned to each TMD and used to schedule the execution of the processing task. Processing tasks may also be received from the processing cluster array 230. Optionally, the TMD may include a parameter that controls whether the TMD is added to the head or the tail of the list of processing tasks (or to a list of pointers to the processing tasks), thereby providing another level of control over execution priority.
In one embodiment, PPU 202 implements a highly parallel processing architecture based on a processing cluster array 230 that includes a set of C general processing clusters (GPCs) 208, where C ≥ 1. Each GPC 208 is capable of executing a large number of concurrent threads (e.g., hundreds or thousands), where each thread is an instance of a program. In various applications, different GPCs 208 may be allocated for processing different types of programs or for performing different types of computations. The allocation of GPCs 208 may vary depending on the workload arising for each type of program or computation.
In one embodiment, memory interface 214 includes a set of D partition units 215, where D ≥ 1. Each partition unit 215 is coupled to one or more dynamic random access memories (DRAMs) 220 residing within PP memory 204. In some embodiments, the number of partition units 215 equals the number of DRAMs 220, and each partition unit 215 is coupled to a different DRAM 220. In other embodiments, the number of partition units 215 may be different than the number of DRAMs 220. Persons of ordinary skill in the art will appreciate that a DRAM 220 may be replaced with any other technically suitable storage device. In operation, various render targets, such as texture maps and frame buffers, may be stored across DRAMs 220, allowing partition units 215 to write portions of each render target in parallel to efficiently use the available bandwidth of PP memory 204.
In one embodiment, a given GPC 208 may process data to be written to any of the DRAMs 220 within PP memory 204. In one embodiment, crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to any other GPC 208 for further processing. GPCs 208 communicate with memory interface 214 via crossbar unit 210 to read from or write to the various DRAMs 220. In some embodiments, crossbar unit 210 has a connection to I/O unit 205, in addition to a connection to PP memory 204 via memory interface 214, thereby enabling the processing cores within the different GPCs 208 to communicate with system memory 104 or other memory that is not local to PPU 202. In the embodiment of FIG. 2, crossbar unit 210 is directly connected with I/O unit 205. In various embodiments, crossbar unit 210 may use virtual channels to separate traffic streams between the GPCs 208 and partition units 215.
In one embodiment, GPCs 208 can be programmed to execute processing tasks relating to a wide variety of applications, including, without limitation, linear and nonlinear data transforms, filtering of video and/or audio data, modeling operations (e.g., applying laws of physics to determine the position, velocity, and other attributes of objects), image rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or pixel/fragment shader programs), general compute operations, and so forth. In operation, PPU 202 is configured to transfer data from system memory 104 and/or PP memory 204 to one or more on-chip memory units, process the data, and write the result data back to system memory 104 and/or PP memory 204. The result data may then be accessed by other system components, including CPU 102, another PPU 202 within parallel processing subsystem 112, or another parallel processing subsystem 112 within computer system 100.
In one embodiment, any number of PPUs 202 may be included in parallel processing subsystem 112. For example, multiple PPUs 202 may be provided on a single add-in card, multiple add-in cards may be connected to communication path 113, or one or more PPUs 202 may be integrated into a bridge chip. The PPUs 202 in a multi-PPU system may be identical to or different from one another. For example, different PPUs 202 might have different numbers of processing cores and/or different amounts of PP memory 204. In implementations where multiple PPUs 202 are present, those PPUs may be operated in parallel to process data at a higher throughput than is possible with a single PPU 202. Systems containing one or more PPUs 202 may be implemented in a variety of configurations and form factors, including, without limitation, desktops, laptops, handheld personal computers or other handheld devices, wearable devices, servers, workstations, game consoles, embedded systems, and the like.
FIG. 3 is a block diagram of a General Processing Cluster (GPC) 208 included in the Parallel Processing Unit (PPU) 202 of FIG. 2, according to various embodiments. As shown, GPCs 208 include, but are not limited to, a pipeline manager 305, one or more texture units 315, a preROP unit 325, a work distribution crossbar 330, and an L1.5 cache 335.
In one embodiment, GPCs 208 can be configured to execute a large number of threads in parallel to perform graphics, general processing, and/or compute operations. As used herein, a "thread" refers to an instance of a particular program executing on a particular set of input data. In some embodiments, single-instruction, multiple-data (SIMD) instruction issue techniques are used to support parallel execution of a large number of threads without providing multiple independent instruction units. In other embodiments, single-instruction, multiple-thread (SIMT) techniques are used to support parallel execution of a large number of generally synchronized threads, using a common instruction unit configured to issue instructions to a set of processing engines within GPC 208. Unlike a SIMD execution regime, in which all processing engines typically execute identical instructions, SIMT execution allows different threads to more readily follow divergent execution paths through a given program. Persons of ordinary skill in the art will understand that a SIMD processing regime represents a functional subset of a SIMT processing regime.
In one embodiment, operation of GPC 208 is controlled via a pipeline manager 305 that distributes processing tasks received from a work distribution unit (not shown) within task/work unit 207 to one or more streaming multiprocessors (SMs) 310. Pipeline manager 305 may also be configured to control the work distribution crossbar 330 by specifying destinations for the processed data output by SMs 310.
In various embodiments, GPC 208 includes a set of M SMs 310, where M ≥ 1. Also, each SM 310 includes a set of functional execution units (not shown), such as execution units and load-store units. Processing operations specific to any of the functional execution units may be pipelined, which enables a new instruction to be issued for execution before a previous instruction has completed execution. Any combination of functional execution units within a given SM 310 may be provided. In various embodiments, the functional execution units may be configured to support a variety of different operations, including integer and floating point arithmetic (e.g., addition and multiplication), comparison operations, Boolean operations (AND, OR, XOR), bit-shifting, and computation of various algebraic functions (e.g., planar interpolation and trigonometric, exponential, and logarithmic functions, etc.). Advantageously, the same functional execution unit can be configured to perform different operations.
In one embodiment, each SM 310 is configured to process one or more thread groups. As used herein, a "thread group" or "warp" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different execution unit within an SM 310. A thread group may include fewer threads than the number of execution units within the SM 310, in which case some of the execution units may be idle during cycles when that thread group is being processed. A thread group may also include more threads than the number of execution units within the SM 310, in which case processing may occur over consecutive clock cycles. Since each SM 310 can support up to G thread groups concurrently, up to G×M thread groups can be executing in GPC 208 at any given time.
Additionally, in one embodiment, a plurality of related thread groups may be active (in different phases of execution) at the same time within an SM 310. Such a collection of thread groups is referred to herein as a "cooperative thread array" ("CTA") or "thread array." The size of a particular CTA is equal to m*k, where k is the number of concurrently executing threads in a thread group, which is typically an integer multiple of the number of execution units within the SM 310, and m is the number of thread groups simultaneously active within the SM 310. In some embodiments, a single SM 310 may simultaneously support multiple CTAs, where such CTAs are at the granularity at which work is distributed to the SMs 310.
In one embodiment, each SM 310 contains a level one (L1) cache, or uses space in a corresponding L1 cache outside of the SM 310, to support, among other things, load and store operations performed by the execution units. Each SM 310 also has access to a level two (L2) cache (not shown) that is shared among all GPCs 208 in PPU 202. The L2 cache may be used to transfer data between threads. Finally, SMs 310 also have access to off-chip "global" memory, which may include PP memory 204 and/or system memory 104. It is to be understood that any memory external to PPU 202 may be used as global memory. Additionally, as shown in FIG. 3, a level one-point-five (L1.5) cache 335 may be included within GPC 208 and configured to receive and hold data requested from memory via memory interface 214 by SM 310. Such data may include, without limitation, instructions, uniform data, and constant data. In embodiments having multiple SMs 310 within GPC 208, the SMs 310 may beneficially share common instructions and data cached in L1.5 cache 335.
In one embodiment, each GPC 208 may have an associated memory management unit (MMU) 320 that is configured to map virtual addresses into physical addresses. In various embodiments, MMU 320 may reside either within GPC 208 or within the memory interface 214. MMU 320 includes a set of page table entries (PTEs) used to map a virtual address to a physical address of a tile or memory page, and optionally a cache line index. MMU 320 may include address translation lookaside buffers (TLBs) or caches that may reside within SMs 310, within one or more L1 caches, or within GPC 208.
In one embodiment, in graphics and computing applications, GPCs 208 may be configured such that each SM 310 is coupled to a texture unit 315 for performing texture mapping operations, such as determining texture sample locations, reading texture data, and filtering texture data.
In one embodiment, each SM 310 sends processed tasks to work distribution crossbar 330 to provide the processed tasks to another GPC208 for further processing or to store the processed tasks in an L2 cache (not shown), parallel processing memory 204, or system memory 104 via crossbar unit 210. In addition, a pre-raster operations (preROP) unit 325 is configured to receive data from SM 310, direct the data to one or more Raster Operations (ROP) units within partition unit 215, perform color blending optimizations, organize pixel color data, and perform address translations.
It will be appreciated that the architecture described herein is illustrative and that variations and modifications are possible. Any number of processing units may be included within a GPC208, such as SM 310, texture unit 315, or preROP unit 325. Further, as described above in connection with FIG. 2, PPU 202 may include any number of GPCs 208 configured to be functionally similar to each other, such that execution behavior is not dependent on which GPC208 receives a particular processing task. Further, each GPC208 operates independently of other GPCs 208 in PPU 202 to perform tasks for one or more applications.
FIG. 4 is a block diagram illustrating an exemplary cloud computing system, in accordance with various embodiments. As shown, computing system 400 includes one or more servers 402 that communicate with one or more client devices 404 via one or more networks 406. Each of the one or more servers 402 may include similar components, features, and/or functionality as the exemplary computer system 100 described above in conjunction with FIGS. 1-3. Each of the one or more servers 402 may be any technically feasible type of computer system, including, without limitation, a server machine or a server platform. Each of the one or more client devices 404 may also include similar components, features, and/or functionality as computer system 100, except that each client device 404 executes a client application 422 rather than the rendering application 130. Each of the one or more client devices 404 may be any technically feasible type of computer system, including, without limitation, a desktop machine, a laptop computer, a handheld/mobile device, and/or a wearable device. In some embodiments, the one or more servers 402 and/or client devices 404 may be replaced by one or more virtualized processing environments, such as those provided by VMs and/or containers executing on one or more underlying hardware systems.
The one or more networks 406 may include one or more networks of any type, such as one or more Local Area Networks (LANs) and/or Wide Area Networks (WANs) (e.g., the internet).
In some embodiments, the one or more servers 402 may be included in a cloud computing system, such as a public cloud, a private cloud, or a hybrid cloud, and/or in a distributed system. For example, the one or more servers 402 could implement a cloud-based gaming platform that provides a game streaming service, sometimes also referred to as "cloud gaming," "gaming on demand," or "gaming-as-a-service." In such a case, games that are stored and executed on the one or more servers 402 are streamed as videos to the one or more client devices 404 via the one or more client applications 422 running thereon. During game sessions, the one or more client applications 422 handle user inputs and transmit those inputs to the one or more servers 402 for in-game execution. Although a cloud-based gaming platform is described herein as a reference example, persons skilled in the art will appreciate that, as a general matter, the one or more servers 402 may execute any technically feasible type or types of applications, such as the design applications described above.
As shown, each client device 404 includes one or more input devices 426, a client application 422, a communication interface 420, and a display 424. The one or more input devices 426 may include any type of one or more devices for receiving user input, such as a keyboard, mouse, joystick, and/or game controller. The client application 422 receives input data in response to user input at one or more input devices 426, sends the input data to one of the one or more servers 402 via a communication interface 420 (e.g., a network interface controller) and over one or more networks 406 (e.g., the internet), receives encoded display data from the server 402, and decodes and causes the display data to be displayed on a display 424 (e.g., a cathode ray tube, a liquid crystal display, a light emitting diode display, etc.). In this way, more computationally intensive computations and processing may be offloaded to one or more servers 402. For example, game sessions may be streamed from one or more servers 402 to one or more client devices 404, thereby reducing the need for graphics processing and rendering by one or more client devices 404.
As shown, each server 402 includes a communication interface 418, one or more CPUs 408, a parallel processing subsystem 410, a rendering component 412, a rendering capture component 414, and an encoder 416. Input data sent by the client device 404 to one of the one or more servers 402 is received via a communication interface 418 (e.g., a network interface controller) and processed by one or more CPUs 408 and/or parallel processing subsystems 410 included in the server 402, which correspond to the CPU 102 and parallel processing subsystem 112, respectively, of the computer system 100 described above in connection with fig. 1-3. In some embodiments, one or more CPUs 408 may receive input data, process the input data, and send the data to parallel processing subsystem 410. In turn, the parallel processing subsystem 410 renders one or more individual images and/or image frames (e.g., frames of a video game) based on the transmitted data.
Illustratively, the rendering component 412 employs the parallel processing subsystem 410 to render the result of processing the input data, and the render capture component 414 captures the rendering as display data (e.g., as image data capturing one or more separately rendered images and/or image frames). The rendering performed by rendering component 412 may include ray-traced or path-traced lighting and/or shadow effects that are computed using one or more parallel processing units (e.g., GPUs) of the server 402, which may further employ one or more dedicated hardware accelerators or processing cores to perform the ray- or path-tracing techniques. In some embodiments, rendering component 412 performs rendering using the ray cone tracing techniques disclosed herein. Thereafter, the encoder 416 encodes the captured rendered display data to generate encoded display data, which is transmitted, over the one or more networks 406 via communication interface 418, to the one or more client devices 404 for display to one or more users. In some embodiments, the rendering component 412, render capture component 414, and encoder 416 may be included in the rendering application 130 described above in conjunction with FIG. 1.
Returning to the cloud gaming example, during a game session, input data received by one of the one or more servers 402 may be representative of movement of a character of a user in the game, firing a weapon, reloading, passing a ball, turning a vehicle, etc. In such a case, the rendering component 412 can generate a rendering of the game session that is representative of the result of the input data, and the render capture component 414 can capture the rendering of the game session as display data (e.g., as image data capturing rendered frames of the game session). Parallel processing (e.g., GPU) resources may be dedicated to each game session, or resource scheduling techniques may be employed to share parallel processing resources across multiple game sessions. In addition, the game session may be rendered using the ray cone tracing techniques disclosed herein. The rendered game session may then be encoded by the encoder 416 to generate encoded display data, which is transmitted over the one or more networks 406 to one of the one or more client devices 404 for decoding and output via the display 424 of that client device 404.
Ray cone tracking and texture filtering techniques
FIG. 5 illustrates an exemplary ray cone being traced through a virtual three-dimensional scene, in accordance with various embodiments. As shown, a ray cone 500, which augments a ray 502, is traced through a pixel 501 in screen space into a scene that includes three objects 510, 512, and 514. When the ray cone 500 hits one of the objects 510, 512, or 514, the ray cone 500 may be reflected in a direction that depends on the material properties of that object. In addition, the angle of the ray cone 500 may increase or decrease (or remain unchanged) based on the surface curvature of the object. In general, if the surface curvature at a hit point where the ray 502 intersects the geometry of an object is convex, then the angle of the ray cone 500 increases, and vice versa.
Illustratively, because the surface curvature at hit point 520 is negative (i.e., concave), the angle of the ray cone 500 decreases after the ray cone 500 hits the object 510. After the ray cone 500 shrinks to zero size, its size increases again. Then, because the surface curvature at hit point 522 is convex, the angle of the ray cone 500 increases after the ray cone 500 hits the object 512 at hit point 522.
In some embodiments, rendering application 130 instructs a texture unit of a GPU (e.g., texture unit 315 described above in conjunction with FIG. 3) to perform anisotropic texture filtering at a hit point (e.g., hit point 520, 522, or 524) based on a texture of the surface at the hit point and a texture footprint determined by approximating the ray cone 500 as a cylinder. For example, assume that the leftmost object 510 has a textured and reflective marble surface, the object 512 has a wooden surface that is also textured and reflective, and the object 514 has a textured but non-reflective surface. In such a case, the rendering application could trace the ray cone 500 to the hit point 520 on the object 510 and perform an anisotropic texture filtering lookup based on the associated texture footprint and the texture of the marble surface of the object 510; trace a reflected ray cone to the hit point 522 on the object 512 and perform another anisotropic texture filtering lookup based on the associated texture footprint and the texture of the wooden surface of the object 512; and trace another reflected ray cone to the hit point 524 on the object 514 and perform a further anisotropic texture filtering lookup based on the associated texture footprint and the texture of the object 514. The results of the anisotropic texture filtering lookups could then be used to render, on the surface of the object 510, a combination of the marble texture associated with the object 510 and reflections of the wood and other textures associated with the objects 512 and 514, respectively.
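The evolution of the cone angle and width from hit to hit described above can be captured with simple bookkeeping in shader code. The following HLSL sketch is one possible formulation under the small-angle approximation tan(α) ≈ α; the struct, the function names, and the surfaceSpreadAngle parameter (positive for convex surfaces, negative for concave ones) are illustrative assumptions rather than definitions from this disclosure:

```hlsl
// Minimal ray cone bookkeeping, consistent with the behavior described
// above; names and the small-angle approximation are assumptions.
struct RayCone
{
    float width;       // cone width at the current position along the ray
    float spreadAngle; // current spread angle of the cone
};

// Widen (or narrow) the cone over a distance t along the ray.
RayCone propagateCone(RayCone cone, float t)
{
    RayCone result;
    result.width = cone.width + t * cone.spreadAngle; // w ~= w0 + t * alpha
    result.spreadAngle = cone.spreadAngle;
    return result;
}

// Update the spread angle at a hit: a convex surface (positive
// surfaceSpreadAngle) grows the cone, a concave surface shrinks it.
RayCone reflectCone(RayCone cone, float surfaceSpreadAngle)
{
    RayCone result;
    result.width = cone.width;
    result.spreadAngle = cone.spreadAngle + 2.0 * surfaceSpreadAngle;
    return result;
}
```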
FIG. 6 illustrates a method for performing anisotropic texture filtering using ray cones, in accordance with various embodiments. As shown, the intersection of the ray cone 602 with the triangle plane 608 forms an ellipse 604. The triangle plane 608 is the plane passing through the vertices of a triangle at a hit point 606, where the ray cone 602 intersects geometry (which includes the triangle) within the 3D virtual scene. The ellipse 604 corresponds to a texture footprint that can be used to perform anisotropic texture filtering. In particular, the major and minor axes (not shown) of the ellipse 604 can be used to derive inputs to a hardware-accelerated texture lookup unit in the GPU, which performs anisotropic texture filtering based on those axes.
More formally, let $\hat{d}$ be the direction associated with the ray cone 602. As used herein, a hat denotes a normalized vector, and uppercase letters denote points. Projecting $\hat{d}$ onto the triangle plane 608, which is defined by the normal vector 616, denoted $\hat{n}$, gives one axis 610 of the ellipse 604, denoted $a_1$:

$$a_1 = \hat{d} - (\hat{n} \cdot \hat{d})\,\hat{n}. \tag{1}$$

The other axis 612 of the ellipse 604, denoted $a_2$, can be computed as $a_2 = a_1 \times \hat{n}$. The axes 610 and 612 can then be rescaled to fit the size of the ellipse 604, giving the major and minor axes of the ellipse 604, denoted herein by $a_1'$ and $a_2'$, respectively.
In some embodiments, the ray cone 602 is approximated as a cylinder oriented along the cone direction $\hat{d}$, which reduces the complexity of computing the axes $a_1$ and $a_2$ of the resulting ellipse 604. FIG. 7 illustrates a side view of a cylinder approximation of a ray cone, in accordance with various embodiments. As shown, the ray cone can be approximated as a cylinder having sides 712 and 714 oriented along $\hat{d}$. The width of such a cylinder is equal to the width of the ray cone at the hit point on the triangle plane 702. For a primary hit point, the ray cone width can be computed as $w = 2r = 2t\tan\alpha$, where $t$ is the distance to the hit point and $\alpha$ is the spread angle of the ray cone, so the radius is $r = t\tan\alpha$. In addition, as described above in conjunction with FIG. 6, the ray cone direction 700 $\hat{d}$ can be projected, according to equation (1), onto the triangle plane 702 defined by the normal 706 $\hat{n}$ to determine the direction 708 $a_1$ of the major axis 710 of the ellipse.
In some embodiments, similar triangles are used to determine the length of the major axis 710, denoted $a_1'$. In such cases, the ratio between the length of the major axis 710 $a_1'$ and the length of the projected vector 708 $a_1$ can be expressed as:

$$\frac{\|a_1'\|}{\|a_1\|} = \frac{r}{\|p_1\|}, \tag{2}$$

where $a_1'$ is parallel to $a_1$ but has the correct length of the major axis 710, and $p_1 = a_1 - (\hat{d} \cdot a_1)\,\hat{d}$ is the projection of $a_1$ onto the plane perpendicular to the ray direction $\hat{d}$. According to equation (2), the scaled major axis 710 $a_1'$ can be computed as:

$$a_1' = \frac{r}{\|p_1\|}\, a_1. \tag{3}$$
in addition, the short axis may be determined by taking the cross product between the long axis and normal and rescaling using the same techniqueI.e.
It should be noted that when the ray cone direction 700 $\hat{d}$ is parallel to the plane normal 706 $\hat{n}$, the scaled axes become zero length. To prevent division by zero, the computed axis lengths can be clamped to a small constant so that the axis lengths are never less than that constant. Such clamping can also be used to handle the case in which the ray direction 700 $\hat{d}$ is perpendicular to the plane normal 706 $\hat{n}$.
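In shader code, the axis construction of equations (1)-(4) might look like the following HLSL sketch. The function name, parameter names, and the epsilon used for clamping are illustrative assumptions, not definitions from this disclosure:

```hlsl
// Computes the scaled ellipse axes a1' and a2' of equations (1)-(4).
// d: normalized ray cone direction; n: normalized triangle plane normal;
// r: ray cone radius at the hit point (r = t * tan(alpha)).
void computeEllipseAxes(float3 d, float3 n, float r,
                        out float3 a1, out float3 a2)
{
    const float eps = 0.0001; // assumed clamp constant; avoids division by zero

    // Equation (1): project the cone direction onto the triangle plane.
    a1 = d - dot(n, d) * n;
    // p1: projection of a1 onto the plane perpendicular to the ray direction.
    float3 p1 = a1 - dot(d, a1) * d;
    // Equations (2)-(3): rescale a1 to the major-axis length via similar triangles.
    a1 *= r / max(eps, length(p1));

    // Equation (4): minor axis from the cross product with the normal,
    // rescaled the same way.
    a2 = cross(a1, n);
    float3 p2 = a2 - dot(d, a2) * d;
    a2 *= r / max(eps, length(p2));
}
```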
After the major and minor axes $a_1'$ and $a_2'$ of the ellipse formed by the intersection of the ray cone and the triangle plane at the hit point have been determined, rendering application 130 computes gradients of the texture coordinates along these axes in texture space and feeds the gradients, along with the texture, to a texture unit of the GPU that performs anisotropic texture filtering based on the gradients and the texture (e.g., texture unit 315 described above in conjunction with FIG. 3). In embodiments, any technically feasible anisotropic texture filtering can be performed. For example, in HLSL (the high-level shading language for DirectX), the SampleGrad function can be called to sample the texture, using the gradients to affect the sampling. In such a case, the sampling can include, for example, performing multiple bilinear lookups within the ellipse represented by the gradients and averaging the results of those lookups to obtain an anisotropically filtered texture value.
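For example, a lookup through the texture unit using the two gradients might look like the following HLSL fragment (a minimal sketch; the resource names and register bindings are assumptions):

```hlsl
Texture2D    gTexture : register(t0);
SamplerState gSampler : register(s0); // created with an anisotropic filter mode

// texUV: interpolated texture coordinates at the hit point.
// grad1, grad2: texture-space gradients along the ellipse axes.
float4 sampleAnisotropic(float2 texUV, float2 grad1, float2 grad2)
{
    // SampleGrad supplies explicit gradients, so the texture unit can form
    // the anisotropic footprint without screen-space derivatives.
    return gTexture.SampleGrad(gSampler, texUV, grad1, grad2);
}
```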
FIG. 8 illustrates a method for determining gradients of texture coordinates along the axes of an ellipse, in accordance with various embodiments. As shown, a triangle includes vertices 820, 822, and 824 at positions $P_0$, $P_1$, and $P_2$, respectively, having respective texture coordinates $T_0$, $T_1$, and $T_2$, where $T_i = (t_{ix}, t_{iy})$. Texture coordinates are two-dimensional coordinates, assigned to the vertices of the triangle, that indicate how a texture is mapped onto the triangle. Also shown is an elliptical footprint whose center point 802, denoted by $P$, has barycentric coordinates $(u, v)$. Barycentric coordinates differ from texture coordinates and can be used to interpolate colors within the triangle. Although the ellipse is shown within the triangle having vertices 820, 822, and 824 for illustrative purposes, it should be understood that the ellipse could also be at least partially outside of the triangle. In such a case, the barycentric coordinates are negative outside of the triangle.
To compute the gradient $\nabla_1$ along the major axis $a_1'$, the first step is to compute the barycentric coordinates $(u_1, v_1)$ of the point obtained by adding the major axis vector 812 $a_1'$ to the hit point 802 $P$, i.e., of the point 804 $P + a_1'$. Although described with respect to the gradient $\nabla_1$ along the major axis 812 $a_1'$, the gradient $\nabla_2$ along the minor axis 810 $a_2'$ can be computed in a similar manner using $P + a_2'$. The barycentric coordinate $u_1$ of the point 804 can be computed as the area of triangle 830 divided by the overall area of the triangle having vertices 820, 822, and 824:

$$u_1 = \frac{\hat{n} \cdot \big(((P + a_1') - P_0) \times (P_2 - P_0)\big)}{\hat{n} \cdot \big((P_1 - P_0) \times (P_2 - P_0)\big)}, \tag{5}$$

where $\hat{n}$ is the normalized triangle normal. The numerator in equation (5) is $\hat{n} \cdot \big(((P + a_1') - P_0) \times (P_2 - P_0)\big)$, which is twice the area of triangle 830, because $\hat{n}$ is normalized and perpendicular to both $((P + a_1') - P_0)$ and $(P_2 - P_0)$. The denominator in equation (5) is twice the area of the triangle spanned by $P_0$, $P_1$, and $P_2$, so the factors of two cancel. Similarly, the other barycentric coordinate $v_1$ can be computed by dividing the area of triangle 832 by the overall triangle area, according to equation (6):

$$v_1 = \frac{\hat{n} \cdot \big((P_1 - P_0) \times ((P + a_1') - P_0)\big)}{\hat{n} \cdot \big((P_1 - P_0) \times (P_2 - P_0)\big)}. \tag{6}$$
in other embodiments, the barycentric coordinates may be calculated in any technically feasible manner, including using techniques well known to those skilled in the art.
Using the barycentric coordinates $(u_1, v_1)$ and the barycentric coordinates $(u, v)$ of the impact point 802, the texture coordinate gradients along the axes 812 $\vec{a}_1$ and 810 $\vec{a}_2$ of the ellipse can be calculated. In particular, since texture coordinates can be interpolated as $T(u, v) = (1-u-v)\,T_0 + u\,T_1 + v\,T_2$, the gradients can be calculated as the differences between the texture coordinates at points 804 and 808 on the ellipse and the texture coordinates at the impact point 802:

$$\nabla_{a_1} = T(u_1, v_1) - T(u, v), \qquad \nabla_{a_2} = T(u_2, v_2) - T(u, v), \tag{7}$$

where, as previously described, $(u_2, v_2)$ may be calculated similarly to $(u_1, v_1)$, but based on point 808, i.e., $P + \vec{a}_2$.
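Given barycentric coordinates computed as sketched above, equation (7) likewise reduces to a few lines of HLSL; again, the names are illustrative assumptions:

```hlsl
// Sketch (assumed names): texture-coordinate gradient along one ellipse
// axis, per equation (7). bary is (u1, v1) for point 804 (or (u2, v2)
// for point 808); baryHit is (u, v) for the impact point P; t0, t1, t2
// are the vertex texture coordinates.
float2 TexCoordGradient(float2 bary, float2 baryHit, float2 t0, float2 t1, float2 t2)
{
    float2 texAtAxisPoint = (1.0 - bary.x - bary.y) * t0 + bary.x * t1 + bary.y * t2;
    float2 texAtHit       = (1.0 - baryHit.x - baryHit.y) * t0 + baryHit.x * t1 + baryHit.y * t2;
    return texAtAxisPoint - texAtHit;  // nabla_a = T(u1, v1) - T(u, v)
}
```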
As described, the gradients calculated according to equation (7) may be input, together with the texture, into a texture unit of the GPU, which performs anisotropic texture filtering based on the gradients and the texture. For example, the SampleGrad function in HLSL may be called to sample the texture using the gradients.
In some embodiments, the method of performing anisotropic filtering using ray cones described above in connection with FIGS. 4-7 may be implemented in shader pseudocode.
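One possible realization, assembling the steps described above — the ellipse-axis construction of equations (3) and (4), the barycentric and gradient computations of equations (5)-(7), and a SampleGrad call — is sketched below in HLSL. All identifiers (AnisotropicConeSample, coneRadius, and so on) are illustrative assumptions rather than names from this disclosure:

```hlsl
// Sketch (assumed names): compute texture-space gradients for anisotropic
// filtering from a ray cone hit, then sample. Inputs: impact point P, the
// cone radius (half the cone width) at the hit, normalized ray direction d,
// normalized triangle normal n, vertex positions pos[3], vertex texture
// coordinates tex[3], and the interpolated texture coordinates uv at the hit.
float4 AnisotropicConeSample(Texture2D texture0, SamplerState samp,
                             float3 P, float coneRadius, float3 d, float3 n,
                             float3 pos[3], float2 tex[3], float2 uv)
{
    // Major axis: project the cone direction onto the triangle plane, then
    // rescale with similar triangles so the component perpendicular to the
    // ray matches the cylinder radius (cf. equation (3)).
    float3 a1 = d - dot(n, d) * n;
    float3 h1 = a1 - dot(d, a1) * d;
    a1 *= coneRadius / max(1e-4, length(h1));

    // Minor axis: perpendicular to the major axis within the triangle
    // plane, rescaled the same way (cf. equation (4)).
    float3 a2 = cross(n, a1);
    float3 h2 = a2 - dot(d, a2) * d;
    a2 *= coneRadius / max(1e-4, length(h2));

    // Barycentrics of P + a1 and P + a2, per equations (5) and (6).
    float3 e1 = pos[1] - pos[0];
    float3 e2 = pos[2] - pos[0];
    float invDenom = 1.0 / dot(cross(e1, e2), n);

    float3 q = P + a1 - pos[0];
    float u1 = dot(cross(q, e2), n) * invDenom;
    float v1 = dot(cross(e1, q), n) * invDenom;

    q = P + a2 - pos[0];
    float u2 = dot(cross(q, e2), n) * invDenom;
    float v2 = dot(cross(e1, q), n) * invDenom;

    // Texture-coordinate gradients along the two axes, per equation (7).
    float2 grad1 = (1.0 - u1 - v1) * tex[0] + u1 * tex[1] + v1 * tex[2] - uv;
    float2 grad2 = (1.0 - u2 - v2) * tex[0] + u2 * tex[1] + v2 * tex[2] - uv;

    return texture0.SampleGrad(samp, uv, grad1, grad2);
}
```

With a sampler created for anisotropic filtering, the SampleGrad call lets the texture unit choose mipmap levels and place multiple samples along the longer gradient, producing the elliptical footprint described above.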
FIG. 9A illustrates an example image rendered using ray cone tracing without anisotropic filtering, according to the prior art. As shown, image 902, which was rendered using ray cone tracing without anisotropic filtering, includes a plane with a checkerboard-pattern texture. In rendered image 902, the checkerboard pattern becomes gray and blurry at extreme glancing angles.
FIG. 9B illustrates an example image rendered using ray cone tracing that implements anisotropic filtering, in accordance with various embodiments. As shown, image 904, which was rendered using ray cone tracing with anisotropic filtering, includes the same plane with a checkerboard-pattern texture as rendered image 902. Notably, rendered image 904 is sharper than rendered image 902, with less blur at extreme glancing angles.
Referring generally to FIGS. 5-9B, persons skilled in the art will understand that the foregoing examples are provided for clarity and are not meant to limit the scope of the present embodiments. In general, any technically feasible method may be used to modify or replace the graphical objects described above, and any technically feasible method may likewise be applied to detect the modified or replaced graphical objects.
FIG. 10 is a flow diagram of method steps for computing a pixel color using ray cone tracing with anisotropic filtering, in accordance with various embodiments. Although the method steps are described in conjunction with the systems of FIGS. 1-4, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the present embodiments. Although described with respect to tracing a single ray cone, the method steps may be repeated to trace multiple ray cones while rendering an image.
As shown, method 1000 begins at step 1002, where rendering application 130 traces a ray cone through a scene until the ray cone intersects geometry in the scene at an impact point. In particular, the ray cone may be traced through a pixel in screen space into the scene until the ray cone intersects a triangle of the geometry at the impact point.
If the surface of the geometry at the impact point is textured at step 1004, then at step 1006 rendering application 130 determines the axes of the ellipse formed by the ray cone at the impact point with the triangle plane associated with the geometry, by approximating the ray cone as a cylinder. As described, in some embodiments, rendering application 130 may determine the major axis of the ellipse by projecting the direction vector associated with the ray cone onto the triangle plane, thereby obtaining a vector associated with the major axis, and rescaling that vector to the correct major-axis length using similar triangles according to equation (3). Similarly, rendering application 130 may determine the minor axis of the ellipse by taking the cross product of the major-axis vector and the normal of the triangle plane to obtain a vector associated with the minor axis, and rescaling that vector using similar triangles according to equation (4).
At step 1008, rendering application 130 calculates gradients along the axes of the ellipse. As described, in some embodiments, rendering application 130 first calculates barycentric coordinates of points on the ellipse by dividing the areas of sub-triangles within the triangle at the impact point (e.g., triangles 830 and 832) by the area of the entire triangle (e.g., the triangle with vertices 820, 822, and 824). Rendering application 130 then calculates texture coordinates based on the barycentric coordinates and uses those texture coordinates to calculate the gradients along the axes of the ellipse according to equation (7), as the differences between the texture coordinates at points along the axes of the ellipse and the texture coordinates at the impact point.
At step 1010, rendering application 130 causes a texture unit of the GPU to perform anisotropic texture filtering based on the texture and the gradients. As described above, in embodiments, any technically feasible anisotropic texture filtering may be performed. For example, rendering application 130 may invoke the SampleGrad function in HLSL to sample the texture, with the gradients used to affect the sampling. Further, the sampling may include, for example, performing multiple mipmap (multi-level progressive texture) lookups within the ellipse indicated by the gradients and averaging the results of those lookups to obtain an anisotropically filtered texture value.
At step 1012, after the texture unit has performed anisotropic texture filtering based on the texture and the gradients, rendering application 130 receives the anisotropically filtered texture value from the texture unit of the GPU. The anisotropically filtered texture value represents the texture color associated with the pixel in screen space through which the ray cone was traced at step 1002.
At step 1014, rendering application 130 applies or accumulates the anisotropically filtered texture value to the pixel through which the ray cone was traced at step 1002. The applied or accumulated texture value contributes to the color of that pixel in a rendered image. As described, the rendered image may be, for example, an image or frame within a video game or movie, an image generated by a building or design application, or an image generated by any other type of application.
Although described herein with respect to applying or accumulating anisotropically filtered texture values to pixels, anisotropically filtered texture values may be used in any technically feasible manner in other embodiments.
At step 1016, if the surface of the geometry is not reflective at the impact point, method 1000 ends. On the other hand, if the surface of the geometry is reflective at the impact point, method 1000 returns to step 1002, where rendering application 130 traces the (reflected) ray cone through the scene until the (reflected) ray cone again intersects geometry in the scene at another impact point.
In summary, the disclosed techniques use ray cones to achieve anisotropic texture filtering. When a ray cone traced through a virtual 3D scene strikes a surface within the scene at an impact point, the ray cone is approximated as a cylinder to determine the axes of the ellipse formed by the intersection of the ray cone and the triangle plane at the impact point. Texture coordinate gradients along the axes of the ellipse are calculated and input, along with the texture, into a texture unit of a graphics processing unit (GPU), which performs the anisotropic texture filtering. The texture unit outputs an anisotropically filtered texture color, which can then be used to determine the color of a pixel in a rendered image.
At least one technical advantage of the disclosed techniques relative to the prior art is that the disclosed techniques implement anisotropic filtering using ray cone tracing, resulting in images with fewer aliasing artifacts, less blur, and more detail at extreme angles relative to images rendered using conventional ray cone tracing. In addition, the disclosed techniques use ray cone tracing, which is less computationally expensive than some other ray tracing techniques (e.g., ray differentials) that can be used to implement anisotropic filtering. These technical advantages represent one or more technological improvements over prior art approaches.
1. In some embodiments, a computer-implemented method for rendering one or more graphical images includes tracing one or more ray cones through a graphical scene, performing one or more anisotropic texture filtering operations based on the one or more ray cones to calculate texture colors, and rendering one or more graphical images based on the texture colors.
2. The computer-implemented method of clause 1, wherein, for each of the one or more ray cones, performing the one or more anisotropic texture filtering operations comprises: determining at least one axis of an ellipse formed by the intersection of the ray cone and a plane associated with a geometry within the graphical scene, computing one or more gradients along the at least one axis of the ellipse, and computing at least one texture color based on the one or more gradients and one or more textures.
3. The computer-implemented method of clause 1 or 2, wherein determining the at least one axis of the ellipse comprises approximating the ray cone intersecting the plane as a cylinder.
4. The computer-implemented method of any of clauses 1-3, wherein the one or more graphical images are associated with a video game, a movie, or a building or design application.
5. The computer-implemented method of any of clauses 1-4, wherein the steps of tracing, performing the one or more anisotropic texture filtering operations, and rendering are performed in a virtualized environment.
6. The computer-implemented method of any of clauses 1-5, wherein the steps of tracing, performing the one or more anisotropic texture filtering operations, and rendering are performed in a cloud computing environment.
7. In some embodiments, a computer-implemented method for computing texture colors includes: tracing a ray cone through a graphical scene, determining at least one axis of a first ellipse formed by the ray cone intersecting, at a first point of impact, a first plane associated with a geometry within the graphical scene, calculating one or more gradients along the at least one axis of the first ellipse, and calculating a first texture color based on the one or more gradients and a first texture.
8. The computer-implemented method of clause 7, wherein calculating the first texture color comprises performing one or more anisotropic texture filtering operations based on the one or more gradients and the first texture.
9. The computer-implemented method of clause 7 or 8, wherein determining at least one axis of the first ellipse comprises approximating the ray cone intersecting the first plane as a cylinder.
10. The computer-implemented method of any of clauses 7-9, wherein determining the at least one axis of the first ellipse comprises: projecting a direction vector associated with the ray cone onto the first plane to generate a first vector associated with a major axis, rescaling the first vector based on a first set of similar triangles, calculating a cross product between the first vector and a normal vector associated with the first plane to generate a second vector associated with a minor axis, and rescaling the second vector based on a second set of similar triangles.
11. The computer-implemented method of any of clauses 7-10, wherein calculating the one or more gradients along the at least one axis of the first ellipse comprises: calculating a plurality of barycentric coordinates on the first ellipse, calculating a plurality of texture coordinates based on the plurality of barycentric coordinates, and calculating the one or more gradients based on the plurality of texture coordinates.
12. The computer-implemented method of any of clauses 7-11, wherein calculating the plurality of barycentric coordinates comprises dividing an area of a plurality of triangles within a first triangle by an area of the first triangle.
13. The computer-implemented method of any of clauses 7-12, wherein the plurality of texture coordinates are calculated at a plurality of points along the at least one axis and on the first ellipse and at the first point of impact, and calculating the one or more gradients comprises subtracting the texture coordinates at the first point of impact from the texture coordinates at the plurality of points along the at least one axis and on the first ellipse.
14. The computer-implemented method of any of clauses 7-13, further comprising clamping one or more lengths of the at least one axis of the first ellipse to a constant value.
15. The computer-implemented method of any of clauses 7-14, further comprising: tracing a reflected ray cone through the graphical scene, determining at least one axis of a second ellipse formed by the reflected ray cone intersecting, at a second point of impact, a second plane associated with additional geometry within the graphical scene, calculating one or more gradients along the at least one axis of the second ellipse, and calculating a second texture color based on the one or more gradients along the at least one axis of the second ellipse and a second texture.
16. In some embodiments, one or more non-transitory computer-readable media storing program instructions that, when executed by at least one processor, cause the at least one processor to perform the steps of: tracing a ray cone through a graphical scene, determining at least one axis of a first ellipse formed by the ray cone intersecting, at a first point of impact, a first plane associated with a geometry within the graphical scene, calculating one or more gradients along the at least one axis of the first ellipse, and calculating a first texture color based on the one or more gradients and a first texture.
17. The one or more non-transitory computer-readable media of clause 16, wherein calculating the first texture color comprises performing one or more mipmap (multi-level progressive texture) lookup operations based on the one or more gradients and the first texture.
18. The one or more non-transitory computer-readable media of clauses 16 or 17, wherein determining the at least one axis of the first ellipse comprises: projecting a direction vector associated with the ray cone onto the first plane to generate a first vector associated with a major axis, rescaling the first vector based on a first set of similar triangles, calculating a cross product between the first vector and a normal vector associated with the first plane to generate a second vector associated with a minor axis, and rescaling the second vector based on a second set of similar triangles.
19. The one or more non-transitory computer-readable media of any one of clauses 16-18, wherein each of the one or more gradients comprises a texture coordinate gradient.
20. The one or more non-transitory computer-readable media of any one of clauses 16-19, wherein calculating the one or more gradients along the at least one axis comprises subtracting texture coordinates at the first point of impact from texture coordinates at a plurality of points along the at least one axis and on the first ellipse.
21. The one or more non-transitory computer-readable media of any one of clauses 16-20, wherein calculating the one or more gradients along the at least one axis of the first ellipse comprises: calculating a plurality of barycentric coordinates on the first ellipse, calculating a plurality of texture coordinates based on the plurality of barycentric coordinates, and calculating the one or more gradients based on the plurality of texture coordinates.
22. The one or more non-transitory computer-readable media of any one of clauses 16-21, the steps further comprising: tracing a reflected ray cone through the graphical scene, determining at least one axis of a second ellipse formed by the reflected ray cone intersecting, at a second point of impact, a second plane associated with additional geometry within the graphical scene, calculating one or more gradients along the at least one axis of the second ellipse, and calculating a second texture color based on the one or more gradients along the at least one axis of the second ellipse and a second texture.
23. In some embodiments, a system includes: one or more memories storing instructions; and one or more processors coupled with the one or more memories and configured, when executing the instructions, to: trace a ray cone through a graphical scene, determine at least one axis of an ellipse formed by the ray cone intersecting, at an impact point, a plane associated with a geometry within the graphical scene, calculate one or more gradients along the at least one axis of the ellipse, and calculate a texture color based on the one or more gradients and a texture.
24. The system of clause 23, wherein the one or more processors comprise at least one of a Graphics Processing Unit (GPU) or a virtual GPU, and the texture color is calculated by a texture unit included in at least one of the GPU or the virtual GPU performing one or more anisotropic texture filtering operations based on the one or more gradients and the texture.
Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.
The description of the various embodiments has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Aspects of the present embodiments may be embodied as a system, method, or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "module" or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any other suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. When executed via a processor of a computer or other programmable data processing apparatus, the instructions enable the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, or a field programmable gate array.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (23)
1. A computer-implemented method for rendering one or more graphical images, the method comprising:
tracing one or more ray cones through the graphical scene;
performing one or more anisotropic texture filtering operations based on the one or more ray cones to calculate texture colors; and
rendering one or more graphical images based on the texture colors;
wherein performing the one or more anisotropic texture filtering operations for each of the one or more ray cones comprises:
determining at least one axis of an ellipse formed by the intersection of the ray cone and a plane associated with a geometry within the graphical scene;
calculating one or more gradients along the at least one axis of the ellipse; and
calculating at least one texture color based on the one or more gradients and one or more textures.
2. The computer-implemented method of claim 1, wherein determining the at least one axis of the ellipse comprises approximating the ray cone intersecting the plane as a cylinder.
3. The computer-implemented method of claim 1, wherein the one or more graphical images are associated with a video game, movie, or building or design application.
4. The computer-implemented method of claim 1, wherein the steps of tracing, performing one or more anisotropic texture filtering operations, and rendering are performed in a virtualized environment.
5. The computer-implemented method of claim 1, wherein the steps of tracing, performing one or more anisotropic texture filtering operations, and rendering are performed in a cloud computing environment.
6. A computer-implemented method for computing texture colors, the method comprising:
tracing a ray cone through a graphical scene;
determining at least one axis of a first ellipse formed by the ray cone intersecting, at a first point of impact, a first plane associated with geometry within the graphical scene;
calculating one or more gradients along the at least one axis of the first ellipse; and
calculating a first texture color based on the one or more gradients and a first texture.
7. The computer-implemented method of claim 6, wherein computing the first texture color comprises performing one or more anisotropic texture filtering operations based on the one or more gradients and the first texture.
8. The computer-implemented method of claim 6, wherein determining the at least one axis of the first ellipse comprises approximating the ray cone intersecting the first plane as a cylinder.
9. The computer-implemented method of claim 6, wherein determining the at least one axis of the first ellipse comprises:
projecting a direction vector associated with the ray cone onto the first plane to generate a first vector associated with a major axis;
rescaling the first vector based on a first set of similar triangles;
calculating a cross product between the first vector and a normal vector associated with the first plane to generate a second vector associated with a minor axis; and
rescaling the second vector based on a second set of similar triangles.
10. The computer-implemented method of claim 6, wherein calculating the one or more gradients along the at least one axis of the first ellipse comprises:
calculating a plurality of barycentric coordinates on the first ellipse;
calculating a plurality of texture coordinates based on the plurality of barycentric coordinates; and
calculating the one or more gradients based on the plurality of texture coordinates.
11. The computer-implemented method of claim 10, wherein calculating the plurality of barycentric coordinates comprises dividing an area of a plurality of triangles within a first triangle by an area of the first triangle.
12. The computer-implemented method of claim 10, wherein the plurality of texture coordinates are calculated at a plurality of points along the at least one axis and on the first ellipse and at the first point of impact, and calculating the one or more gradients comprises subtracting the texture coordinates at the first point of impact from the texture coordinates at the plurality of points along the at least one axis and on the first ellipse.
13. The computer-implemented method of claim 6, further comprising clamping one or more lengths of the at least one axis of the first ellipse to a constant value.
14. The computer-implemented method of claim 6, further comprising:
tracing a reflected ray cone through the graphical scene;
determining at least one axis of a second ellipse formed by the reflected ray cone intersecting, at a second point of impact, a second plane associated with additional geometry within the graphical scene;
calculating one or more gradients along the at least one axis of the second ellipse; and
calculating a second texture color based on the one or more gradients along the at least one axis of the second ellipse and a second texture.
15. One or more non-transitory computer-readable media storing program instructions that, when executed by at least one processor, cause the at least one processor to perform the steps of:
tracing a ray cone through a graphical scene;
determining at least one axis of a first ellipse formed by the ray cone intersecting, at a first point of impact, a first plane associated with geometry within the graphical scene;
calculating one or more gradients along the at least one axis of the first ellipse; and
calculating a first texture color based on the one or more gradients and a first texture.
16. The one or more non-transitory computer-readable media of claim 15, wherein calculating the first texture color comprises performing one or more mipmap (multi-level progressive texture) lookup operations based on the one or more gradients and the first texture.
17. The one or more non-transitory computer-readable media of claim 15, wherein determining the at least one axis of the first ellipse comprises:
projecting a direction vector associated with the ray cone onto the first plane to generate a first vector associated with a major axis;
rescaling the first vector based on a first set of similar triangles;
calculating a cross product between the first vector and a normal vector associated with the first plane to generate a second vector associated with a minor axis; and
rescaling the second vector based on a second set of similar triangles.
18. The one or more non-transitory computer-readable media of claim 15, wherein each of the one or more gradients comprises a texture coordinate gradient.
19. The one or more non-transitory computer-readable media of claim 15, wherein calculating the one or more gradients along the at least one axis comprises subtracting texture coordinates at the first point of impact from texture coordinates at a plurality of points along the at least one axis and on the first ellipse.
20. The one or more non-transitory computer-readable media of claim 15, wherein calculating the one or more gradients along the at least one axis of the first ellipse comprises:
calculating a plurality of barycentric coordinates on the first ellipse;
calculating a plurality of texture coordinates based on the plurality of barycentric coordinates; and
calculating the one or more gradients based on the plurality of texture coordinates.
21. The one or more non-transitory computer-readable media of claim 15, the steps further comprising:
tracing a reflected ray cone through the graphical scene;
determining at least one axis of a second ellipse formed by the reflected ray cone intersecting, at a second point of impact, a second plane associated with additional geometry within the graphical scene;
calculating one or more gradients along the at least one axis of the second ellipse; and
calculating a second texture color based on the one or more gradients along the at least one axis of the second ellipse and a second texture.
22. A system, comprising:
one or more memories storing instructions; and
one or more processors coupled with the one or more memories and configured, when executing the instructions, to:
trace a ray cone through a graphical scene;
determine at least one axis of an ellipse formed by the ray cone intersecting, at an impact point, a plane associated with geometry within the graphical scene;
calculate one or more gradients along the at least one axis of the ellipse; and
calculate a texture color based on the one or more gradients and a texture.
23. The system of claim 22, wherein the one or more processors comprise at least one of a graphics processing unit (GPU) or a virtual GPU, and calculating the texture color comprises performing one or more anisotropic texture filtering operations based on the one or more gradients and the texture via a texture unit included in the at least one of the GPU or the virtual GPU.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063022033P | 2020-05-08 | 2020-05-08 | |
US63/022,033 | 2020-05-08 | ||
US202063030162P | 2020-05-26 | 2020-05-26 | |
US63/030,162 | 2020-05-26 | ||
US17/145,016 | 2021-01-08 | ||
US17/145,016 US11676326B2 (en) | 2020-05-08 | 2021-01-08 | Techniques for anisotropic texture filtering using ray cones |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113628316A CN113628316A (en) | 2021-11-09 |
CN113628316B true CN113628316B (en) | 2023-12-01 |
Family
ID=78232055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110470873.9A Active CN113628316B (en) | 2020-05-08 | 2021-04-28 | Techniques for anisotropic texture filtering using ray cones |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113628316B (en) |
DE (1) | DE102021111812A1 (en) |
- 2021-04-28 CN CN202110470873.9A patent/CN113628316B/en active Active
- 2021-05-06 DE DE102021111812.3A patent/DE102021111812A1/en active Pending
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6005582A (en) * | 1995-08-04 | 1999-12-21 | Microsoft Corporation | Method and system for texture mapping images with anisotropic filtering |
US20060250409A1 (en) * | 2005-04-08 | 2006-11-09 | Yosuke Bando | Image rendering method and image rendering apparatus using anisotropic texture mapping |
US20080159473A1 (en) * | 2007-01-03 | 2008-07-03 | Andrea Loren Clay | Secondary X-Ray Imaging Technique for Diagnosing a Health Condition |
US20090033661A1 (en) * | 2007-08-01 | 2009-02-05 | Miller Gavin S P | Spatially-Varying Convolutions for Rendering Soft Shadow Effects |
US20110248997A1 (en) * | 2010-04-07 | 2011-10-13 | Jacob Munkberg | Hierarchical Bounding of Displaced Parametric Surfaces |
US20120075303A1 (en) * | 2010-09-27 | 2012-03-29 | Johnsson Bjoern | Multi-View Ray Tracing Using Edge Detection and Shader Reuse |
US20130328875A1 (en) * | 2012-06-11 | 2013-12-12 | Disney Enterprises, Inc. | Integration Cone Tracing |
US20140375659A1 (en) * | 2013-05-03 | 2014-12-25 | Nvidia Corporation | Image illumination rendering system and method |
US20170206700A1 (en) * | 2016-01-15 | 2017-07-20 | Carl J. Munkberg | Texture space shading and reconstruction for ray tracing |
US20170263046A1 (en) * | 2016-03-08 | 2017-09-14 | Nvidia Corporation | Perceptually-based foveated rendering using a contrast-enhancing filter |
US20170323471A1 (en) * | 2016-05-06 | 2017-11-09 | National Taiwan University | 3D rendering method and 3D graphics processing device |
US20170323474A1 (en) * | 2016-05-06 | 2017-11-09 | National Taiwan University | Indirect illumination method and 3D graphics processing device |
US20180004283A1 (en) * | 2016-06-29 | 2018-01-04 | Cheyne Rory Quin Mathey-Owens | Selection of objects in three-dimensional space |
US20180197323A1 (en) * | 2017-01-12 | 2018-07-12 | Imagination Technologies Limited | Graphics processing units and methods for controlling rendering complexity using cost indications for sets of tiles of a rendering space |
CN108665494A (en) * | 2017-03-27 | 2018-10-16 | 北京中科视维文化科技有限公司 | Depth of field real-time rendering method based on quick guiding filtering |
CN107392986A (en) * | 2017-07-31 | 2017-11-24 | 杭州电子科技大学 | A kind of image depth rendering intent based on gaussian pyramid and anisotropic filtering |
CN110874858A (en) * | 2018-08-10 | 2020-03-10 | 电子技术公司 | System and method for rendering reflections |
Non-Patent Citations (4)
Title |
---|
A cost-effective VLSI architecture for anisotropic texture filtering in limited memory bandwidth; Shin, HC et al.; IEEE Transactions on Very Large Scale Integration (VLSI) Systems; Vol. 14, No. 3; pp. 254-267 *
Video texture synthesis based on flow-like stylization painting; Qian Wenhua et al.; Scientific World Journal; pp. 1-5 *
Design and implementation of a 3D graphics library based on HTML5 WebGL; Chang Long; China Masters' Theses Full-text Database, Information Science and Technology (electronic journal); No. 04; p. I138-2769 *
Image depth-of-field rendering algorithm based on hierarchical anisotropic filtering; Ouyang Zhiheng et al.; Optical Technique; Vol. 44, No. 4; pp. 469-475 *
Also Published As
Publication number | Publication date |
---|---|
DE102021111812A1 (en) | 2021-11-11 |
CN113628316A (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10573058B2 (en) | Stable ray tracing | |
US10362289B2 (en) | Method for data reuse and applications to spatio-temporal supersampling and de-noising | |
US10957078B2 (en) | Enhanced anti-aliasing by varying sample patterns spatially and/or temporally | |
US9830741B2 (en) | Setting downstream render state in an upstream shader | |
US10147222B2 (en) | Multi-pass rendering in a screen space pipeline | |
US20140118347A1 (en) | Two-pass cache tile processing for visibility testing in a tile-based architecture | |
US9779533B2 (en) | Hierarchical tiled caching | |
US10600232B2 (en) | Creating a ray differential by accessing a G-buffer | |
US10068366B2 (en) | Stereo multi-projection implemented using a graphics processing pipeline | |
US10430989B2 (en) | Multi-pass rendering in a screen space pipeline | |
US10607390B2 (en) | Techniques for tiling compute work with graphics work | |
US20220237852A1 (en) | Techniques for texture filtering using refracted ray cones | |
US9916680B2 (en) | Low-power processing in depth read-only operating regimes | |
US11823318B2 (en) | Techniques for interleaving textures | |
US11887245B2 (en) | Techniques for rendering signed distance functions | |
WO2022159684A1 (en) | Techniques for texture filtering using refracted ray cones | |
US20230316628A1 (en) | Techniques for avoiding self-intersections when rendering signed distance functions | |
CN113628316B (en) | Techniques for anisotropic texture filtering using ray cones | |
US11676326B2 (en) | Techniques for anisotropic texture filtering using ray cones | |
CN113835753A (en) | Techniques for performing accelerated point sampling in a texture processing pipeline | |
CN113628315B (en) | Techniques for ray cone tracking and texture filtering | |
US20230063422A1 (en) | Techniques for rendering signed distance functions | |
US20230343024A1 (en) | Techniques for rendering media using position-free path integrals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |