KR101550477B1 - Architectures for parallelized intersection testing and shading for ray-tracing rendering - Google Patents

Architectures for parallelized intersection testing and shading for ray-tracing rendering

Info

Publication number
KR101550477B1
Authority
KR
South Korea
Prior art keywords
ray
test
data
cross
intersection
Prior art date
Application number
KR1020107023579A
Other languages
Korean (ko)
Other versions
KR20100128337A (en)
Inventor
루크 틸만 피터슨
제임스 알렉산더 맥콤
리안 알. 살스버리
스테판 퍼셀
Original Assignee
이메지네이션 테크놀로지스 리미티드
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 이메지네이션 테크놀로지스 리미티드
Publication of KR20100128337A
Application granted
Publication of KR101550477B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/06 Ray-tracing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

In one example, a method for ray tracing a scene uses a plurality of intersection testing resources coupled with a plurality of shading resources, which communicate through links/queues. A queue from testing to shading carries indications of individual ray/primitive intersections, each including a ray identifier. A queue from shading to testing carries identifiers of new rays to be tested, while the data defining those rays is stored separately in memories distributed among the intersection testing resources. Ray definition data can remain in a distributed memory until the ray completes intersection testing, and the ray can be selected for testing multiple times based on its identifier. An acceleration structure can be used. Packets of ray identifiers and shape data can be passed among the intersection testing resources, and each resource tests the rays identified in a packet whose definition data is present in its memory. Results of testing acceleration shapes are used to collect rays against the shapes they intersect, and the closest detected ray/primitive intersections are indicated by sending ray identifiers to a queue for shading.

Figure R1020107023579

Description

ARCHITECTURES FOR PARALLELIZED INTERSECTION TESTING AND SHADING FOR RAY-TRACING RENDERING

The present invention relates to rendering two-dimensional representations from three-dimensional scenes, and more particularly to using ray tracing to accelerate the rendering of photorealistic two-dimensional representations of scenes.

Rendering photorealistic images with ray tracing is well known in computer graphics. Ray tracing is known to produce photorealistic images, including realistic shadow and lighting effects, because it can model the physical behavior of light interacting with the elements of a scene. However, ray tracing is also known to be computationally intensive, and at present even state-of-the-art graphics workstations require considerable time to render a complex scene using ray tracing.

Ray tracing usually involves obtaining a scene description composed of geometric primitives, such as triangles, that describe the surfaces of structures in the scene, and modeling how light interacts with those primitives by tracing rays, starting from a camera, through a number of possible interactions, until each ray terminates either at a light source or by exiting the scene without intersecting a light source.

For example, a scene may include a car on a street, with buildings on either side of the street. The car in such a scene may be defined by a large number of triangles (e.g., one million triangles) that approximate its continuous surface. A camera position from which the scene is viewed is defined. Rays cast from the camera are often called primary rays, and rays cast from one object to another, for example to enable reflections, are called secondary rays. An image plane of a selected resolution (e.g., 1024x768 for an SVGA display) is placed at a selected location between the camera and the scene.

The simplest ray tracing algorithm involves casting one or more rays from the camera through each pixel of the image into the scene. Each ray is then tested against each primitive composing the scene to identify the primitive it intersects, and then to determine what effect that primitive has on the ray, for example reflecting and/or refracting it. Such reflection and/or refraction causes the ray to proceed in a different direction and/or to split into multiple secondary rays that take different paths. All of these secondary rays are then tested against the scene primitives to determine the primitives they intersect, and the process continues iteratively until the secondary (and tertiary, etc.) rays terminate, for example by leaving the scene or hitting a light source. While these ray/primitive intersections are being determined, a tree mapping them is created. After a ray terminates, the contribution of the light source is traced back through the tree to determine its effect on the pixel of the scene. As can be appreciated, testing 1024x768 rays (for example) for intersection with millions of triangles is computationally expensive, and that count does not even include the additional rays generated as a result of intersections.
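The brute-force procedure described above can be sketched as follows. This is a minimal illustration, not the patent's method: the helper names (`hit_triangle`, `closest_hit`) are hypothetical, and the ray/triangle test is the widely used Möller-Trumbore algorithm, chosen here only as a convenient stand-in for "test the ray against a primitive".

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def hit_triangle(origin, direction, tri, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray is parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv     # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None   # ignore hits behind the ray origin

def closest_hit(origin, direction, triangles):
    """Brute force: test the ray against every primitive, keep the nearest hit."""
    best = None
    for index, tri in enumerate(triangles):
        t = hit_triangle(origin, direction, tri)
        if t is not None and (best is None or t < best[1]):
            best = (index, t)
    return best
```

The cost of `closest_hit` grows linearly with the number of primitives for every ray, which is the expense the paragraph above points to and the reason acceleration structures are discussed later in the text.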

Rendering a scene using ray tracing has been called an "embarrassingly parallel problem" because the color information accumulated for each pixel of the generated image can be accumulated independently of the other pixels in the image. Thus, the color information for the image pixels can be determined in parallel, although there may be some filtering, interpolation, or other processing of pixels before the final image is output. It is therefore straightforward to divide the task of ray tracing an image among a given set of processing resources by dividing the pixels to be rendered among those resources and rendering the pixels in parallel.
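A minimal sketch of the pixel-level division of work described above, assuming a hypothetical `shade_pixel(x, y)` callback that returns the final color of one pixel. Rows are dealt out to workers in an interleaved fashion; that choice, and the function names, are illustrative assumptions. Note that threads in CPython illustrate the partitioning but do not by themselves provide CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def render_rows(rows, width, shade_pixel):
    """Render the given rows; each pixel depends on no other pixel."""
    return {y: [shade_pixel(x, y) for x in range(width)] for y in rows}

def render_parallel(width, height, shade_pixel, workers=4):
    """Split the image rows among workers and merge the finished rows."""
    chunks = [range(start, height, workers) for start in range(workers)]
    image = [None] * height
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda rows: render_rows(rows, width, shade_pixel), chunks)
        for part in parts:
            for y, row in part.items():
                image[y] = row
    return image
```

Because no worker reads another worker's pixels, the only coordination needed is the final merge into `image`.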

In some cases, the processing resources may be computing platforms that support multithreading and, in other cases, clusters of computers or of computer cores connected over a LAN. In these kinds of systems, a given processing resource (e.g., a thread) may be instantiated to process an assigned ray or group of rays through the completion of intersection testing and shading. In other words, exploiting the property that pixels can be rendered independently of one another, rays known to contribute to different pixels can be divided among threads or processing resources to be intersection tested and then shaded, with the results of the shading computations written to a screen buffer for further processing or display.

Several algorithmic approaches have been proposed to address this kind of problem. One such approach is disclosed by Matt Pharr et al. in "Rendering Complex Scenes with Memory-Coherent Ray Tracing" (Proceedings of SIGGRAPH 1997; hereinafter "Pharr"). Pharr describes dividing a scene to be ray traced into geometry voxels, where each geometry voxel is a cube that encloses scene primitives (e.g., triangles). Pharr also describes superimposing a scheduling grid, where each element of the grid is a scheduling voxel that can overlap some portion of the geometry voxels (i.e., a scheduling voxel can be sized differently from the cubes of the geometry voxels). Each scheduling voxel has an associated ray queue, which holds the rays currently inside, that is, enclosed within, that scheduling voxel, among other data.

When Pharr processes a scheduling voxel, the rays in the associated queue are tested for intersection with the primitives in the geometry voxels enclosed by that scheduling voxel. If an intersection between a ray and a primitive is found, shading calculations are performed, which can produce new rays that are added to the ray queue. If no intersection is found in that scheduling voxel, the ray advances to the next non-empty scheduling voxel and is placed in that scheduling voxel's ray queue.

Pharr explains that the goal of this approach is to help scene geometry fit within a cache of the kind typically provided with a general-purpose processor, so that if the scene geometry within each scheduling voxel fits in the cache, the cache does not thrash excessively during intersection testing.

Pharr also describes that, by queuing rays for testing in a scheduling voxel, more work can be done on the primitives once they have been fetched into the geometry cache. When more than one scheduling voxel could be processed next, the scheduling algorithm can choose the scheduling voxel that minimizes the amount of geometry that needs to be loaded into the geometry cache.

Pharr recognizes that the proposed regular scheduling grid may not perform well when a scene has non-uniform complexity (e.g., a high density of primitives in one part of the scene). Pharr suggests that an adaptive data structure such as an octree could be used in place of the regular scheduling grid. An octree introduces a spatial subdivision of the three-dimensional scene by dividing the scene along each principal axis (x, y, and z) at each level of the hierarchy, which creates eight sub-volumes; each of these can in turn be divided into eight smaller sub-volumes. A divide/do-not-divide flag is set for each sub-volume to determine whether it is divided further. Subdivision continues until the number of primitives in a sub-volume is small enough to test. Thus, with an octree, the amount of subdivision can be controlled according to how many primitives are present in a particular part of the scene, so the octree allows varying degrees of subdivision across the volume to be rendered.
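The adaptive subdivision described above can be illustrated with a small sketch. It is a simplification under stated assumptions: each primitive is represented by a single point (for example its centroid), nodes are plain dictionaries, and the names `build_octree`, `max_points`, and `max_depth` are hypothetical. The "divide / do not divide" decision is made by comparing the item count in a sub-volume with `max_points`.

```python
def build_octree(center, half, points, max_points=2, max_depth=8):
    """Subdivide a cubic volume until each leaf holds at most max_points items."""
    node = {"center": center, "half": half, "points": points, "children": []}
    if len(points) <= max_points or max_depth == 0:
        return node                       # "do not divide": keep this node as a leaf
    buckets = [[] for _ in range(8)]
    for p in points:
        # One bit per axis selects which of the eight octants holds the point.
        index = sum(1 << axis for axis in range(3) if p[axis] >= center[axis])
        buckets[index].append(p)
    quarter = half / 2.0
    for index, bucket in enumerate(buckets):
        child_center = tuple(
            center[axis] + (quarter if (index >> axis) & 1 else -quarter)
            for axis in range(3))
        node["children"].append(
            build_octree(child_center, quarter, bucket, max_points, max_depth - 1))
    node["points"] = []                   # interior nodes hold no items of their own
    return node

def leaf_count(node):
    """Count leaves; dense regions of the scene produce more of them."""
    if not node["children"]:
        return 1
    return sum(leaf_count(child) for child in node["children"])
```

Only the octants that contain many items keep subdividing, so a cluster of primitives in one corner of the scene yields deep subdivision there while empty regions stay as single leaves.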

A similar approach is described in US Patent No. 6,556,200 to Pfister (hereinafter "Pfister"). Pfister describes partitioning a scene into a plurality of scheduling blocks. A ray queue is provided for each block, and the rays in each queue are ordered spatially and temporally using a dependency graph. Rays are traced through each scheduling block in the order defined by the dependency graph. Pfister references Pharr's paper and adds that Pfister's aim is to render more than a single type of graphical primitive (e.g., not only triangles) and to devise more complicated scheduling algorithms for the scheduling blocks. Pfister also contemplates staging sub-portions of the scene geometry at multiple cache levels within the memory hierarchy.

Another approach is known as packet tracing; a common reference for it is "Interactive Rendering with Coherent Ray Tracing" by Ingo Wald, Philipp Slusallek, Carsten Benthin, et al. (Proceedings of EUROGRAPHICS 2001). In this reference, packet tracing involves tracing packets of rays that have similar origins and directions through a grid. Because the rays emit from substantially the same location and travel in substantially the same direction, most of them pass through the same grid locations. Packet tracing therefore requires identifying rays that travel in a similar direction from a similar origin. Another variation of packet tracing uses frustum rays at the edges of the ray packet to determine which voxels are intersected, which helps reduce the number of computations for a given ray packet (i.e., not all rays are tested for intersection, only those on the outer edges of the packet). Packet tracing still requires identifying rays that originate from a similar place and travel in a similar direction. Such rays can become increasingly difficult to identify as rays are reflected, refracted, and/or generated during ray tracing.

Still other approaches exist in the field of accelerating ray tracing. One approach attempts to improve cache utilization by managing ray state more actively. In "Dynamic Ray Scheduling for Improved System Performance" (2007 IEEE Symposium on Interactive Ray Tracing, Sep. 2007), Navratil et al. (hereinafter "Navratil") note that Pharr's approach suffers from a "ray state explosion." To address this, Navratil proposes avoiding the "ray state explosion" by "actively managing" ray state and geometry state during ray tracing. One proposal is to trace generations of rays separately, so that Navratil first traces the primary rays, then traces the secondary rays after the primary rays are finished, and so on.

This background shows the diversity of thinking and approaches in the area of accelerating ray-tracing-based rendering. These references also indicate that further advances remain to be made in the field of ray tracing. However, discussion of any of these references and techniques is not intended to suggest or imply that any of them, or their subject matter, is prior art to any subject matter disclosed herein. Rather, these references are discussed to help show differences in approaches to rendering using ray tracing. Moreover, the treatment of each reference is necessarily abbreviated for the sake of clarity and is not exhaustive.

In one aspect, a method uses a plurality of computing resources in ray tracing a 2-D representation of a 3-D scene. The method comprises using a first subset of the computing resources to intersection test rays traveling in the 3-D scene against geometric shapes, which comprise one or more of primitives and geometry acceleration elements. Each computing resource of the first subset can communicate with a respective localized memory resource that stores a respective subset of the rays traveling in the scene. The method also comprises communicating indications of intersections between rays and primitives from the first subset of computing resources to a second subset of the computing resources, and using the second subset to execute shading routines associated with the identified ray/primitive intersections; outputs from the shading routines include new rays to be intersection tested. Membership of the subsets may be determined statically or may vary over time, whether at system configuration or during rendering of a scene or a series of scenes.

The method also comprises distributing data defining the new rays among the localized memory resources, and passing groups of ray identifiers, together with shape data, to the first subset of computing resources. Each ray identifier comprises data other than the definition data for that ray. Passing the ray identifiers activates intersection testing of the identified rays against the shapes identified by the shape data. Such testing comprises each computing resource retrieving from its localized memory the data defining any identified ray stored there, testing the identified shape for intersection based on the retrieved definition data, and outputting an indication of any detected intersection.
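The identifier-passing scheme described above can be sketched as follows. Everything named here is an illustrative assumption rather than the patent's implementation: the `TestResource` class, the packet layout (a dictionary with `ray_ids` and `shape_id`), and the use of a bounding sphere as the shape. The point of the sketch is that the packet carries only identifiers, and each resource tests only the rays whose definition data sits in its own local memory.

```python
def hits_sphere(ray, sphere):
    """Ray/sphere overlap test; assumes the ray direction has unit length."""
    (ox, oy, oz), (dx, dy, dz) = ray
    (cx, cy, cz), radius = sphere
    lx, ly, lz = cx - ox, cy - oy, cz - oz
    dist2_to_center = lx * lx + ly * ly + lz * lz
    if dist2_to_center <= radius * radius:
        return True                       # ray starts inside the sphere
    along = lx * dx + ly * dy + lz * dz   # projection of the center onto the ray
    if along < 0.0:
        return False                      # sphere lies behind the ray origin
    return dist2_to_center - along * along <= radius * radius

class TestResource:
    """One intersection tester with its own local memory of ray definitions."""

    def __init__(self, local_rays):
        self.local_rays = dict(local_rays)    # ray_id -> (origin, direction)

    def process(self, packet, shapes):
        """Test only the rays named in the packet whose data is stored locally."""
        shape = shapes[packet["shape_id"]]
        results = []
        for ray_id in packet["ray_ids"]:
            ray = self.local_rays.get(ray_id)
            if ray is not None and hits_sphere(ray, shape):
                results.append((ray_id, packet["shape_id"]))
        return results

def broadcast(packet, resources, shapes):
    """Hand the same packet to every resource and gather the reported hits."""
    hits = []
    for resource in resources:
        hits.extend(resource.process(packet, shapes))
    return hits
```

A packet in this sketch stays small regardless of how much data defines each ray, since origins and directions never leave the memory of the resource that holds them.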

In another aspect, a system for rendering a 2-D representation of a 3-D scene composed of primitives using ray tracing comprises a plurality of intersection testing resources having access to respective cache memories. Each cache memory stores a subset of a master copy of ray definition data, and the definition data for each ray is maintained in the cache memory until testing of that ray is complete.

The system also comprises control logic operable to assign an identifier to each ray and to control testing of each ray by the respective test resource that has access to the definition data for that ray in its cache memory. Testing is controlled by providing ray identifiers to the respective test cells that store data for the rays to be tested. The system comprises an output queue for identifying the respective primitives intersected by rays that have completed intersection testing. The control logic assigns new rays, resulting from shading operations, to replace in the cache memories rays that have completed intersection testing.

In some aspects, one or more of the following may be provided: the control logic replaces rays by reusing the identifier of a completed ray as the identifier of a new ray; ray identifiers relate to memory locations that store the respective data defining the rays; and data defining a new ray is stored in the memory location in place of the data for the completed ray.
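One way to picture identifier reuse is a fixed-size store in which the ray identifier is simply the index of the memory slot holding the ray's data. The class below is a hypothetical sketch under that assumption; `RayStore` and its method names do not come from the source.

```python
class RayStore:
    """Fixed-size local ray memory; a ray identifier is a slot index."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        # Free identifiers, ordered so that pop() hands out 0 first.
        self.free_ids = list(range(capacity - 1, -1, -1))

    def add(self, origin, direction):
        """Store a new ray and return its identifier, or None if memory is full."""
        if not self.free_ids:
            return None
        ray_id = self.free_ids.pop()
        self.slots[ray_id] = (origin, direction)
        return ray_id

    def complete(self, ray_id):
        """Mark intersection testing as finished so the identifier can be reused."""
        self.slots[ray_id] = None
        self.free_ids.append(ray_id)

    def lookup(self, ray_id):
        return self.slots[ray_id]
```

With this layout a new ray produced by shading overwrites the slot of a completed ray, so the number of rays in flight is bounded by the size of the local memory.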

Yet another aspect includes a system for rendering a 2-D representation of a 3-D scene composed of primitives using ray tracing. The system comprises a memory storing the primitives composing the 3-D scene, and a plurality of intersection testing resources. Each intersection testing resource is operable to test one or more rays traveling in the scene against one or more of the primitives and to output an indication of a detected intersection. The system also comprises a plurality of shading resources, each operable to run the shading routine associated with the primitive of a detected ray/primitive intersection. The system also comprises a first communication link for outputting indications of detected intersections to the shading resources, and a second communication link for conveying new rays, generated by running the shading routines, to the intersection testing resources. The new rays can complete intersection testing in an order different from the relative order in which they were sent to the intersection testing resources. The communication links may be implemented as queues, such as FIFO queues.
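The two communication links can be modeled as two FIFO queues between an intersection testing stage and a shading stage. The single-threaded loop below is only a sketch of the data flow; `render_loop`, `intersect`, and `shade` are hypothetical names, and a real system would run the two stages concurrently on separate resources.

```python
from collections import deque

def render_loop(primary_rays, intersect, shade):
    """Alternate between two stages linked by FIFO queues until both drain."""
    ray_queue = deque(primary_rays)    # shading -> intersection testing
    hit_queue = deque()                # intersection testing -> shading
    shaded = []
    while ray_queue or hit_queue:
        while ray_queue:
            ray = ray_queue.popleft()
            primitive = intersect(ray)             # None means the ray missed
            if primitive is not None:
                hit_queue.append((ray, primitive))
        while hit_queue:
            ray, primitive = hit_queue.popleft()
            shaded.append((ray, primitive))
            ray_queue.extend(shade(ray, primitive))   # new rays to be tested
    return shaded
```

Neither stage calls the other directly in this arrangement; each only reads from one queue and writes to the other, which is what lets the stages be assigned to different resources.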

Another aspect includes a method of ray tracing a scene composed of primitives in a system having a plurality of computing resources coupled to a hierarchical memory structure that comprises a main memory and memories distributed among the computing resources. The main memory has higher latency than the distributed memories. The method comprises distributing data defining rays to be intersection tested in the scene among the distributed memories, so that subsets of the rays are stored in different ones of the distributed memories, and determining to intersection test a group of rays against one or more geometric shapes. Members of the ray group are stored in several of the distributed memories. The method comprises fetching from the main memory data defining the one or more geometric shapes to be tested against the ray group, and identifying the one or more computing resources associated with each distributed memory that stores data for a ray of the group. The method also comprises testing each ray of the group for intersection using the computing resource associated with the distributed memory that stores data for that ray.

Yet another aspect includes a system for intersection testing rays against primitives composing a 3-D scene. The system comprises a plurality of intersection test resources, each operable to test respective rays for intersection with geometric shapes. Each respective ray is identified by a reference provided to each intersection test resource, and each test resource is operable to output an indication of an intersection between a ray and a geometric shape on either a first output or a second output.

One output is for primitive intersections and the other is for geometry acceleration element intersections. For example, the first output may provide input to a plurality of shading resources and receive indications of intersections between rays and primitives. The second output, by contrast, provides input to a ray collection manager and receives indications of intersections between rays and geometry acceleration elements.
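The split between the two outputs can be sketched as a small routing function. The names `route` and `CollectionManager`, the `(ray_id, shape_id, kind)` result tuple, and the fixed `threshold` for deciding when a collection is worth testing are all assumptions made for illustration.

```python
from collections import defaultdict, deque

class CollectionManager:
    """Groups ray identifiers by the acceleration element they intersected."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.collections = defaultdict(list)    # element_id -> list of ray IDs

    def add(self, ray_id, element_id):
        self.collections[element_id].append(ray_id)

    def ready(self):
        """Return the elements whose collections are large enough to test next."""
        return [element for element, rays in self.collections.items()
                if len(rays) >= self.threshold]

def route(result, shading_queue, manager):
    """Send primitive hits to shading and acceleration hits to the manager."""
    ray_id, shape_id, kind = result
    if kind == "primitive":
        shading_queue.append((ray_id, shape_id))    # first output
    else:
        manager.add(ray_id, shape_id)               # second output
```

Grouping identifiers per element means that when an element's child shapes are eventually fetched, many rays can be tested against them in one pass.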

Yet another aspect includes a ray tracing method comprising storing, in a main memory resource, primitives composing a 3-D scene and geometry acceleration elements that each bound a selection of the primitives, and defining ray origin and direction data and an identifier for each ray. The method comprises, in a system including a plurality of separately programmable processing resources, storing portions of the ray origin and direction data in localized memory resources respectively associated with each processing resource. The method also comprises scheduling rays for intersection testing by providing identifiers of the rays scheduled for testing, together with an indication of a geometric shape, to the processing resources. Each processing resource determines whether its localized memory resource stores ray definition data for any identified ray and, if so, tests that ray for intersection with the identified geometric shape.

Yet another aspect includes a computer-readable medium storing computer-readable instructions for a system that controls a plurality of processing resources to intersection test rays against geometric shapes, for use in rendering a 2-D representation of a 3-D scene. The instructions implement a method comprising accessing a packet of identifiers for rays determined to intersect a first geometry acceleration element that bounds a first selection of primitives, and determining other geometry acceleration elements that bound portions of the primitives bounded by the first element. The method also comprises instantiating a plurality of packets, each containing the ray identifiers and an indication of a respective one of the other geometry acceleration elements, and providing the packets to each of a plurality of computing resources configured to test each identified ray individually. The method further comprises receiving indications of detected intersections from the computing resources, tracking the received indications according to geometry acceleration element, selecting a next geometry acceleration element based on how the number of indications received for it compares with a threshold, and repeating the accessing with a next packet.

Yet another aspect includes a system comprising a plurality of computing resources configured to intersection test rays against shapes, a respective cache associated with each computing resource, each cache storing data defining a portion of a plurality of rays traveling in a scene, and a channel for passing messages among the computing resources. Each computing resource interprets data in a received message as containing a plurality of ray identifiers, determines whether its cache stores definition data for any of the identified rays, and tests those rays.

Yet another aspect includes a system for intersection testing rays against primitives composing a 3-D scene. The system comprises a plurality of intersection test resources, each operable to test respective rays for intersection with geometric shapes. The respective rays are identified by references provided to each intersection test resource. Each intersection test resource is also configured to output indications of intersections between rays and shapes on a first output or a second output. The system also comprises a plurality of shading resources, each operable to execute shading code for detected intersections, and a ray collection manager operable to maintain references to rays, identify rays to be tested, and provide those ray references to the intersection test resources. The first output provides input to the shading resources and receives indications of intersections between rays and primitives; the second output provides input to the ray collection manager and receives indications of intersections between rays and geometry acceleration elements.

Yet another aspect includes a computing configuration for use in parallelizing ray-tracing-based rendering of a 2-D representation of a 3-D scene. The configuration comprises a processor coupled to a local cache, which stores data defining a plurality of rays to be tested for intersection with specified geometric shapes, and an input queue serviced by the processor. The processor interprets data received in the input queue as containing a plurality of identifiers for rays to be tested for intersection with an indicated geometric shape. The processor is configured to retrieve from the local cache definition data only for the identified rays stored there, to test those rays for intersection with the indicated shape, and to output an indication of any detected intersection.

Yet another aspect includes a computer-readable medium storing computer-readable instructions for implementing a ray tracing method. The method comprises accessing a packet of identifiers for rays determined to intersect a geometry acceleration element that bounds a selection of primitives, and determining other geometry acceleration elements that bound portions of the primitives bounded by the intersected element. The method also comprises instantiating a plurality of packets, each containing the ray identifiers and an indication of a respective one of the other geometry acceleration elements, and providing the packets to each of a plurality of computing resources, each configured to intersection test the identified rays. The method also comprises receiving indications of detected intersections from the computing resources and tracking the received indications according to geometry acceleration element.

Yet another aspect includes a ray tracing method. The method comprises determining primitives composing a 3-D scene and ray definition data defining a plurality of rays to be tested for intersection. The method also comprises distributing subsets of the ray definition data among respective local memories of a plurality of computing resources, which are configured to intersection test rays against geometric shapes, and determining, in a management module, a collection of rays from among the plurality of rays to be intersection tested. The collection is defined by a plurality of ray identifiers, each comprising data other than the definition data for its ray, and is associated with a bounding shape that bounds a portion of the primitives. The method also comprises causing the computing resources to test the rays of the collection by passing the collection's ray identifiers among the computing resources, with each computing resource intersection testing the identified rays whose definition data is stored in its local memory.

In these aspects, the plurality of rays stored in a local cache may be a proper subset of a second plurality of rays; some of the ray identifiers identify rays stored in the local cache, and some identify rays of the second plurality that are not stored in the local cache.

The functional aspects described may be implemented as modules, for example modules of computer-executable code that configure suitable hardware resources to produce the inputs and outputs described.

For a fuller understanding of the aspects and examples described herein, reference is made to the accompanying drawings in the following description.
Figure 1 shows a first example of a system for rendering a scene using ray tracing.
Figure 2 shows further aspects of a portion of Fig.
Figure 3 shows another implementation of the intersection testing portion of a ray tracing system.
Figure 4 shows an example of a computing resource for intersection testing that may be used in the systems of Figures 1-3.
Figure 5 shows a further example of an intersection testing system architecture for use in ray tracing.
Figure 6 illustrates aspects of another example of an architecture for intersection testing.
Figure 7 illustrates a system architecture that implements aspects of the examples of Figures 1-6, including intersection testing resources and shading resources connected by queues.
Figure 8A illustrates aspects of a method of providing ray identifiers that may be used to control ray tracing in systems according to Figures 1-7.
Figures 9A and 9B illustrate examples of using ray IDs to identify ray data in memories that may be provided with any of the intersection testing resources of Figures 1-7.
Figure 10 illustrates aspects of intersection test control and shape distribution among a plurality of intersection testing resources that may be implemented in the systems of Figures 1-7.
Figure 11 shows a multiprocessor architecture in which aspects of the systems of Figures 1-10 can be implemented when using the architecture for ray tracing.
Figure 12 illustrates the organization and interconnection of multiple computing resources with localized ray data storage, which may be used in implementing the examples of Figures 1-11.
Figure 13 shows an example of multiple threads or cores operating as part of the computing resources of Figure 12.
Figures 14A-14C illustrate several different queue implementations that may be used in systems and architectures according to Figures 1-13.
Figure 15 illustrates different ways in which ray data can be distributed between an L2 cache shared by a plurality of computing resources and private L1 caches.
Figure 16 shows examples of packets that may be present in the queues of the examples.
Figure 17 shows a method of processing ray IDs from a packet by using locally available ray data in intersection testing and writing back the test results.
Figures 18A and 18B illustrate examples of a SIMD architecture for processing packets of ray ID information.
Figure 19 shows the contents of a ray identifier packet, testing of the rays, and merging of test results into additional packets for further testing.
Figure 20 shows method steps in the context of data structures, generally applicable to systems according to the previous figures.
Figure 21 shows additional method aspects in accordance with the present invention.

In the following, the invention will be described in detail with reference to the accompanying drawings and examples.

The following description is intended to enable those skilled in the art to make and use various embodiments of the invention. Various modifications to the examples described in this specification may be apparent to those skilled in the art, and the general principles set forth herein may be applied to other examples and applications without departing from the scope of the present invention. This description begins by introducing aspects relating to an example of a three-dimensional (3-D) scene (FIG. 1), which can be abstracted using geometry acceleration data, as in the example of FIG. 2. Such a 3-D scene can be rendered into a two-dimensional representation using systems and methods according to the examples depicted and described.

As introduced in the background, a three-dimensional scene must be converted into a two-dimensional representation for display. This conversion requires selecting a camera position from which the scene is viewed. The camera position often represents the position of a person viewing the scene (e.g., a gamer or a viewer of an animated film). The two-dimensional representation is typically located in a plane between the camera and the scene, and comprises an array of pixels at the desired resolution. A color vector for each pixel is determined through rendering. During ray tracing, a ray is initially cast from the camera position so that it intersects the plane of the two-dimensional representation at a desired point, and then continues into the 3-D scene. The position at which the ray intersects the two-dimensional representation is retained in a data structure associated with that ray.

The camera position need not be a single point defined in space; it may instead be diffuse, so that rays are cast from a number of points considered to be within the camera position. Each ray intersects the two-dimensional representation within a pixel, which may also be referred to as a sample. In some embodiments, the more precisely the position at which a ray intersects a pixel is recorded, the more accurate the color interpolation and blending can be.

For clarity of description, data for a given type of object (e.g., coordinates for the three vertices of a triangle) is often described simply as the object itself, rather than as data about the object. For example, when "fetching a primitive" is mentioned, it should be understood that data representing the primitive is fetched, rather than a physical realization of the primitive. With respect to rays in particular, however, this description distinguishes between an identifier for a ray and the data that defines the ray itself; where the term "ray" is used, context generally indicates whether ray definition data or a ray identifier (ray ID) is meant.

Realistic and finely detailed representations of objects in a three-dimensional scene are typically achieved by providing a large number of small geometric primitives that approximate the surface of each object (i.e., a wireframe model). Consequently, a more complex object needs to be represented with more primitives, and smaller primitives, than a simpler object. Although this provides the benefit of higher resolution, intersection testing between rays and a large number of primitives (as described above and explained further below) is computationally intensive, especially because a scene may contain many objects. Without some external structure imposed on the scene for intersection testing, each ray would have to be tested for intersection against every primitive, which makes intersection testing very slow. A way of reducing the number of ray/primitive intersection tests required per ray therefore helps accelerate intersection testing in the scene. One way to reduce the number of such tests is to provide additional bounding surfaces that each abstract the surfaces of a number of primitives. Rays are first tested for intersection against the bounding surfaces, which identifies a smaller subset of primitives to be tested against each ray. Such bounding surfaces can be provided in a variety of shapes. In this specification, a collection of such bounding surface elements is referred to as Geometry Acceleration Data (GAD).
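
By way of illustration only, the following Python sketch shows how a bounding surface can reduce the number of ray/primitive tests. It assumes spheres as the bounding shape; the function names, the tuple layout of a ray, and the grouping of primitives are hypothetical choices made for this example and are not taken from the described system.

```python
def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t*direction (t >= 0) touches the sphere."""
    # Vector from the ray origin to the sphere center.
    oc = [c - o for o, c in zip(origin, center)]
    d_len2 = sum(d * d for d in direction)
    # Parameter of the point on the ray closest to the center, clamped to t >= 0.
    t = max(0.0, sum(a * b for a, b in zip(oc, direction)) / d_len2)
    closest = [o + t * d for o, d in zip(origin, direction)]
    dist2 = sum((p - c) ** 2 for p, c in zip(closest, center))
    return dist2 <= radius * radius


def primitives_to_test(ray, bounded_groups):
    """Return only the primitives whose bounding sphere the ray intersects."""
    origin, direction = ray
    candidates = []
    for center, radius, primitives in bounded_groups:
        if ray_hits_sphere(origin, direction, center, radius):
            candidates.extend(primitives)
    return candidates
```

A ray that misses a bounding sphere is never tested against the primitives inside it, which is the saving the paragraph above describes.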

A more extensive treatment of GAD structures, elements, and uses can be found in U.S. Patent Application No. 11/856,612 (filed September 17, 2007), which is incorporated herein by reference. Accordingly, only a brief treatment of GAD is provided below for context, and further details can be obtained from the above-referenced application.

As introduced, a GAD element typically comprises a geometric shape that encloses a respective collection of primitives in three-dimensional space, such that failure of a ray to intersect the surface of that shape indicates that the ray does not intersect any of the primitives within the shape. GAD elements can include spheres, axis-aligned bounding boxes, kd-trees, octrees, and other kinds of bounding volume hierarchies. An implementation according to the present invention can therefore use a bounding scheme such as a kd-tree, or any other way of referring to or specifying the extent of a bounding surface that bounds one or more primitives. In summary, because GAD elements are primarily useful for abstracting primitives so that intersections between rays and primitives are identified more quickly, a GAD element is preferably a shape that can be readily tested for intersection with a ray.

GAD elements can be interrelated. In this specification, the interrelationship of GAD elements is described as a graph comprising nodes and edges, where each node represents a GAD element and each edge represents a relationship between two GAD elements. When a pair of elements is connected by an edge, the edge can indicate that one of the nodes has a different relative granularity than the other, meaning that one of the nodes connected by that edge bounds more or fewer primitives than the other node. In some cases the graph is hierarchical, so that the graph has a direction and is traversed from parent nodes to child nodes, narrowing the set of bounded primitives that remain along the way. In some cases the graph has homogeneous GAD elements, such that a GAD element that bounds other GAD elements does not also directly bound primitives (i.e., in a homogeneous GAD structure, primitives are bounded directly by leaf-node GAD elements, and non-leaf nodes directly bound other GAD elements rather than primitives).

The graph of GAD elements can be constructed with the goal of maintaining uniformity in the number and/or size of the elements bounded by each GAD element. A given scene can be subdivided until this goal is achieved.

In the following description, it is assumed that there is a mechanism for determining, based on a ray having been found to intersect a given GAD element, which GAD elements should be tested next. In the example of a hierarchical graph, the elements to be tested next are generally the child nodes of the tested node.
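
As a minimal sketch of the hierarchical, homogeneous case described above, the following Python fragment models a GAD element that bounds either other GAD elements or primitives, and returns the shapes to be tested next once a ray has intersected the element. The class name `GadElement` and the function `next_shapes` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GadElement:
    name: str
    children: list = field(default_factory=list)    # child GAD elements (non-leaf)
    primitives: list = field(default_factory=list)  # primitive IDs (leaf only)

    def is_leaf(self):
        return not self.children


def next_shapes(element):
    """Shapes to test next for rays found to intersect `element`."""
    # Homogeneous structure: a non-leaf node yields GAD elements, a leaf yields primitives.
    return element.primitives if element.is_leaf() else element.children
```

In this sketch the traversal direction is simply parent to child; other mechanisms for choosing the next elements are equally possible.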

One use of GAD, implemented in many of the embodiments in this specification, is that when a ray is found to intersect a given GAD element, it is collected with other rays that have also been determined to intersect that element. Once a sufficient number of rays has been collected, a stream of the GAD elements associated with that element is fetched from main memory and streamed through the testers, each of which holds a different ray of the collection. Each tester thus keeps its ray in local high-speed memory, while geometric shapes are fetched from slower memory as needed and may be overwritten. More generally, this description concerns how rays are advanced through a series of intersection tests against geometric shapes (GAD elements and primitives) in order to determine, ultimately, which rays hit which primitives.
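
The collection step described above can be sketched as follows. This is an illustrative Python model only: the class `RayCollector`, the `ready_size` threshold, and the policy of handing off a collection as soon as it reaches that threshold are assumptions made for the example.

```python
from collections import defaultdict


class RayCollector:
    def __init__(self, ready_size):
        self.ready_size = ready_size
        self.collections = defaultdict(list)  # GAD element ID -> list of ray IDs

    def add_hit(self, gad_id, ray_id):
        """Record that ray_id intersected gad_id; return a full collection or None."""
        rays = self.collections[gad_id]
        rays.append(ray_id)
        if len(rays) >= self.ready_size:
            # The collection is ready: hand it off and start a fresh one.
            return gad_id, self.collections.pop(gad_id)
        return None
```

When `add_hit` returns a collection, the shapes related to that GAD element would be fetched once and streamed past all of the collected rays, rather than being fetched separately for each ray.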

Other aspects that embodiments described here may implement include one or more of the following: (1) queues are provided to pass outputs from intersection testing to shading; (2) the data defining rays is kept local to the intersection testing resources, rather than being fetched from slower memory when a decision is made to test particular rays against a shape, while geometric shapes are fetched from the slower memory; and (3) intersection testing is driven by identifying rays (using ray identifiers) to the computational resources that perform the testing, so that each computational resource fetches the data corresponding to the identified rays from its localized memory.

The following description presents examples of systems and apparatus for rendering a two-dimensional representation of a three-dimensional scene using ray tracing. The two principal functional components of such a system are (1) tracing rays to identify intersections and (2) shading the identified intersections.

FIG. 1 illustrates aspects of a system for use in ray tracing a scene composed of primitives. In general, any of the functions or tasks of any functional unit of FIG. 1 and of the other figures may be implemented in multiple hardware units, or in software or software subroutines, which may run on computers in communication with each other. In some cases such implementation choices are described more specifically, because they can affect system functionality and performance.

FIG. 1 illustrates a geometry unit 101, an intersection processing resource 102, a sample processing resource 110, a frame buffer 111, and a memory resource 139 that stores geometric shapes including GAD elements and primitives (primitive and GAD storage 103), samples 106, ray shading data 107, and texture data 108. The geometry unit 101 takes as input a description of the scene to be rendered and outputs primitives and an acceleration structure comprising GAD elements that bound the primitives. The intersection processing 102 shades identified intersections between rays and primitives, using inputs such as textures, shading code, and other sample information obtained from the illustrated data sources. The output of the intersection processing 102 includes color information, which is used to produce the two-dimensional representation of the scene being rendered, and new rays to be intersection tested (described below). All of these functional components may be implemented on one or more host processing resources, represented by dashed line 185.

As described above, while shading identified ray/primitive intersections, the intersection processing 102 may generate new rays to be intersection tested. A driver 188 may interface with the intersection processing 102 to receive these new rays and to manage communication between the intersection processing resource 102 and a localized intersection testing region 140, which includes ray data storage 105 and an intersection testing unit 109. The intersection testing region 140 tests rays for intersection, has read access to the primitive and GAD storage 103 through interface 112, and outputs identifications of detected intersections to the intersection processing 102 through results interface 121. The localized ray data storage 105 is preferably implemented in relatively fast memory, which can therefore be relatively small, whereas the primitive and acceleration structure storage is implemented in main memory 139, which is relatively large and slow and may be the main dynamic memory of the host 185.

One difficulty in ray tracing high-resolution scenes relates to the sheer volume of ray data and shape data involved. For example, rendering full HD video at 30 frames per second requires determining colors for more than 60 million pixels per second (1920 x 1080 is more than 2 million pixels, 30 times per second). Determining the color of each pixel may require many rays. Hundreds of millions of rays may therefore need to be processed every second, and since every ray requires several bytes of storage, ray tracing a full HD scene can involve more than a few gigabytes of ray data. At any given time, a large amount of ray data must also be held in memory. There is almost always a tradeoff between access speed and memory size, so cost-effective large memories are relatively slow. Large memories also cannot be used efficiently unless sufficiently large blocks of data are accessed at a time. One challenge, then, is to identify groups of rays that can be tested at the same time and that are large enough for data to be accessed efficiently from memory. However, as shown by approaches that search for and group-test rays of similar origin and direction, identifying such rays can incur processing overhead, sometimes severe. In one aspect, the exemplary architectures below describe ways of organizing and using multiple computational resources, faster and more expensive memory, and slower, larger memory to increase the efficiency of ray intersection testing and shading for scene rendering.
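
The arithmetic behind these figures can be checked with a short calculation. The resolution and frame rate come from the text above; the number of rays per pixel and the number of bytes per ray are assumed values chosen only to illustrate the order of magnitude.

```python
width, height, fps = 1920, 1080, 30

pixels_per_frame = width * height            # slightly more than 2 million
pixels_per_second = pixels_per_frame * fps   # slightly more than 60 million

rays_per_pixel = 10   # assumption for illustration
bytes_per_ray = 32    # assumption: e.g. origin and direction plus bookkeeping

rays_per_second = pixels_per_second * rays_per_pixel
ray_bytes_per_second = rays_per_second * bytes_per_ray

print(pixels_per_second, rays_per_second, ray_bytes_per_second)
```

Under these assumed values the ray count is in the hundreds of millions per second and the ray data is on the order of tens of gigabytes per second, consistent with the magnitudes stated above.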

FIG. 1 illustrates decoupling of intersection testing from the shading of identified intersections, by means of a data flow in which ray definition data is stored in high-speed memory localized to the computational resources 109 that test rays for intersection with GAD elements and primitives. The output of the intersection testing 109 includes identifications of rays that were found to intersect identified primitives. The intersection processing 102 receives such identifications, performs shading accordingly, and may generate new rays for testing, which are ultimately stored in the fast ray data memory 105. Such decoupling can be provided in various implementations using one or more of general-purpose computers programmed with software and fixed-function hardware in accordance with the teachings of this disclosure, with communication means selected according to the processing resources used. One aspect common to these embodiments, however, is that within the intersection testing region 140 the shape data tested for intersection with rays is transient compared with the ray definition data. In other words, the available high-speed memory is allocated first to ray data, while shapes are streamed through the testers, and few or no computational resources are spent on optimizing the caching of such shape data. The figures that follow illustrate more specific examples of such decoupling, data flows, ray data storage, and collocation of ray data with intersection testing resources.

FIG. 1 also shows that the frame buffer 111 can ultimately be used to drive a display 197. However, this is only one example of an output that results from the intersection testing and shading operations, which for convenience may be referred to collectively as rendering. For example, the output may be written to a computer-readable medium containing rendering products, such as a sequence of rendered images, for later display or for transmission over a network comprising computational resources connected by communication links. In some cases, the 3-D scene being rendered represents a real-world 3-D scene, as in immersive virtual-reality conferencing or renderings of real objects. In such cases, the rendering method operates on, or transforms, data representing physical objects. In other cases, the 3-D scene has some objects that represent physical objects and other objects that do not. In still other 3-D scenes, the entire scene is fictional, as in video games. In each case, however, the method typically results in the transformation of an article, such as a memory, a display, and/or a computer-readable medium.

In addition, rendering by ray tracing has been practiced since 1979, and a variety of techniques have been developed for intersection testing and the other functions needed to implement rendering using ray tracing. The specific architectures and methods described herein therefore do not preempt the basic principles of using ray tracing to render 3-D scenes into 2-D representations.

FIG. 2 shows that the intersection testing unit 109 of the intersection testing region 140 includes one or more individual test resources (also called test cells) that test geometric shapes against rays. The region 140 includes test cells 205a-205n, each of which receives ray data from the ray data storage 105 and geometric data from memory 139. Each test cell 205a-205n produces results, which may include an indication of whether a given ray intersected a given primitive, for communication to the intersection processing 102 through the results interface 121. By contrast, the results of testing rays for intersection with GAD elements are provided to logic 203. The logic 203 maintains collections 210 of ray references, each of which associates rays with a GAD element that those rays were determined to intersect.

In general, the system components are designed to support an unknown time to completion for any given, specific ray test. The intersection testing unit 109 has read access to the geometry memory and takes as input a queue of references to rays. As an output of intersection testing, each ray is associated with the one piece of geometry (referred to herein as a primitive for convenience) that the ray intersects first. Other geometric shapes (i.e., other primitives) may be indicated as not associated with the ray.
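
A minimal sketch of this output rule is given below: among all detected hits for a ray, only the nearest is retained. The tuple layout `(ray_id, primitive_id, distance)` and the function name `closest_hits` are assumptions made for the example, and hits may arrive in any order, reflecting the unknown completion time of individual tests.

```python
def closest_hits(hits):
    """hits: iterable of (ray_id, primitive_id, distance). Keep the nearest per ray."""
    nearest = {}
    for ray_id, prim_id, dist in hits:
        # Replace the stored hit only if this one is strictly closer.
        if ray_id not in nearest or dist < nearest[ray_id][1]:
            nearest[ray_id] = (prim_id, dist)
    return nearest
```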

As discussed above, region 140 includes management logic 203 associated with a ray reference buffer, which holds a list 210 of ray collections to be tested in the test cells 205a-205n. The buffer management logic 203 may be implemented as fixed-function hardware or as processing resources configured with instructions obtained from a computer-readable medium. Such instructions may be organized into modules according to the functions and tasks attributed to logic 203 in this specification. A person of ordinary skill can provide further implementations of logic 203 based on this description.

The logic 203 can assign rays and geometric shapes to the test cells and can handle communication with other units in the design. In one aspect, each ray collection in the list 210 comprises a plurality of ray identifiers, all of which are to be tested for intersection with one or more geometric shapes, and the logic 203 maintains these collections. In a more specific example, the rays identified in a collection have all been determined to intersect a particular GAD element, and the next GAD elements to be tested for intersection with those rays are related to the intersected GAD element in the graph of GAD elements. The related elements for a given collection are fetched from memory 139 when intersection testing against them is to begin.

Stated differently, the logic 203 can store, in a temporary ray reference buffer, references to rays that intersect the sub-portion of geometry data bounded by a given child node, which allows further processing of those rays to be deferred. In the example of a hierarchically arranged GAD, processing is deferred until enough rays have accumulated that intersect the geometry bounded by the child node, so that the child node's sub-portion of the geometry acceleration data is processed only once the accumulated number of rays is suitable for further processing.

The logic 203 may communicate with memory 139 to set up memory transactions that supply geometric shapes to the cells 205a-205n for testing. The logic 203 also communicates with the ray data storage 105 and determines which rays are stored there. In some implementations, the logic 203 may obtain or receive rays from memory 139, or from a shading process running in the intersection processing unit 102, and may place them in memory 105 when space is available, for use during intersection testing.

Thus, the logic 203 may maintain a temporary ray reference buffer that holds associations between ray identifiers and identifiers of GAD shapes. In one implementation, the identifier of a GAD element is hashed to identify the location in the buffer at which the collection associated with that GAD element is stored. In this application such associated data is generally called a collection when it is stored or gathered in memory, while the term "packet" is generally used when the data is in motion, for example when it is sent for testing. Results returned from testing are merged into the collections stored in memory in association with GAD shapes, as described below.
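
The hashed lookup described above might be sketched as follows. The text states only that the GAD element identifier is hashed to find a buffer location; the modulo hash on integer identifiers, the linear probing used on collisions, and the names `CollectionBuffer`, `_find`, and `add` are assumptions introduced for this illustration.

```python
class CollectionBuffer:
    """Fixed-size buffer of ray collections, located by hashing the GAD element ID."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # each entry: [gad_id, list of ray IDs]

    def _find(self, gad_id):
        start = gad_id % len(self.slots)          # simple hash of an integer ID
        for step in range(len(self.slots)):       # linear probing on collision (assumed)
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == gad_id:
                return i
        raise MemoryError("collection buffer is full")

    def add(self, gad_id, ray_ids):
        """Merge returned ray IDs into the collection for gad_id; return its slot."""
        i = self._find(gad_id)
        if self.slots[i] is None:
            self.slots[i] = [gad_id, []]
        self.slots[i][1].extend(ray_ids)
        return i
```

Calling `add` a second time with the same GAD element identifier merges the new ray identifiers into the existing collection, mirroring how returned results are merged into stored collections.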

In summary, FIG. 2 shows that ray definition data is kept in the fast memory 105 while the shape data to be tested for intersection with rays is fetched from memory 139. The description above also shows that a plurality of shapes to be tested next is fetched from memory 139 and then tested for intersection against a group of rays known to have intersected the "parent" GAD element of those shapes.

FIG. 3 includes a block diagram of an intersection testing unit (ITU) 350, an implementation of region 140 (FIG. 1) that may be used in a rendering system for ray tracing two-dimensional representations of three-dimensional scenes. The ITU 350 includes a plurality of test cells 310a-310n and 340a-340n. GAD elements are shown as coming from GAD data storage 103b, and primitive data is shown as coming from primitive data storage 103a.

The test cells 310a-310n receive GAD elements and the ray data to be tested against those elements (i.e., these test cells test GAD elements). The test cells 340a-340n receive primitives and the ray data to be tested against those primitives (i.e., these test cells test primitives). The ITU 350 can therefore test one collection of rays for intersection with primitives and, separately, another collection of rays for intersection with GAD elements.

The ITU 350 also includes collection management logic 203a and a collection buffer 203b. The collection buffer 203b and the ray data 105 are stored in memory 340, which can receive ray data from, for example, memory 139. The collection buffer 203b maintains ray references associated with GAD elements. The collection manager 203a maintains these collections based on intersection information from the test cells. The collection manager 203a can also initiate fetches of primitives and GAD elements from memory 139 for testing ray collections.

The ITU 350 returns identifications of detected intersections, which may be buffered in an output buffer 375 for eventual delivery through the results interface 121 to the intersection processing 102. The identification is sufficient to identify a ray and, to a given degree of precision, the primitive that the ray was determined to intersect.

The ITU 350 may be viewed as a function or utility that can be invoked through a control process or driver (e.g., driver 188), which supplies the ITU 350 with rays and with the geometric shapes against which those rays are to be intersection tested. For example, the ITU 350 may be fed information through driver 188 (i.e., a process that interfaces the ITU 350 with other rendering processes, such as shading and initial ray generation). From the perspective of the ITU 350, the origin of the information provided to it need not be known, because the ITU 350 can perform intersection testing using the rays, GAD, and primitives (or, more generally, scene geometry) supplied to it or obtained by it based on other supplied information.

As described above, the ITU 350 can control how, when, and what data is supplied to it; the ITU 350 is therefore not passive and can, for example, fetch rays, geometric data, or acceleration data as needed for intersection testing. For example, the ITU 350 may be provided with a large number of rays for intersection testing, together with information sufficient to identify the scene in which the rays are to be tested. For example, ten or more rays may be provided for intersection testing at a given time, and as testing of those rays completes, new rays (generated by the intersection processing 102) may be supplied so that the number of rays being processed in the ITU 350 stays near the initial number. The ITU 350 thereafter controls (in logic 203a, see FIG. 3) the temporary storage of rays during processing (in the ray collection buffer 203b, see FIG. 3), and can also initiate fetches of the primitives and GAD elements needed during that processing.

As described above, GAD elements are transient in the ITU 350 compared with rays: ray identifiers are held in the buffer 203b and organized with respect to GAD elements, while the data defining the rays is held in the ray data 105. The buffer 203b and the ray data 105 may each be stored in memory 340, which may be physically implemented in various ways, such as one or more banks of SRAM cache.

As described above, the logic 203a tracks the status of the ray collections stored in memory 340 and determines which collections are ready for processing. As shown in FIG. 3, the logic 203a is communicatively coupled to memory 340 and can initiate the delivery of rays for testing to each of the connected test cells. Where a given GAD element bounds either only GAD elements or only primitives (and not a combination of the two), the logic 203a assigns the rays of a collection to test cells 340a-340n or to test cells 310a-310n, depending on whether the GAD element associated with that collection bounds primitives or other GAD elements.

In an example where a particular GAD element may bound both GAD elements and primitives, the ITU 350 may have data paths that provide both GAD elements and primitives, along with rays, to each test cell, and the logic 203a can arrange for the rays of a collection to be tested among the test resources. In this example, because GAD elements and primitives typically differ in shape (e.g., spheres versus triangles), the logic 203a may provide an indication for switching test logic or for loading an intersection testing algorithm optimized for the shape being tested.

The logic 203a may provide information to test cells 340a-340n and test cells 310a-310n directly or indirectly. In the indirect case, the logic 203a provides information to each test cell so that the test cell can itself initiate fetching of ray data from memory 340. Although the logic 203a is shown separately from memory 340 for simplicity of explanation, the logic 203a may be implemented within the circuitry of memory 340, because the management functions performed by the logic 203a relate largely to the data stored in memory 340.

The ability to increase the parallelism of access to memory 340 by the intersection testing resources is an advantage of some aspects described herein. Accordingly, increasing the number of access ports to memory 340 (preferably to at least one per test cell) is advantageous. Exemplary organizations related to such parallelization are described further below.

In addition, the ITU 350 may operate asynchronously with respect to the units that provide it with input data and receive its outputs. Here, "asynchronous" can include that the ITU receives, and begins intersection testing of, additional rays while intersection testing continues for previously received rays. "Asynchronous" also includes that intersection testing of rays need not complete in the order in which the ITU 350 received them. Asynchronous further includes that the intersection testing resources in the ITU 350 can be assigned or scheduled for intersection testing without regard to the position of a ray within the 3-D scene or a scheduling grid superimposed on the scene, and without regard to whether the rays being tested have a generational relationship, such as child rays spawned from a small number of parent rays, or belong only to a particular generation (e.g., camera rays or secondary rays).

The ITU 350 also includes an output buffer 375 that receives identifications of primitives and of the rays found to intersect them. In one example, the identification comprises an identification of a primitive paired with information sufficient to identify the ray that intersected the primitive. The identifying information for a ray may be a reference (e.g., an index) that identifies a particular ray in a list of rays. For example, the list may be maintained by the driver 188 running on the host 185, and may be stored in memory 139. When memory 139 does not contain definition data for the identified ray, the identifying information may instead include data sufficient to reconstruct the ray, such as its origin and direction. Passing a reference usually requires fewer bits, which can be an advantage.

FIG. 4 illustrates an example of the test cell 310a, which may include a working memory 410 and test logic 420. The working memory 410 may be as simple as several registers holding information sufficient to test a line segment for intersection with a surface, and may be more complex in other embodiments. For example, the working memory 410 may store instructions that configure the test logic 420 to test a particular received shape for intersection, and may detect which shape was received based on the received data. The working memory 410 may also buffer detected hits where each test cell is configured to test a series of rays against one geometric shape, or vice versa; the buffered hits can then be output as a group. The working memory also receives incoming shape data from storage 103b.

The test logic 420 performs intersection testing at an available or selectable resolution and can return a binary value indicating whether an intersection was detected. Such a binary value may be stored in the working memory for readout, or output for latching during a read cycle, such as a read cycle of memory 340 in the case of GAD element testing.

FIG. 5 illustrates aspects of an implementation of an intersection testing unit 500, focusing in particular on an exemplary memory organization. In the ITU 500, test cells 510a-510n and 540a-540n are shown, corresponding in this example to test cells 310a-310n and 340a-340n; no limitation on the number of test cells is implied. In the ITU 500, primitives and GAD elements can thus be tested in parallel. If it is determined that more test cells of one kind or the other are needed, however, any test cell can be reconfigured as appropriate (reallocated in the case of hardware, reprogrammed in the case of software). As transistor density continues to increase, more such test cells can be accommodated in a hardware implementation (or as resources for running software). As described above, portions of the test cells may be treated as operational groups in that they test rays against a common shape (i.e., a primitive or a GAD element). The test cells 540a-540n may return a binary value indicating intersection with a primitive at a particular level of precision (e.g., 16 bits), and may return a more precise indication of where on the primitive the ray intersected, which can be useful for larger primitives.

In the ITU 500, memory 540 comprises a plurality of independently operating banks 510-515, each of which has two ports (port 531 and port 532 are identified for bank 515). One port is accessed through GAD test logic 505 and the other through primitive test logic 530. The GAD test logic 505 and the primitive test logic 530 each operate with respective working buffers (560-565 and 570-575), and obtain the GAD elements and primitives to be tested from GAD storage 103b and primitive storage 103a, respectively.

Banks 510-515 operate, for the most part, to provide conflict-free access to ray data by GAD test logic 505 and primitive test logic 530, so that each of test cells 510a-510n and test cells 540a-540n can be provided with a ray from a separate one of banks 510-515. Such conflict-free access may be implemented, for example, by separate cache banks and a crossbar architecture that allows the ports to access different physical parts of the memory, as can be understood from this specification. If more than one test cell is allowed to test rays stored in a given bank, a conflict can arise when two rays to be tested reside in the same bank; in that case, the accesses can be handled sequentially by test logic 505 and 530. In some cases, working buffers 560-565 and 570-575 may be loaded for the next processing cycle while other processing completes. Regions can also be identified. For example, region 578 includes a test area for GAD elements, since this region includes GAD tester 510a and memory bank 510. Region 579, on the other hand, includes test areas for both GAD elements and primitives, since it includes testers 510a and 540a (one each for GAD elements and primitives) and memory bank 510, which stores the ray data used in tests involving the test cells in regions 578 and 579.

By testing rays in a consistent arrangement, the work of tracking which ray is assigned to which test cell can be reduced. For example, each collection may have 32 rays, and there may be 32 test cells 310a-310n (510a-510n). By consistently providing the fourth ray in a collection to test cell 310d, for instance, test cell 310d need not maintain information about which ray was provided to it, and need only return an indication of intersection. As will be shown, other implementations for maintaining consistency can be provided, including passing packets of ray identifiers among the test cells and having the test cells write intersection results into the packets.

Storage for ray collections can be implemented as an n-way interleaved cache, so that any given ray collection can be stored in one of n portions of ray collection buffer 203b or 520. Ray collection buffer 203b or 520 then maintains a list of the ray collections stored in each of the n portions of the buffer. An implementation of ray collection buffer 203b or 520 may use an identifying characteristic of the GAD element associated with a ray collection, for example an alphanumeric string that is unique among the GAD elements used to render the scene. The alphanumeric string may be a number, or a hash thereof. For example, the hash may refer to one of the n portions of ray collection buffer 203b or 520.
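As a minimal sketch of the hashed placement described above, the following Python maps a GAD element identifier to one of n buffer portions and keeps a per-portion list of collections. The class name `RayCollectionBuffer`, the helper `portion_for`, the use of CRC-32 as the hash, and the value n = 4 are all assumptions made for illustration; none of them is taken from the described implementation.

```python
import zlib

N_PORTIONS = 4  # assumed value of n for the n-way interleaved buffer


def portion_for(gad_element_id, n=N_PORTIONS):
    """Hash a GAD element identifier to one of n buffer portions."""
    return zlib.crc32(gad_element_id.encode("utf-8")) % n


class RayCollectionBuffer:
    """Hypothetical stand-in for ray collection buffer 203b/520."""

    def __init__(self, n=N_PORTIONS):
        # One dict per portion: GAD element id -> ray ids in its collection.
        self.portions = [{} for _ in range(n)]

    def add_ray(self, gad_element_id, ray_id):
        portion = self.portions[portion_for(gad_element_id, len(self.portions))]
        portion.setdefault(gad_element_id, []).append(ray_id)

    def collection(self, gad_element_id):
        portion = self.portions[portion_for(gad_element_id, len(self.portions))]
        return list(portion.get(gad_element_id, []))

    def listing(self):
        """GAD element ids of the collections held in each portion."""
        return [sorted(p) for p in self.portions]


buf = RayCollectionBuffer()
buf.add_ray("node-17", 1)
buf.add_ray("node-17", 18)
buf.add_ray("node-42", 106)
```

Because the portion is derived from the identifier alone, a lookup for a given GAD element only has to search one of the n portions.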

In other implementations, GAD elements may be predesignated for storage in given portions of ray collection buffer 203b or 520, for example by mapping segments of the alphanumeric strings in use to portions of the buffer. Primitive/ray intersection output 580 represents an output for identifying potential primitive/ray intersections; output 580 may be serial or parallel. For example, if there are 32 primitive test cells 540a-540n, output 580 may include 32 bits, each indicating the presence or absence of an intersection between one ray and the primitive just tested. Of course, in other implementations, such as packet implementations, the output may come directly from the test cells. The output can be serial, and can be serialized by the test cells into a packet.
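The positional convention and the one-bit-per-cell output can be sketched together as follows. This is an illustrative model only: `run_collection`, `rays_hit`, and the one-dimensional stand-in geometry (a "ray" is a point on a line, a "shape" is an interval) are assumptions chosen to keep the example short, and the sketch uses four rays rather than 32.

```python
def run_collection(rays, shape, intersects):
    """Cell i always tests rays[i]; return one result bit per cell."""
    mask = 0
    for cell_index, ray in enumerate(rays):
        if intersects(ray, shape):
            mask |= 1 << cell_index  # bit position alone identifies the ray
    return mask


def rays_hit(mask, rays):
    """Recover the intersecting rays from the bit positions."""
    return [ray for i, ray in enumerate(rays) if (mask >> i) & 1]


# Stand-in geometry (assumption): a "ray" is a point, a "shape" an interval.
def in_interval(ray, shape):
    lo, hi = shape
    return lo <= ray <= hi


collection = [0.5, 2.0, 1.5, 9.0]
mask = run_collection(collection, (1.0, 3.0), in_interval)
hits = rays_hit(mask, collection)
```

Since the i-th ray always goes to the i-th cell, no cell needs to report which ray it tested; the position of each bit in the output carries that information.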

Ray data is received in memory 340 (520) from a ray source (e.g., a shader). Collection management logic (e.g., 203a in FIGS. 2 and 3) operates to initially assign rays to collections, where each collection is associated with an element of the GAD. For example, an element of the GAD may be the root node of a graph, and all rays received are initially assigned to one or more collections associated with the root node. Rays may also be received in groups sized to be a full collection (e.g., from an input queue), and each such group may be treated as a collection identified, for example, in ray collection buffer 203b.

Focusing on the processing of one collection, with the understanding that multiple collections can be processed in parallel, retrieval from memory 340 of the rays of the collection associated with a test node is initiated by collection management logic 203a, for example by providing the addresses of those rays (e.g., ray identifiers) to memory 340 or, according to the example of FIG. 5, to banks 510-515, which provide the ray data on a plurality of output ports for receipt by the test cells (e.g., via working buffers 560-565).

To test the GAD elements specified by a node selected for testing (i.e., the GAD element associated with the selected node specifies other GAD elements), the ray data for the rays of the collection under test is distributed and the GAD elements are fetched (the fetch need not follow the ray distribution). For such fetching, logic 203a may supply address information to GAD storage device 103b (or to whatever memory management means is provided), which outputs the addressed GAD elements to test cells 310a-310n. Where multiple GAD elements are specified, as is usually the case, the elements may be block-read and arranged to be streamed serially to the test cells (e.g., using a serializing buffer).

In the test cells (e.g., 310a-310n), the rays of the collection (e.g., a different ray in each test cell) are tested for intersection with the serially provided GAD elements. If a ray is determined to intersect a GAD element, it is determined whether a collection for the intersected GAD element exists. If so, the ray is added to that collection, space permitting; otherwise, the collection is created and the ray is added to it. If an existing collection has no space, a new collection can be created.

In some implementations, a 1:1 correspondence between the maximum number of rays in a collection and the number of test cells 310a-310n is provided, so that all rays in a collection can be tested in parallel against a given GAD element, as in a packet-based architecture. In such an architecture, throughput is generally about what can be achieved with a 1:1 correspondence of rays to test cells. However, the rays of a given collection need not all be tested in parallel: by providing for serialized transmission of packets (e.g., the information representing a collection as described above) among the different test cells, testing can proceed less synchronously.

Thereafter, the rays are tested for intersection with the primitive provided to the test cells (i.e., in this example, each test cell holds a different ray and tests it against a common primitive). After testing, each test cell indicates any detected intersection.

Each ray of the collection is tested in its test cell for intersection with the GAD element provided to the test cells. For example, in the multi-bank example of FIG. 5, regions 578 and 579 are shown; a bank can be viewed as local to a GAD element test area and/or a primitive test area, so that the bank serves ray data to one or more testers of each kind.

The output from testing a ray for intersection with a GAD element differs from the output of testing the same ray for intersection with a primitive: intersection with a GAD element results in the ray being collected into the collection for that GAD element, whereas intersection with a primitive results in determining the closest intersection with that primitive and outputting that intersection. Consequently, conflicts in writing back collection data or outputting intersection results should not normally occur, even if a particular ray is present in two collections being tested in parallel. If further parallelism is implemented, for example by testing multiple collections of rays for primitive intersection in multiple instantiations of test cells 340a-340n, features such as storage of multiple intersections can be implemented to enforce orderly completion of testing. In the example of FIG. 5, if the data for a given ray is served by a single bank to one tester of each type (i.e., a given ray is located in only one memory bank), multiple GAD testers, for example, do not test the same ray at the same time, which avoids write-back conflicts.

In summary, the method includes receiving rays, assigning them to collections, selecting collections that are ready for testing (where readiness may be determined algorithmically), assigning the rays of the selected collections to appropriate test cells, and streaming the appropriate geometry through the test cells for intersection testing. The output depends on whether the geometry is a scene primitive or a GAD element. For rays tested against GAD elements, the GAD elements are identified based on their graph connection with the node associated with the collection being tested, and intersecting rays are added to the collections associated with the GAD elements tested. Collections are reviewed for readiness and selected for testing when ready. For ray intersections with primitives, the closest intersection is tracked with the ray. Because rays are tested when they are associated with ready collections, it is implicit that intersection testing for a particular ray is deferred until a collection with which it is associated is determined to be ready for testing. A ray can be collected into multiple collections concurrently, which allows it to be tested against disparate portions of the scene geometry (i.e., rays need not be tested in order of traversal).
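The method summarized above can be sketched in a few dozen lines. The sketch below is a toy model under loud assumptions: the scene is one-dimensional (GAD elements are intervals, primitives are points, a ray is an origin plus a direction of +1.0 or -1.0), a collection is treated as ready as soon as it is non-empty, and the names `GAD`, `PRIMITIVES`, `hits_element`, `hit_distance`, and `trace` are hypothetical. It shows only the control flow: rays start in the root collection, intersecting rays are gathered into child collections, and the closest primitive hit is tracked per ray.

```python
from collections import deque

# node -> (lo, hi, child GAD elements, primitives bounded directly)
GAD = {
    "root": (0.0, 10.0, ["left", "right"], []),
    "left": (0.0, 5.0, [], ["p1", "p2"]),
    "right": (5.0, 10.0, [], ["p3"]),
}
PRIMITIVES = {"p1": 2.0, "p2": 4.0, "p3": 7.0}


def hits_element(ray, lo, hi):
    origin, direction = ray
    return origin <= hi if direction > 0 else origin >= lo


def hit_distance(ray, point):
    origin, direction = ray
    t = (point - origin) / direction
    return t if t >= 0 else None


def trace(rays):
    """rays: ray id -> (origin, direction); returns ray id -> (primitive, t)."""
    closest = {rid: (None, float("inf")) for rid in rays}
    collections = {"root": list(rays)}  # all rays start at the root node
    ready = deque(["root"])             # assumption: non-empty means ready
    while ready:
        node = ready.popleft()
        members = collections.pop(node)
        _, _, children, prims = GAD[node]
        for prim in prims:              # stream primitives past the rays
            for rid in members:
                t = hit_distance(rays[rid], PRIMITIVES[prim])
                if t is not None and t < closest[rid][1]:
                    closest[rid] = (prim, t)
        for child in children:          # stream child GAD elements
            lo, hi = GAD[child][:2]
            hits = [rid for rid in members if hits_element(rays[rid], lo, hi)]
            if hits:
                if child not in collections:
                    ready.append(child)
                collections.setdefault(child, []).extend(hits)
    return closest


result = trace({1: (1.0, 1.0), 2: (6.0, -1.0), 3: (8.0, 1.0)})
```

A fuller model would replace the "non-empty means ready" rule with a fullness threshold, so that testing of a sparsely populated collection is deferred while fuller collections are processed first.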

As previously described, the ITU stores in memory information representing rays previously received from the ray input. For these rays, the ITU maintains an association of each ray with one or more of a plurality of ray collections. The ITU also maintains indications of collection fullness for the collections stored in memory. These indications may be individual flags indicating full collections, or numbers representing how many rays are associated with a given collection. Further details, and other implementations and variations relating to implementing the test algorithms, are provided in the related applications referenced above, and the information provided here is not intended to be limiting.

As can be seen from the description of these aspects, rays are loaded (or accessed) from memory based on information provided in a ray collection. Such loading may therefore include determining the individual memory locations where the data representing each ray is stored. Such information may be contained in the ray collection; for example, a ray collection may include a list of memory locations, or other references to storage, at which the ray data for the rays in the collection is stored. A ray collection may thus include references to locations in a memory (e.g., memory 340), in a bank of memory (e.g., bank 510), or in some other implementation, or may use some other suitable way of referring to such data. These aspects have been described from the perspective that separate ray data and ray collection data are maintained. In some implementations, however, the separation need not be so explicit or apparent, in that ray collection data and ray data can be maintained, for example, in a content-associative database. In that case, associations between collections and rays, and between collections and GAD elements, are maintained and used to identify the rays associated with a collection under test and the GAD elements associated with that collection.

It is also apparent that ray data is "stationary" within the test cells as primitives or GAD elements are cycled through the test cells. Other implementations are possible, as described in the related applications. However, the primary focus of this description is on keeping rays localized or stationary with the test cells while geometry is fetched and tested.

Aspects of such an implementation are described with reference to FIG. 6. In particular, another implementation of intersection testing logic includes a processor 605 comprising test control logic 603 (similar to the test logic of FIG. 2), which includes a memory interface 625, an instruction decoder 645, and a fetch unit 620, and which is coupled to a data cache 650. Data cache 650 feeds test cells 610a-610n. Instruction decoder 645 also provides input to test cells 610a-610n. Instruction generator 665 provides instruction input to instruction decoder 645. The test cells output indications of detected intersections to write-back unit 660, which in turn can store data in data cache 650. The output from write-back unit 660 is also used as input to instruction generator 665 when instructions are generated. The instructions used in processor 605 are contemplated to be of a single-instruction, multiple-data variety, in which the instructions processed in the test cells are intersection tests between defined surfaces (e.g., primitives and GAD elements) and rays.

In one example, the "instruction" includes data defining a geometric shape, such as a primitive or a GAD element, and the multiple data elements may include individual references to rays to be tested against the geometric shape provided as the "instruction." The combination of a geometric shape and multiple ray references can thus be regarded as a discrete packet of information that can be passed among the illustrated test cells. In some cases, packet transfers are handled in succession, so that multiple packets are "in flight" among the test cells.

These test cells may also be full-featured processors with large instruction sets, in which case each packet may contain other information sufficient to distinguish the purpose of the packet. For example, a number of bits may be included to distinguish packets formed for intersection testing from packets that exist for other purposes. In addition, a variety of intersection test instructions may be provided, for example for different primitive shapes and different GAD element shapes, or for different test algorithms.

In a typical example, each intersection test packet initially contains either a reference to a geometric element (a GAD element or a primitive) or the data for that geometric element, together with references to a plurality of rays to be tested for intersection with the geometric element (i.e., the "packet" discussed above).

Decoder 645 interprets the instruction to determine the reference to the geometric element, and initiates a fetch of the element through fetch unit 620 (which controls a memory interface such as memory interface 625). In some implementations, decoder 645 may look ahead a number of instructions to begin fetching geometric elements that will be needed in the future. The geometric element may be provided by fetch unit 620 to decoder 645, which in turn provides the geometric element to test cells 610a-610n.

Decoder 645 also provides the ray references from the instruction as functional addresses to data cache 650, which provides to each of test cells 610a-610n the data sufficient for intersection testing of its respective ray. Data associated with a ray that is not required for intersection testing need not be provided. Data cache 650 can thus serve as localized ray data storage for one or more computational resources operating as intersection test cells.

The geometric element is tested for intersection with the respective ray in each of test cells 610a-610n, and an indication of intersection is output from each test cell for receipt by write-back unit 660. Depending on the nature of the geometric element tested, write-back unit 660 performs one of two different functions. When test cells 610a-610n have tested a primitive for intersection, write-back unit 660 outputs an indication of each ray that intersected the primitive tested. When test cells 610a-610n have tested a GAD element, write-back unit 660 provides the output of test cells 610a-610n to instruction unit 665.

Instruction unit 665 operates to assemble future instructions that direct the test cells in further intersection testing. Instruction unit 665 operates as follows, using the input from test cells 610a-610n that specifies which rays intersected a given GAD element, instruction cache 630, and GAD input 670. Using the input from test cells 610a-610n, instruction unit 665 determines, based on the GAD input, the GAD elements connected to the GAD element specified in that input (i.e., instruction unit 665 determines which GAD elements should be tested next, based on the intersections identified for the specified GAD element).

Instruction unit 665 determines whether an instruction already exists in instruction cache 630 for each GAD element identified as connected to the intersected element, and whether that instruction can accept further ray references (i.e., whether all data slots of the instruction have been filled). Instruction unit 665 adds as many of the rays identified as intersecting in the test cell input as that instruction can hold, and generates further instructions sufficient to receive the remaining ray references. Instruction unit 665 does this for each GAD element identified as connected to the element identified in the test cell input. Thus, after the test cell input (the intersection indications) has been processed, the rays identified as intersecting a given GAD element have each been added to instructions specifying tests of those rays against the GAD elements connected to that GAD element. The generated instructions are stored in instruction cache 630.
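The slot-filling behavior described for the instruction unit can be illustrated with a short sketch. The class `InstructionCache`, the function `handle_test_result`, the dictionary form of the GAD graph, and the slot counts are hypothetical and chosen only for illustration; the text does not specify how many ray references an instruction holds.

```python
SLOTS_PER_INSTRUCTION = 4  # assumed number of ray-reference slots


class InstructionCache:
    """Hypothetical stand-in for instruction cache 630."""

    def __init__(self, slots=SLOTS_PER_INSTRUCTION):
        self.slots = slots
        # GAD element id -> list of instructions, each a list of ray references.
        self.by_element = {}

    def add_rays(self, element_id, ray_refs):
        instructions = self.by_element.setdefault(element_id, [])
        for ref in ray_refs:
            # Open a new instruction when none exists or the last one is full.
            if not instructions or len(instructions[-1]) == self.slots:
                instructions.append([])
            instructions[-1].append(ref)


def handle_test_result(cache, gad_graph, tested_element, intersecting_rays):
    """Queue the intersecting rays against every element connected to the tested one."""
    for connected in gad_graph.get(tested_element, []):
        cache.add_rays(connected, intersecting_rays)


gad_graph = {"A": ["B", "C"]}  # assumed form of the edges from GAD input 670
cache = InstructionCache(slots=2)
handle_test_result(cache, gad_graph, "A", [1, 18, 106])
```

With two slots per instruction, three intersecting rays yield one full instruction and one partially filled instruction for each connected element; a later result for the same element would top up the partial instruction before opening another.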

Instructions may be organized in instruction cache 630 based on the organization of the GAD elements received from GAD input 670. Instruction unit 665 operates similarly to logic 203a, in that both receive indications of which rays intersected which GAD elements and then group those rays together for subsequent testing. The system of FIG. 6 is a general-purpose system in that the packets for testing can be of several types to accommodate different functions.

For example, GAD input 670 may provide a graph of the GAD, in which the nodes of the graph represent GAD elements and pairs of nodes are connected by edges. The edges indicate which nodes are connected to which other nodes, and instruction unit 665 may search instruction cache 630 by following the edges connecting nodes in order to identify whether an instruction already exists in the cache for a given GAD element. If multiple instructions exist for a given GAD element, they may be linked, ordered, or otherwise associated in a single list. Other methods may also be used, such as hashing the GAD element ID to identify potential locations in instruction cache 630 where relevant instructions may be found.

An instruction may instead reference the GAD node under test, so that the connected nodes of the GAD are fetched in response to the instruction being issued and decoded (as opposed to storing an instruction for each connected node). Each of these connected nodes may be streamed through test cells 610a-610n for testing against the individual ray stored in each test cell (i.e., the ray data remains in each test cell while a plurality of GAD elements is provided to it, and each test cell tests its ray against each GAD element in sequence).

Thus, a processor implemented in accordance with these examples provides functionality for generating or obtaining instructions that collect rays identified as intersecting a first node, for intersection testing against connected nodes. Where the GAD provided to processor 605 is hierarchical, as in the examples described above, the graph of the GAD can be traversed in hierarchical order.

The connections and sources of GAD shown are exemplary, and other arrangements are possible. For example, memory 615 may be a source of GAD elements. However, it is preferable to store rays (i.e., the data defining each ray and data such as the closest primitive intersection found so far), rather than geometric data, in fast memory. The processing architecture described here allows this. Also, in the above example, the next node to be tested (i.e., the next acceleration element or primitive) was determined based on test results, and in response a packet was instantiated for each such geometric shape. Other implementations that may be apparent from this description include instantiating a packet for each "child" node when it is determined to start testing the child nodes of a given node, or otherwise creating collections.

FIG. 7 illustrates further aspects of a ray tracing system (e.g., system 700) that can use queues to separate the operation of intersection testing from that of ray shading (which includes generating new rays, including camera rays). As with the systems of FIGS. 1-6, system 700 allows rays to be submitted for intersection testing in one order and to complete intersection testing, and thus be output for shading, in a different order. Likewise, the intersection testing resources may proceed with intersection testing of rays without being held up by the shading of previously identified intersections.

FIG. 7 shows a plurality of intersection test resources (ITRs) 705a-705n, each of which is coupled to a respective ray data storage device 766a-766n. Each storage device stores data defining the rays to be tested for intersection in its resource. Each group of an ITR and a ray data storage device (e.g., ray data 766a and ITR 705a) can be viewed as a localized grouping of test resources (e.g., the grouping identified as 704). This is similar to the earlier groupings, for example groupings 578 and 579 in FIG. 5.

Ray data storage devices 766a-766n may be memories such as private L1 caches, shared or mapped portions of an L2 cache, and so on. As in the previous examples, it is preferable to use fast memory, localized to a given processing resource, for storing ray data. Localized storage of ray data is made easier by the intersection test algorithms used here, which increase the length of time that a ray can remain in localized, fast memory and reduce thrashing of the small memories. Such ray storage can thus be viewed as quasi-static, in that the data for a given ray is stored in the same local memory until the ray has completed intersection testing in the scene.

Data defining rays is loaded from test control unit 703 (similar to logic 203b and the like in the previous figures) through output 743. Test control unit 703 receives inputs, including identifiers of rays that have completed intersection testing in ITRs 705a-705n, through ray completion queue 730.

Queue 730 stores ray identifiers (ray IDs 1, 18, 106, and 480 are shown as examples). Queue 730 obtains input from ITRs 705a-705n indicating rays that have completed testing in the scene, with the closest intersected primitive identified. Accordingly, queue 730 is fed from decision point 751, which can determine whether an output from ITRs 705a-705n represents an intersection with a GAD element or a closest primitive intersection (since ITRs 705a-705n can test both types of shapes).

Decision point 751 represents the two types of intersection control functions described previously: GAD/ray intersections are looped back to the intersection testers, while closest detected primitive/ray intersections are output for shading. In some of the earlier architectures, where separate test cells are used, the decision point may track when a closest (possible) primitive intersection has been found.

From decision point 751, GAD results are input to mux 752, which also receives ray IDs from queue 725. Queue 725 stores ray IDs received from input 742, which is in turn fed from test control unit 703. Test control unit 703 populates input 742 with ray identifiers corresponding to the ray information that will be provided to ray data 766a-766n through output 743 of test control unit 703. Thus, the data defining the rays identified by the ray IDs in queue 725 is provided through output 743 to ray data 766a-766n for storage in those memories. An example of how ray IDs may be formed is provided below.
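The routing performed by decision point 751 and mux 752 can be sketched as two small functions over queues. This is only an illustration: the dictionary layout of a tester result, the names `loopback`, `completion_queue`, and `new_ray_queue`, and in particular the policy of serving looped-back rays before new rays are assumptions that the text does not specify.

```python
from collections import deque

loopback = deque()          # GAD/ray results returning to the testers
completion_queue = deque()  # stands in for ray completion queue 730
new_ray_queue = deque()     # stands in for queue 725


def decision_point(result):
    """Route one tester output by the kind of shape that was tested."""
    if result["kind"] == "gad":
        loopback.append(result["ray_ids"])          # keep traversing
    else:
        completion_queue.extend(result["ray_ids"])  # closest primitive found


def mux():
    """Pick the next packet of ray IDs to hand to the distributor."""
    if loopback:  # assumed policy: rays already in flight go first
        return loopback.popleft()
    if new_ray_queue:
        return new_ray_queue.popleft()
    return None


new_ray_queue.append([7, 8])
decision_point({"kind": "gad", "ray_ids": [1, 18]})
decision_point({"kind": "primitive", "ray_ids": [106, 480]})
first, second, third = mux(), mux(), mux()
```

Because completed rays leave through their own queue, shading can drain that queue at its own pace while the testers keep consuming looped-back and new packets.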

Both queues 730 and 725 are shown as a series of identifiers for rays (ray IDs). However, as described below, a plurality of rays is generally tested together against a given geometric shape. In such cases, queue 725 preferably stores ray IDs in packets of ray IDs, and queue 730 may likewise represent a series of entries, each having a plurality of ray IDs associated with a given shape.

As a specific example, an algorithm that drives this architecture typically waits until a number of rays have been determined to need testing against a given shape, after which that testing is performed and the results are output. Thus, it is generally the case that a number of rays will complete testing at about the same time and will begin testing at about the same time. In practice, the rays that complete testing together can be entirely unrelated to one another in terms of how and when they were initially instantiated, and in terms of the path each took through the acceleration hierarchy. In contrast, queue 725 can be viewed as containing groups or packets of new rays to be tested against a default GAD element of the scene (e.g., the root node of the hierarchy of GAD elements).

These new rays originate from ray sources, which include a camera shader 735 and other shaders 710a-710n. Camera shader 735 is identified separately because it generates the primary rays to be tested in the scene. Shaders 710a-710n operate on computational resources, such as threads and/or cores of one or more processors, and represent the execution of instructions or logic that specify an appropriate response to an identified intersection between a ray and a primitive. Usually, this response is determined at least in part by shading code associated with the primitive, although a variety of other effects and considerations may also be taken into account.

Shaders 710a-710n receive identifiers of rays and of the primitives they intersected through distribution point 772, which receives this ray data from output 745 of test control unit 703 (described with reference to FIG. 8A). Distribution point 772 serves to provide the ray data to a computational resource capable of executing the code for a given primitive, and any means for determining such capability may be used, such as load measurements, busy flags, fullness indicators, and FIFO separation. Alternatively, a round-robin or pseudorandom distribution scheme may be used.

The output of shaders 710a-710n may include further rays (called secondary rays for convenience), and the output from camera shader 735 also includes rays. In this example, a ray at this point includes the origin and direction data defining the ray. However, it preferably does not yet have a ray ID, which is assigned by test control unit 703.

As will become apparent, test control unit 703 can monitor ray status for the intersection test resources and, as described in more detail with reference to FIGS. 8-9, assigns new rays to replace rays in ray data 766a-766n that have completed testing. Distribution of ray IDs to ITRs 705a-705n is performed by distributor 780, which is described in detail with reference to FIG. 10. This distribution is controlled primarily by which of the memories of ray data 766a-766n stores the data defining the ray identified by a given identifier. Also, as described with reference to FIG. 10, distributor 780 controls when ray IDs are obtained from queue 725, based on considerations such as collection readiness.

Turning now to FIG. 8A, a portion of test control unit 703 is shown. The control unit includes memory banks, each associated with one of ray data 766a-766n, each bank having densely packed slots for ray data that are addressable by memory address. FIG. 8A shows that output 744 from ray completion queue 730 includes ray identifiers 1, 18, 106, and 480, each of which has space allocated in memory 803. This space is freed for overwriting/reuse in response to receipt of such ray identifiers from output 744. Output 745 to distribution point 772 includes the ray data used in shading. Output 745 may also include other data. In practice, memory 803 may be implemented in a memory that is also used by other processes, such as the processes executing shaders 710a-710n. In that case, output 745 may represent the fetching of this data from memory 803 by those computational resources.

Various communication links are identified in FIG. 7, such as links 741, 742, 743, 744, 745, 750, and 790. Any of these links may be implemented according to the overall architecture implementation, and may include a dedicated memory area, a physical link, a virtual channel established on an expansion bus, a shared register space, and so on.

FIG. 8B shows data for new rays at output 741 (e.g., from a shading operation such as camera shader 735). Such ray data includes at least ray origin and direction information. Test control unit 703 assigns these new rays to locations in memory 803 and in the different ray data 766a-766n. The identifier associated with each ray origin and direction depends on where the ray was stored. Thus, input 742 (the input to queue 725) receives ray identifiers determined on this basis. Output 743 includes both the ray identifiers and the associated origin and direction information stored in memory 803. The arrangement of ray IDs shown in FIGS. 8A and 8B is convenient in that a ray ID can be used to index the memory to locate the relevant data. However, other types of ray identifiers may be used, as long as the ray data in ITRs 705a-705n and memory 803 can be identified by use of such ray identifying data.
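One way to picture identifiers that depend on storage location is a slot allocator in which the slot index is the ray ID. The sketch below assumes exactly that; the class `RayMemory`, its free-list stack, and the two-slot capacity are illustrative choices, not details given in the text.

```python
class RayMemory:
    """Hypothetical stand-in for memory 803: the slot index is the ray ID."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots
        # Stack of free slot indices; index 0 is initially on top.
        self.free = list(range(n_slots - 1, -1, -1))

    def admit(self, origin, direction):
        """Store a new ray; return its ray ID, or None if no slot is free."""
        if not self.free:
            return None
        ray_id = self.free.pop()
        self.slots[ray_id] = (origin, direction)
        return ray_id

    def complete(self, ray_id):
        """Ray finished testing: return its data for shading and free the slot."""
        data = self.slots[ray_id]
        self.slots[ray_id] = None
        self.free.append(ray_id)
        return data


memory = RayMemory(n_slots=2)
a = memory.admit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
b = memory.admit((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
c = memory.admit((2.0, 0.0, 0.0), (1.0, 0.0, 0.0))  # no free slot yet
shaded = memory.complete(a)
d = memory.admit((2.0, 0.0, 0.0), (1.0, 0.0, 0.0))  # reuses the freed slot
```

Indexing by ray ID makes lookup a direct array access, at the cost of tying the identifier space to the memory size; a content-associative scheme trades that simplicity for freedom in choosing identifiers.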

FIG. 9A shows an example in which a content-associative memory 910 holds keys 905, each associated with different ray data.

FIG. 9B illustrates that slots in each of ray data 766a-766n are provided to receive ray data from test control unit 703 through interface 743. These slots may be further subdivided into multiple banks, interleaved, and/or organized by other cache structuring mechanisms to facilitate retrieval of data from the cache. Where rays need to be distributed among such portions for storage, the distribution may be based on a number of least significant bits of the ray ID, on a hash of the ray ID, on modulo division by the number of banks among which distribution is to occur, or on any other distribution mechanism that can be used to distribute ray data among memories. Within any given portion, the ray data can be stored based on the ray ID.
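Two of the distribution rules mentioned above, least significant bits and modulo division, can be written out directly. The function names and the choice of four banks are assumptions for illustration; the ray IDs are the example values 1, 18, 106, and 480 used earlier in the text.

```python
def bank_by_lsb(ray_id, n_banks):
    """Select a bank from the low-order bits; n_banks must be a power of two."""
    if n_banks <= 0 or n_banks & (n_banks - 1):
        raise ValueError("n_banks must be a power of two")
    return ray_id & (n_banks - 1)


def bank_by_modulo(ray_id, n_banks):
    """Select a bank by modulo division; works for any positive bank count."""
    return ray_id % n_banks


def distribute(ray_ids, n_banks, select=bank_by_modulo):
    banks = [[] for _ in range(n_banks)]
    for ray_id in ray_ids:
        banks[select(ray_id, n_banks)].append(ray_id)
    return banks


banks = distribute([1, 18, 106, 480], n_banks=4)
```

For a power-of-two bank count the two rules select the same bank; the modulo form additionally handles bank counts that are not powers of two.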

In summary, FIGS. 7-9B illustrate an architecture in which rays to be tested are collected by control logic and assigned identifiers, where the identifiers are preferably based on the memory locations at which the ray definition data is stored in the individual caches coupled to the different intersection test resources. Primitive intersection test results come from the test resources as testing completes, and the test logic reallocates the memory locations of completed rays to new rays that need to be tested subsequently. Completed rays can be shaded in one of a plurality of different intersection processing/shading resources, which can generate additional rays to be tested. In general, rays cycle through the intersection test resources, traversing the acceleration structure, until the closest primitive intersection is identified (or until it is determined that the ray intersects no primitive, e.g., that it reaches the scene background).

Turning to FIG. 10, additional architectural aspects of rendering systems are shown. One aspect of FIG. 10 is that ray data can be stored in separate cache memories connected to processors configured for intersection testing. Another aspect is how distributor 780 can interface with ITRs 705a-705n. Further aspects shown relate to how shape data for testing is provided to the testers.

Distributor 780 receives ray identifiers from mux 751 (FIG. 7) via communication link 790 (implemented in hardware, inter-process or inter-thread communication, etc.). These ray IDs are sent to collection management 1075, which maintains the associations between ray IDs and the respective GAD elements bounding the objects to be tested next. The ray IDs are also provided by decision points 1013, 1014, 1015 to queues 1021, 1022, 1023, where they await a determination from collection management and storage 1075 that their collection is ready for testing. For example, when collection 1045 is determined to be ready for testing, its ray IDs are sent to the respective ITRs 705a-705n, and the caches 1065a-1065n of the ITRs contain data for each of these ray IDs. Collection management 1075 may also have an interface to storage of GAD element data and/or primitive data in order to initiate retrieval of the geometry required for testing.

The shapes arrive at queue 1040 from storage 103 (FIG. 1) via, for example, link 112. The shapes are identified based on their relationship to the GAD element associated with the specified collection. For example, in the case of a hierarchical GAD, the shapes may be child nodes of a parent GAD element. Each ITR can test its own rays serially against the shapes from queue 1040. Thus, the highest throughput is obtained when the rays of a specified collection are equally distributed among caches 1065a-1065n, and in that case collection management 1075 can most easily update its collections based on the test results of the specified ray collection. When multiple rays of the specified collection are in one cache, the remaining intersection testers can stall, or they can test rays from the next collection. A certain maximum amount of out-of-order testing can be accommodated before synchronization of the collection testing is required again.

Outputs are produced at outputs 750a-750n (which may be components of link 750 (FIG. 7)) and are provided to decision point 751 (FIG. 7). As described above, this architecture provides ITRs that test any shape (e.g., a primitive or a GAD element). In addition, the connection of decision point 751 to collection management 1075 indicates that the results of GAD intersection testing include determinations that a specified ray has hit a specified GAD element, so that the ray is collected against that GAD element. Other implementations may therefore provide the GAD test results directly to collection management 1075. More generally, the illustrated examples show exemplary flows of information, and other flows can be derived from them.

Another aspect to note is that more than one ray ID for a specified ray collection can be stored in any of queues 1021, 1022, 1023 (shown by collection 1047). In this case, the ITR for that queue can test both rays and output the results of the second test (or of however many subsequent tests) as they become available. Decision point 751 may wait for all the results of the collection to be assembled, or a "straggler" result may be propagated as it becomes available.

In summary, FIG. 10 depicts a system architecture that allows a packet of ray identifiers associated with one or more shapes to be distributed into queues for a plurality of test resources, each of which stores a subset of the ray data. Each test resource fetches the ray data identified by each ray identifier and tests it against the shape loaded into the test resource. Preferably, the shapes can be streamed through all test resources concurrently. The shapes can be identified as a sequence of children starting at one address in main memory. Thus, FIG. 10 illustrates a system organization in which one shape is tested concurrently against multiple rays.

Other examples, however, pass shapes sequentially through a series of different intersection test resources, with packets of shape data and ray identifiers traveling among the intersection test resources. Having a plurality of packets in flight increases test throughput. Several examples of such schemes are described below.

FIG. 11 shows a first example of a computer architecture in which a ring bus arrangement of a plurality of computing resources 1104-1108 can be implemented. Each of the computing resources has access to a private L1 cache 1125a-1125n, which, for any computing resource used in intersection testing, contains ray data to be intersection tested against geometric shapes accessible from shape data store 1115 in memory 340. Communication among computing resources 1104-1108 may be by bus 1106, which may include a plurality of point-to-point links or any other architecture suitable for such inter-processor communication.

When computing resources share a particular memory structure (e.g., L2 caches 1130, 1135), the computing resources sharing that structure (e.g., 1107 and 1106 sharing L2 cache 1130) can communicate with each other through that shared cache. In addition, a copy of the data for the rays being tested in the system may be stored in ray data 1110, for distribution of subsets of it among ray data 1110a-1110n and L2 1135, and more of it may be stored in the L2 caches (described below). Shape data 1115 may also be present in memory 340 and temporarily present in one or more of L2 1130 and 1135 and in caches 1125a-1125n. However, the ray data stored in such caches is protected from being overwritten by shape data, and the space allocated to shapes is generally limited to what is sufficient to hide the latency of shape data 1115, so that shapes are available for testing against the ray packets currently identified as awaiting testing, without any attempt to retain the shape data afterwards.

In other words, for ray data, the arrangement of FIG. 11 preferably avoids using a typical cache management algorithm, such as least recently used (LRU) replacement.

FIG. 11 also illustrates that, in addition to intersection testing, the application and/or driver 120 may execute on computing resource 1104. Packet process 1121 may execute on computing resource 1108, and packet data 1116 may be stored in cache 1125a for use by packet process 1121. Other packet data may be stored in L2 1129. However, as with ray data, it is desirable to store packet data in the fastest available memory. The packet process performs many of the same functions as the collection and other management logic depicted in the previous figures. That is, it tracks which rays have intersected which GAD elements, and selects GAD elements that are ready for testing, for example, GAD elements that have collected enough intersecting rays.

In this example, since packet process 1121 is centralized, it operates by issuing a packet that includes a plurality of ray identifiers and either a reference to, or data for, the shape to be tested for intersection with the identified rays. Each computing resource 1104-1107 performing intersection testing receives the packet, for example over a plurality of point-to-point links (described below), or simultaneously over a shared-bus type medium (which is similar to the architecture of FIG. 10). Each of computing resources 1104-1107 determines whether its localized ray data 1110a-1110n stores data for any ray identified in the packet and, if so, retrieves the data for those rays, tests them, and outputs the results.

Because the results of GAD element intersection testing are tracked by packet process 1121, any communication mechanism that returns these results to packet process 1121 is suitable. Such a mechanism can be selected based on the overall architecture of the system. Some exemplary approaches are described below and may include separately identifying each intersection found, or each test resource may aggregate its intersection results into the original packet.

FIG. 12 shows a further implementation of an organization of computing resources 1205-1208 associated with caches 1281-1284, where each of the caches stores ray data 1266a-1266n and packet data 1216a-1216n. Each computing resource 1205-1208 is connected to one or more other computing resources by queues 1251-1254. Ray process 1210 provides input to computing resource 1205 via queue 1250. Ray process 1210 communicates with application/driver 1202. Output 1255 from computing resource 1208 is coupled to ray process 1210. Another output 1256 is coupled to computing resource 1205. Primitive and GAD storage 103 provides read access to shape data for computing resources 1205-1208.

Ray process 1210 receives or generates rays for testing and forms packets containing ray identifiers and ray data for the identified rays. The packets are delivered to each of computing resources 1205-1208 through queues 1250-1254. Each of computing resources 1205-1208 takes a portion of the rays in a specified packet, in some instances only one ray, and stores them in its ray data 1266a-1266n. In another example, ray process 1210 determines which ray data is to be stored in which localized ray data 1266a-1266n by sending a packet destined for a particular computing resource 1205-1208.

After the rays are loaded into localized storage, they are thereafter identified in packets that contain only ray IDs, without the origin and direction data. Such a packet also includes either a reference to, or data for, the shape to be tested against the rays identified in the packet. In some instances, the data that forms these packets is distributed among the localized memories 1281-1284 of computing resources 1205-1208. Thus, each of computing resources 1205-1208 manages a portion of the ray collections in flight at a given point in time, and the information about which rays are to be tested next against which shapes is therefore distributed. Each of computing resources 1205-1208 can accordingly issue packets of ray IDs and shape information in order to test the collections it manages that are ready for testing.

Each packet makes a round trip through the queues and computing resources and then returns to the originating computing resource populated with the results of the intersection testing. In one implementation, each computing resource 1205-1208 fetches the shape data for the packets it issues. For example, if computing resource 1205 has a packet that is ready for testing (e.g., a collection of rays for a given GAD element), then that computing resource fetches the shapes related to that shape (e.g., the child nodes of that GAD element), creates a packet with data for each shape, and sends each packet out through queue 1251.

Subsequently, after a packet has traveled through the other computing resources, computing resource 1205 receives each packet that it issued. When received, each packet contains the results of testing the shape in the packet (reference or definition data) for intersection with the rays that are identified in the packet and stored in the remaining computing resources 1206-1208. Computing resource 1205 may test any identified rays localized in ray data 1266a at any time before or after the remaining computing resources perform their tests. Thus, the ray definition data can be distributed among a plurality of high-speed memories connected to the intersection test resources, and the test results are collected in a distributed manner.
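
The round trip described above can be sketched as a simple simulation. The function name, the list-of-dicts representation of localized ray storage, and the caller-supplied `intersects` predicate are all assumptions made for illustration; a real system would pass packets through hardware or software queues rather than a loop.

```python
from collections import deque


def ring_round_trip(local_stores, emitter, ray_ids, shape, intersects):
    """Pass one packet once around a ring of intersection test resources.

    local_stores: one dict per resource, mapping ray ID -> ray data.
    emitter: index of the resource that issues the packet.
    intersects: caller-supplied test, intersects(ray_data, shape) -> bool.
    Returns (hits, hops): one hit flag per ray ID, and the visiting order.
    """
    hits = [False] * len(ray_ids)
    order = deque(range(len(local_stores)))
    order.rotate(-emitter)  # the emitter tests its own rays first here
    hops = []
    for index in order:
        store = local_stores[index]
        for slot, ray_id in enumerate(ray_ids):
            if ray_id in store:  # only rays held locally are tested
                hits[slot] = intersects(store[ray_id], shape)
        hops.append(index)
    return hits, hops


# Stand-in data: each "ray" is a number and the "test" is a comparison.
stores = [{1: 0.5}, {2: 2.0}, {3: 0.9}, {}]
hits, hops = ring_round_trip(stores, 1, [1, 2, 3], 1.0,
                             lambda ray, shape: ray < shape)
```

Here the emitter happens to test its local rays before forwarding the packet; as the text notes, it could equally do so after the packet returns.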

How an architecture according to FIG. 12 is implemented can take into account various characteristics of the physical system in use. For example, the queues are shown as passing packets in one direction. However, passing packets in both directions can also be implemented (i.e., bi-directional queues or multiple queues). FIG. 12 also shows packet data spread among the computing resources, which allows more distributed memory access to L2 caches and other potentially larger memories (e.g., main memory 103).

When the packet data is centralized, a packet sent in one direction with a data reference may, for example, have its data fetched by computing resource 1205, and a packet sent in the other direction with a data reference may have its data fetched by computing resource 1208. This situation can be generalized to arbitrary entry points in a ring bus architecture (unidirectional or bidirectional).

As is evident from the description, the queues described may include one or more queues for inputting new rays for intersection testing into a system containing a plurality of intersection test resources, as well as queues interconnecting those intersection test resources. In some cases, a queue inputting new rays may include ray definition data (e.g., a queue awaiting storage of data in a cache connected to an intersection test resource). Such a queue may be implemented as a list in a main memory that stores the ray definition data. The queues that interconnect the intersection test resources and pass packets preferably include only ray identifiers, not ray definition data.

FIG. 13 illustrates a portion of a potential implementation of system 1200 in which the computing resources are implemented as cores of a chip, so that computing resource 1205 is one core and computing resource 1206 is another core. Here, queue 1251 is an inter-core communication. Also shown is an interleaved L2 cache 1305 that can store ray data as well as shape data.

As with the previous figures, L2 cache 1305 stores portions of the scene primitives and acceleration data as long as storing such data does not increase thrashing of the ray data (i.e., ray data is given priority for storage).

FIGS. 14A-14C illustrate various relationships that queues may have in different implementations of an exemplary system. In general, communication between computing resources need not be serial or 1:1. For example, FIG. 14A shows that one input 1404 can feed both queues 1405 and 1406, each of which is specific to one computing unit, 1407 and 1408 respectively. For example, if computing units 1407 and 1408 are implemented in a single physical chip, input 1404 may be a chip-level input, and each of queues 1405 and 1406 may be associated with a particular core.

FIG. 14B shows that a single input can feed multiple cores, each of which can feed a computing unit 1407, 1408, with each unit receiving data separately through the opposite queue 1406, 1405. FIG. 14C shows that queue 1411 receives input 1410 and provides output to both computing units 1407 and 1408. Thus, FIGS. 14A-14C illustrate that various queuing strategies can implement packet passing in accordance with these aspects.

FIG. 15 shows that multiple levels of cache hierarchy may exist (e.g., level 1 caches 1502, 1503 and level 2 cache 1504) and that ray data may be provided in various combinations. Ray data 1507 may include a subset separate from ray data 1505 and 1506, and similarly may include other ray data that is not present in one or more of ray data 1505 or 1506. Ray data 1505 and 1506 may be changed dynamically, with ray data for rays stored in ray data 1507 being brought into ray data 1505 or 1506.

FIG. 16 shows in more detail an example of queue 1251 and the data that can be stored in it. Packets 1601a-1601n are shown, each having individual ray identifiers 1605a-1605p, 1606a-1606p, 1607a-1607p and corresponding hit information fields 1610a-1610p, 1611a-1611p, 1612a-1612p. Packet 1601a includes data 1615a for shape 1, packet 1601b includes data 1615b for shape 2, and packet 1601n includes data 1615n for shape n. Of course, various other queuing strategies (some of which are shown in FIGS. 14A-14C) may be implemented.

The term queuing as used here does not impose a first-in/first-out requirement on the rays tested in any given computing resource. On average, the rays identified in any given packet are expected to be distributed about evenly among the localized ray storage of the different computing resources, so that each packet is tested in parallel. Bubbles can form, however, when one computing resource has rays to test for a packet while other computing resources have no rays to test for that packet. These bubbles can be filled by other processing, including intersection testing of other packets. In some examples, each computing resource may maintain state for multiple threads and may switch among threads, each dedicated to a given packet. As long as the overhead data for each intersection test among the packets can be stored in registers, a good throughput gain should be realized.

In terms of the operation of the exemplary system, each computing resource operates in response to receipt of a packet. When a packet arrives in the input queue for a particular computing resource, that computing resource examines the ray identifiers in the packet and determines whether data for any ray identified in the packet is stored in its own memory. In other words, a packet may be formed with ray identifiers without a priori information as to which computing resources have fast access to the ray data for the rays identified in the packet. Furthermore, a computing resource does not respond by attempting to obtain ray data for all rays identified in the packet. It determines only whether it has ray data in its local high-speed memory for any ray identified in the packet, and tests only those rays for intersection with the identified shape.
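
A minimal sketch of this packet-driven behavior follows. The `Packet` and `IntersectionResource` names, the dictionary used for localized ray storage, and the choice of a sphere as the test shape are assumptions for illustration; the packet simply carries ray IDs, one shape, and a hit slot per ray ID.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    shape: tuple                 # assumed shape: sphere as (cx, cy, cz, radius)
    ray_ids: list
    hits: list = field(default_factory=list)

    def __post_init__(self):
        if not self.hits:
            self.hits = [False] * len(self.ray_ids)


def hits_sphere(origin, direction, sphere) -> bool:
    """Ray/sphere intersection test; direction is assumed to be normalized."""
    cx, cy, cz, radius = sphere
    oc = (origin[0] - cx, origin[1] - cy, origin[2] - cz)
    b = sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return False
    return -b + disc ** 0.5 >= 0  # farthest root must lie along the ray


class IntersectionResource:
    def __init__(self, local_rays):
        self.local_rays = local_rays  # ray ID -> (origin, direction)

    def process(self, packet: Packet) -> int:
        """Test only the rays held locally; return how many were tested."""
        tested = 0
        for slot, ray_id in enumerate(packet.ray_ids):
            ray = self.local_rays.get(ray_id)
            if ray is None:
                continue  # stored elsewhere; leave this slot untouched
            packet.hits[slot] = hits_sphere(ray[0], ray[1], packet.shape)
            tested += 1
        return tested
```

The resource never fetches ray data it does not already hold, so the packet can be formed without knowing where each ray lives.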

FIG. 17 is intended to illustrate several aspects of how a packet can be processed in an exemplary computing resource. FIG. 17 shows packet 1601a being input to computing resource 1206. Computing resource 1206 queries the ray data using the ray identifiers from packet 1601a (e.g., ray 1605a has ray ID 31, and there is a ray with ID 31 in ray data storage 1266b). The origin and direction associated with ray ID 31 are retrieved via 1290. In addition, shape data (if identified by reference in the packet) is obtained (1715) from memory resource 1291, in which the shape data is currently stored. If the shape data is provided in the packet itself, it is used directly. Ray 31 is then tested for intersection with shape 1 (or with the shape defined by the retrieved data).

If the tested shape is a GAD element (1725), the effect of such an intersection test is to narrow the scene primitives to a smaller subset that may still be intersected by the tested ray. Thus, a positive hit result is written back (1726) to the packet at location 1610a for that ray identifier (i.e., ray identifier 31). In some implementations, the emitter of the packet can track which ray IDs were sent and in what order they appear in the packet, so that only the results need to be written back, in an implied order that is the same as the order sent. Thus, after the packet has passed through the testers, the packet-emitting resource can process the test results.

On the other hand, when the tested shape is a primitive (1730), nearest primitive intersection determination 1731 can determine whether the detected intersection is closer than any previous one. Then, where an intersected primitive is detected, the intersection distance may optionally be stored or output with the packet. Since a specified ray may be associated with multiple packets (i.e., with multiple GAD elements simultaneously), a count can be maintained (1733) that is updated whenever the ray is associated with a GAD element, so that it can be determined when the ray is no longer in any other packet, and the memory used for that ray can be freed for the entry of another ray.

In summary, the data associated with each ray in the local high-speed storage preferably includes an identification of the nearest primitive intersection detected, which includes a reference to the primitive and a parameterized distance to the intersection. Other data associated with each ray includes a count of the GAD element ray collections in which the ray is present. After each collection is tested, the count is decremented, and the count is incremented when another collection containing the ray is created. When the count is zero, the primitive identified as the nearest intersected primitive is determined to be the one intersected by that ray.
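
The per-ray bookkeeping summarized above can be sketched as follows. The `RayState` name and its fields are hypothetical; the sketch only shows the closest-hit record and the collection count, not the ray's origin and direction.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RayState:
    closest_t: float = math.inf               # parameterized distance to nearest hit
    closest_primitive: Optional[int] = None   # reference to the nearest primitive
    pending_collections: int = 0              # collections that still contain this ray

    def joined_collection(self) -> None:
        """Called when the ray is added to another GAD element collection."""
        self.pending_collections += 1

    def collection_tested(self) -> None:
        """Called after a collection containing the ray has been tested."""
        if self.pending_collections == 0:
            raise ValueError("ray is not in any collection")
        self.pending_collections -= 1

    def record_hit(self, primitive_id: int, t: float) -> bool:
        """Keep the hit only if it is closer than any previous one."""
        if 0 <= t < self.closest_t:
            self.closest_t = t
            self.closest_primitive = primitive_id
            return True
        return False

    @property
    def finished(self) -> bool:
        """True when no collection references the ray, so its slot can be freed."""
        return self.pending_collections == 0
```

Once `finished` is true, `closest_primitive` (if any) is the final result for the ray and the storage can be handed to a new ray.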

FIG. 18 relates to an exemplary single instruction multiple data (SIMD) architecture, which may be used when a packet can identify the beginning of a strip of geometric shapes for testing. In one example, each node of a graph of GAD elements is connected to one or more other nodes, where each node represents an element of geometry acceleration data, such as a sphere or an axis-aligned bounding box. In some instances, the graph is hierarchical, so that when a specified node is tested, the children of that node are known to bound selections of the primitives bounded by the parent node. The GAD elements ultimately bound selections of primitives.

In an implementation, a string of acceleration elements (which are children of a specified element) may be identified by the memory address of the first element in the string. The architecture then provides a predefined stride length to the data at which the next element begins. A flag may be provided to indicate the end of the string of elements that are the child nodes of the specified node. Similarly, a strip of primitives may be identified by a starting memory address, with a known stride length defining the next primitive. More specifically, in a triangle strip, vertices of the sequence can each define multiple triangles.
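
A minimal sketch of walking such a string by address and stride, and of expanding a triangle strip, is given below. The record layout (element ID first, end flag last, a stride of four words) is an assumption made for illustration and is not taken from the patent.

```python
STRIDE = 4  # assumed record length in words: [element_id, data0, data1, end_flag]


def walk_children(memory, first_address, stride=STRIDE):
    """Collect element IDs from a string of child records.

    The string is identified only by the address of its first record; each
    following record starts one stride later, and a nonzero end flag in the
    last word of a record terminates the string.
    """
    address = first_address
    children = []
    while True:
        record = memory[address:address + stride]
        if len(record) < stride:
            raise ValueError("string ran past end of memory without an end flag")
        children.append(record[0])
        if record[-1]:
            return children
        address += stride


def strip_triangles(vertices):
    """Expand a triangle strip: each consecutive vertex triple is one triangle,
    so interior vertices are shared by several triangles."""
    return [tuple(vertices[i:i + 3]) for i in range(len(vertices) - 2)]
```

A strip of n vertices therefore yields n - 2 triangles, which is why a single start address plus a stride is enough to describe the whole strip.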

FIG. 18 is intended to illustrate various aspects of an SIMD architecture similar to the SIMD architecture shown in FIG. In this example, a packet 1601 is received that contains multiple ray identifiers 1605a-1605n, optionally storage space for intersection test results 1610a-1610n, and shape data, where the shape data is one of shape definition data, an identifier for the shape, or an identifier 1815a for the start of a strip of shapes to be tested (e.g., a triangle strip).

This exemplary architecture is suitable when a smaller number of more powerful individual processing resources (having larger caches) are used for intersection testing. Here, each individual processing resource can be expected to have, on average, a number of rays in its local storage approximately equal to the number of rays that can be tested by the SIMD instruction (in contrast, in the earlier example it is preferable for each cache to hold one ray for each packet). For example, if four rays can be tested simultaneously in the SIMD execution unit, it is desirable that each packet statistically have about four rays in the local storage of each SIMD unit through which the packet passes. For example, if four distinct processing resources are provided, each with a SIMD unit capable of testing four rays, one packet may reference about 16 rays. Alternatively, a separate packet may be provided to each processing resource having a SIMD unit, so that, for example, a packet references four rays when there is a 4x SIMD unit.

In some instances, the first computing resource 1205 receiving packet 1601 may use identifier 1815a to obtain data for the strip of shapes. Each ray referenced in packet 1601a and stored in ray data 1266a is then tested in computation units 1818a-1818d. In the example of a shape strip, shape strip 1816 is retrieved, which includes shapes 1-4. Each shape can be streamed through each of computation units 1818a-1818d and tested for intersection with the ray loaded into that unit. For each shape of the strip, the computing resource may form a packet (packets 1820 are shown), each containing the results of testing the rays against one of the shapes.

Alternatively, separate bits may be provided in the result section for each ray to receive the intersection results, and a single packet may be passed. To avoid fetching the shape data again from slower memory, this scheme is considered most suitable when multiple computing resources share an L2, or when a fetch by the first computing resource causes the shape data to be delivered to the other computing resources as well. For example, a DMA transaction may have multiple targets, each being a different computing resource that needs to receive a specified stream of shapes to be tested; this is an example of a memory transaction model suitable for some implementations. The principal consideration is to avoid fetching the same data from main memory 103 more than once.

As described previously, each intersection test resource determines which ray identifiers have ray data stored in its ray data storage. For any such ray, the ray origin and direction are retrieved. In the examples above, a test resource tests a specified identified ray against a sequence of one or more identified shapes. However, a processing resource may be able to test a plurality of shapes against a specified ray concurrently, or one shape against a plurality of rays, or a combination of both, without substantial additional latency. FIG. 18 shows a SIMD architecture in which, within a single computing resource configured for intersection testing, each of four SIMD units can test a different ray for intersection with successively provided shapes. The sequence of shapes may be fetched based on a shape strip reference, which is used as an index into scene data storage 340 to start retrieval of a series of shapes, here four.

Preferably, rays are gathered into collections based on detected intersections between the rays and an element of the acceleration data. Thus, in this example, when a different ray is tested in each SIMD unit against four different shapes, the computing resource that includes the SIMD units can reformat the results into packets of rays, each packet referencing one shape.

Other architectures that use SIMD units can instead provide for fetching a plurality of rays that have been gathered into a collection. As described above, these rays are next intersection tested against the shapes related to the shape associated with the collection. For example, there may be 16 or 32 shapes related to the shape against which the rays were collected. A first subset of these shapes can be loaded into different SIMD units, and the collected rays can be streamed through each SIMD unit (i.e., the same ray passes through each SIMD unit at the same time). A result packet can be formed independently by each SIMD unit, and the next shapes are then loaded into the SIMD units. The rays can then be cycled through the SIMD units again. This process continues until all related shapes have been tested against the collected rays.

FIG. 18B shows the time-based progress of computation unit 1818a in this example. The shapes are numbered from 1 to q, and the rays from the collection are numbered from 1 to n. At time 1, shape 1 and ray 1 are tested. At time n, shape 1 and ray n are tested. At the start of the last cycle (time (q - 1) * n + 1), the final shape begins testing in computation unit 1818a.
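
The timing described for a single computation unit can be checked with a short sketch. It assumes one ray/shape test per time step and that all n rays are streamed against one shape before the next shape is loaded; the function names are illustrative only.

```python
def start_time(shape: int, n_rays: int) -> int:
    """Time step (1-based) at which a shape (1-based) begins testing."""
    return (shape - 1) * n_rays + 1


def schedule(q_shapes: int, n_rays: int):
    """Yield (time, shape, ray) in the order one computation unit works."""
    time = 0
    for shape in range(1, q_shapes + 1):
        for ray in range(1, n_rays + 1):
            time += 1
            yield time, shape, ray


# Example: q = 2 shapes, n = 3 rays gives six steps in total.
steps = list(schedule(2, 3))
```

With q = 2 and n = 3, shape 1 is paired with ray 3 at time 3, and the final shape starts at time (2 - 1) * 3 + 1 = 4, which matches the expression above.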

FIG. 19 illustrates aspects of how packets 1905 are distributed among computing resources for intersection testing and how the test results are finally coalesced in computing resource 1910, which manages the memory for the rays of the packet associated with the identified shape 1905. FIG. 19 illustrates an exemplary system state during processing. Specifically, computing resources 1910-1914 each receive ray ID information for rays stored in memory accessible to that computing resource, test the identified shape for intersection, and output results 1915-1919. The output results include identified hits 1915, 1917, and 1919. Since either a hit or a miss can be the default behavior, a miss, for example, may not be indicated with a positive value, or a default value in the packet may be set to indicate a miss. After the test, computing resource 1910 collects at least the hit information, where computing resource 1910 can manage all of the packet information in the test system or a subset of it, including the information for a particular shape.

An exemplary organization of memory 1966 illustrates a logical structure of shape references mapped to a plurality of ray IDs (ray A, B, etc.). It also shows that some of the slots in the row for Ref # 1 (a reference to the shape under test) are empty. Thus, when computing resource 1910 receives the hit results, it first fills the remaining empty slots of the Ref # 1 collection, after which ray n begins a new packet for Ref # 1 in memory 1966. Since the packet for Ref # 1 is now full, that packet can be determined to be ready for testing. In some instances, the child GAD elements of the shape referenced by Ref # 1 are fetched, and packets are formed with all rays associated with Ref # 1 in each packet. For example, there may be 32 children of Ref # 1, so that 32 packets are formed, of which packets 1922-1924 are shown. In some embodiments, computing resource 1910 may fetch the data defining the child shapes and store the data in packets 1922-1924. Alternatively, references may be provided that allow other computing resources to fetch such data.
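
The filling of a collection row and the spawning of one packet per child shape can be sketched as below. The `CollectionManager` name, the fixed row capacity, and the dictionary of child references are assumptions for illustration.

```python
from collections import defaultdict


class CollectionManager:
    """Rows of ray IDs keyed by shape reference, with a fixed row capacity."""

    def __init__(self, capacity, children):
        self.capacity = capacity
        self.children = children          # shape ref -> list of child shape refs
        self.pending = defaultdict(list)  # shape ref -> ray IDs collected so far

    def add_hits(self, shape_ref, ray_ids):
        """Record rays that hit shape_ref.

        Whenever the row fills, it is emptied and one packet per child shape
        is returned, each carrying all ray IDs of the full row. Remaining rays
        start a new row for the same shape reference.
        """
        packets = []
        row = self.pending[shape_ref]
        for ray_id in ray_ids:
            row.append(ray_id)
            if len(row) == self.capacity:
                full = tuple(row)
                row.clear()
                for child in self.children.get(shape_ref, []):
                    packets.append((child, full))
        return packets
```

In this sketch each spawned packet carries only a child reference; as the text notes, an implementation could instead embed the fetched child shape data in the packet.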

In some cases, computing resource 1910 may also store rays identified in the packets it creates, and may thus test those rays first before sending the packets. In this case, computing resource 1910 may store the already fetched shape data in the packets it sends. As described with reference to FIG. 12, the directions in which such packets are sent to one or more other computing resources may vary (e.g., bi-directional queuing, any-to-any, etc.).

FIG. 20 is intended to illustrate some examples of how methods according to the various aspects described can be implemented. A packet is emitted (2005) that includes shape information, ray IDs, and locations to which hit information is written back (the hit information may be zeroed, and at this point may be treated as "don't care"). A first test (2006) is performed for the ray 1 ID and a hit is found, so a 1 is written to the packet and the packet is passed to a second test (2007). Here, where no hit is found, a 0 is written (or maintained), and the hit information from test 2006 is carried forward in the packet. A third test (2008) is performed for ray 2, and a hit is found. This example shows that the rays in a packet can be tested out of the order in which they appear in the packet, the order instead depending on which tester is associated with a specified ray ID. Testing continues until all ray IDs have been tested (2009). The packet is then coalesced, which means that only the hit information needs to be maintained. New hit results can be combined with the hit results from a previously existing packet (see FIG. 19). It is then determined (2025) whether the collection of rays in the packet is ready for testing (e.g., based on fullness); otherwise, another packet may be processed (2040). If the collection is ready, the shapes related to the shape associated with this packet are fetched (2030), where, for example, that shape is a parent node 2041 and child nodes 2042 are identified. A new packet may then be generated for each fetched shape, with the ray identifiers from the packet associated with the parent.

FIGS. 21 and 22 help summarize the method aspects described above in the context of systems that may be used to implement them. Specifically, FIG. 21 illustrates a method 2100 that includes defining (2105) primitives and GAD elements in main memory, and defining (2110) rays for intersection testing using ray definition data (e.g., origin and direction information). Each ray can be identified using an identifier (2115). Subsets of the ray definition data are stored in localized memories associated with respective processing resources of a plurality of processing resources. The rays are scheduled for testing (2125) by distributing identifiers for these rays, together with shape data, among the processing resources. Each ray is tested (2130) in a processing resource whose local memory stores the definition data for that ray. In some cases, each ray may have its definition data in only one local memory.
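
The distribution and scheduling steps summarized above can be sketched as follows, purely as an illustration. The function names, the modulo placement of rays into local memories, and the use of a sphere as the shape are assumptions of this sketch; the specification does not prescribe a placement policy.

```python
import math


def distribute(rays, num_resources):
    """Place each ray's definition data in exactly one local memory.

    Placement by ray ID modulo the resource count is an assumption.
    """
    local = [dict() for _ in range(num_resources)]
    for ray_id, definition in rays.items():
        local[ray_id % num_resources][ray_id] = definition
    return local


def hits_sphere(ray, sphere):
    """Return True if the ray (origin, direction) meets the sphere (center, radius)."""
    (ox, oy, oz), (dx, dy, dz) = ray
    (cx, cy, cz), radius = sphere
    lx, ly, lz = ox - cx, oy - cy, oz - cz
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (lx * dx + ly * dy + lz * dz)
    c = lx * lx + ly * ly + lz * lz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False
    # The larger root must be non-negative for a hit along the ray.
    return (-b + math.sqrt(disc)) / (2.0 * a) >= 0.0


def schedule(ray_ids, shape, local_memories, intersects):
    """Give every resource the same identifiers and shape data.

    Each resource tests only the rays whose definition data it stores.
    """
    found = []
    for memory in local_memories:
        for ray_id in ray_ids:
            if ray_id in memory and intersects(memory[ray_id], shape):
                found.append(ray_id)
    return sorted(found)


rays = {
    0: ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    1: ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    2: ((0.0, 0.0, 0.0), (0.0, 0.0, -1.0)),
    3: ((0.5, 0.0, 0.0), (0.0, 0.0, 1.0)),
}
memories = distribute(rays, 2)
hit_ids = schedule([0, 1, 2, 3], ((0.0, 0.0, 5.0), 1.0), memories, hits_sphere)
```

Only identifiers and shape data move between resources here; the ray origins and directions stay where `distribute` placed them.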

Identification of intersections between rays and primitives is transferred (2135) from a first subset of the computational resources to a second subset. The second subset shades the intersections (2140). This shading creates new rays, and the definition data for them is distributed (2145) among the localized memories, preferably replacing the definition data for rays that have completed intersection testing. These rays are then tested as described above. The subsets of computational resources may be implemented by instantiation or by allocation of computational resources, where the computational resources include threads instantiated on a multi-threaded processor or core. The allocation can change over time, and no static allocation of resources between intersection testing and shading is required. For example, a core executing an intersection-testing thread can complete a series of intersection tests, filling a memory space with a number of identifications of ray/primitive intersections. The core can then be switched to shading those intersections.

Some of the above examples have been explained largely in terms of testing GAD elements for intersection. The result of such testing is to group rays against progressively smaller groupings of primitives (by associating ray IDs with particular GAD elements). Eventually, a GAD element identified by the testing bounds primitives, so the rays in the group associated with that GAD element are tested against those primitives. For a packet whose shapes are primitives, the result of intersection testing is identification of ray/primitive intersections; for convenience, the closest intersection detected so far for a given ray is usually tracked together with the other data defining that ray.

Then, after a given ray has been tested against the entire scene, the closest detected intersection for that ray, if any, is returned with the ray ID to the application or driver, or to another process that uses this result to start the shading process. As in various examples of this specification, the ray identifier may be returned via a queuing strategy (i.e., it is not necessary to specify which intersection test resource detected a particular intersection, or to have the intersection shaded by a predetermined shading resource). In some intersection tests, barycentric coordinates are calculated as part of the test, and such coordinates may be used for shading, if desired. This is an example of other data that can be sent from the intersection tester to the shader.
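
As an illustration of an intersection test that yields barycentric coordinates, the following sketch uses the well-known Möller-Trumbore ray/triangle test. The specification does not name a particular test, so the choice of this algorithm, the function names, and the triangle-only primitives are assumptions of the sketch.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test; returns (t, u, v) on a hit, else None.

    t is the distance along the ray; (u, v) are barycentric coordinates.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    if t <= eps:                # intersection behind the ray origin
        return None
    return (t, u, v)


def closest_hit(origin, direction, triangles):
    """Track the nearest intersection found so far for one ray."""
    best = None
    for prim_id, (v0, v1, v2) in triangles.items():
        hit = intersect_triangle(origin, direction, v0, v1, v2)
        if hit is not None and (best is None or hit[0] < best[1]):
            best = (prim_id,) + hit
    return best


triangles = {
    "far": ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0)),
    "near": ((-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 2.0)),
}
result = closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), triangles)
```

The tuple returned by `closest_hit` carries the primitive identifier, the hit distance, and the two barycentric coordinates, which is the kind of data a tester could pass on to a shader.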

In general, any of the functional units, features, and other logic described in this specification may be implemented using various computing resources. A computational resource may be a thread, a core, a processor, a fixed-function processing element, and the like. Also, other functional units (e.g., collection or packet management) may be provided or implemented as a process, thread, or task that may be localized to one computing resource or distributed among a plurality of computational resources (e.g., a plurality of threads distributed among physical computing resources). Such a task may include identifying in-flight packets that contain intersection test results for shapes whose collections are managed by that computational resource.

Likewise, the computational resources used for intersection testing can host other processes, such as the shading processes used to shade detected intersections. For example, a processor executing intersection testing may also execute a shading thread. For example, in a ring bus implementation, if the queue for one processing resource does not currently hold any packets for intersection testing, that processing resource may instead start a thread to shade previously identified intersections. An important point is that there is no required or inherent relationship between running an intersection-testing thread on a given processor and running, on that processor, a shading thread for the ray intersections detected by that thread. Instead, the queued ray/primitive intersections provide the input to the shading threads, so that the mapping between intersection test resources and shading resources may be any-to-any, and different hardware or software units may perform intersection testing and shading for the same ray.
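
The any-to-any mapping described above can be sketched, as an illustration only, with a single shared queue between tester threads and shader threads. The function `run_pipeline`, the use of `None` as a shutdown marker, and the pre-computed hit lists are assumptions of this sketch.

```python
import queue
import threading


def run_pipeline(hits_per_tester, num_shaders=2):
    """Feed (ray ID, primitive ID) hits from any tester to any shader."""
    hit_queue = queue.Queue()
    shaded = []
    lock = threading.Lock()

    def tester(hits):
        # A tester only enqueues identifiers; it does not pick a shader.
        for ray_id, prim_id in hits:
            hit_queue.put((ray_id, prim_id))

    def shader(name):
        while True:
            item = hit_queue.get()
            if item is None:        # shutdown marker (assumption of this sketch)
                break
            ray_id, prim_id = item
            with lock:
                shaded.append((ray_id, prim_id, name))

    testers = [threading.Thread(target=tester, args=(h,)) for h in hits_per_tester]
    shaders = [threading.Thread(target=shader, args=("shader%d" % i,))
               for i in range(num_shaders)]
    for t in testers + shaders:
        t.start()
    for t in testers:
        t.join()
    for _ in shaders:
        hit_queue.put(None)         # one marker per shader thread
    for t in shaders:
        t.join()
    return shaded


shaded = run_pipeline([[(1, "a"), (2, "b")], [(3, "a")]])
```

Which shader handles which hit is not determined by which tester found it; only the queue ties the two stages together.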

Likewise, the various queues and the interfaces that mediate communication between different functional units (e.g., between intersection-testing resources, and between intersection testing and shading) may be implemented in one or more memories according to any of various buffering strategies, which may be selected according to considerations related to the physical resources suitable for implementing them. Queues can be controlled by source or destination resources. In other words, the destination can listen for data on a shared bus and take the data it needs, or the data can be addressed to the destination by memory mapping, direct communication, and the like.

According to a further example, if a core supports multithreading, one thread may be dedicated to shading while another thread is dedicated to intersection processing. However, care should be taken to avoid cache incoherency (thrashing) caused by fetching textures or other shading information at the expense of retaining ray data (that is, cache allocation priority should be maintained for the intersection test resources).

An advantage of this architecture is that it reduces cache requirements for shape data, and thus reduces cache coherency considerations for that type of data. In fact, in some implementations no particular effort need be made to keep particular shape data available or to predict when shape data will be used again. Instead, when a given packet of ray IDs is ready for testing, the shape data for that packet is obtained from the fastest memory that stores it, and the current workload of processing other packets hides the latency incurred by this fetch. After the shape has been tested for intersection, its shape data may be allowed to be overwritten.

Any of the queues identified herein may be implemented in a shared memory resource, in SRAM, as a linked list, a circular buffer, a sequence of memory locations, or in any other form known in the art for implementing a queue. A queue may operate to maintain the order of packets, so that the first packet in is the first packet out, but this is not required. In some examples, each computational resource is given the ability to examine a specified number of packets in its queue to determine whether it would be advantageous to process the packets in a different order. Such an implementation is more complex than a strictly in-order system and can be provided as needed.
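
The out-of-order option can be sketched as a bounded look-ahead over the head of a queue. The selection rule used here (prefer the packet naming the most locally stored rays, with ties going to the oldest packet) is an assumption of this sketch, as is the function name `take_next`; the specification only says that a different order may be advantageous.

```python
from collections import deque


def take_next(packets, local_ray_ids, lookahead=1):
    """Remove and return one packet from the first `lookahead` queue entries.

    With lookahead=1 this is plain first-in, first-out behavior.
    """
    if not packets:
        return None
    window = min(lookahead, len(packets))
    # max() keeps the earliest index on ties, preserving FIFO order.
    best = max(range(window),
               key=lambda i: len(local_ray_ids & set(packets[i])))
    chosen = packets[best]
    del packets[best]
    return chosen


local = {3, 7, 8}

fifo = deque([[1, 2], [7, 8, 9], [3]])
in_order = take_next(fifo, local, lookahead=1)

reordered = deque([[1, 2], [7, 8, 9], [3]])
preferred = take_next(reordered, local, lookahead=3)
```

A larger `lookahead` trades extra inspection work for a better chance of picking a packet whose rays are already held locally.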

Computer-executable instructions include, for example, instructions and data that cause a general purpose computer, special purpose computer, or special purpose processing device to perform a particular function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate-format instructions such as assembly language, or source code. Although some subject matter has been described in language specific to examples of structural features and/or method steps, the subject matter defined in the appended claims is not necessarily limited to the features or steps described. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.

Above, various examples of computing hardware and/or software programming have been described, and likewise, how such hardware and software may communicate with one another has been illustrated. Such hardware, and software configured with such communication interfaces, provide means for achieving the functions attributed to each of them. For example, the means for intersection testing according to some examples of this specification may include a plurality of independently operable computing resources, each having a localized storage area and operable to test shapes and rays for intersection in response to being provided with identifiers for such rays and with shape data.

For example, the means for managing collections of rays may include means for tracking groups of ray identifiers and for providing information for forming packets that contain ray identifiers together with a shape, or shape data, determined by the shape associated with each group of ray identifiers. Such means may be implemented by programming, by an FPGA or ASIC, or by some combination of these.
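
A minimal sketch of such collection management follows, assuming that a collection is emitted as a packet once it reaches a fixed size. The class name `CollectionManager`, the dictionary packet layout, and the fullness rule are assumptions of this sketch.

```python
from collections import defaultdict


class CollectionManager:
    """Tracks groups of ray IDs keyed by the shape they must be tested against."""

    def __init__(self, packet_size):
        self.packet_size = packet_size
        self.collections = defaultdict(list)   # shape ID -> waiting ray IDs

    def add(self, shape_id, ray_id):
        """Record a ray for a shape; return a packet once the group is full."""
        group = self.collections[shape_id]
        group.append(ray_id)
        if len(group) >= self.packet_size:
            del self.collections[shape_id]
            return {"shape": shape_id, "ray_ids": group}
        return None


manager = CollectionManager(packet_size=3)
emitted = [
    manager.add("boxA", 1),
    manager.add("boxB", 2),
    manager.add("boxA", 3),
    manager.add("boxA", 5),
]
```

Only the fourth call returns a packet, because that is when the collection for `boxA` reaches three ray identifiers.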

For example, a function described above is sending, through a queue, identifiers for rays that have completed intersection testing and been found to intersect a primitive, for processing by computational resources configured to shade those intersections. The means for implementing this function may include a hardware queue, or a shared memory space structured as a queue or list (e.g., a memory organized as a ring buffer or linked list). Thus, such means may include programming and/or logic that causes a ray identifier and a primitive identifier to be obtained from a next or specified slot in the queue or location in memory. A controller may manage the queue or memory to maintain the next read position and the next write position for outputting and inputting ray and primitive identifiers. Such queuing means may also be used to interface intersection-testing resources with one another when such resources pass packets of ray identifiers and shape data among themselves. Such queuing means may also be used to receive ray identifiers for new rays awaiting the start of intersection testing. Thus, each of the queuing functions detailed above may be implemented by such means or their equivalents.
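
A ring buffer with explicit next-read and next-write positions, as mentioned above, can be sketched as follows. The class name `RingQueue` and its return conventions (False when full, None when empty) are assumptions of this sketch.

```python
class RingQueue:
    """Fixed-capacity queue of (ray ID, primitive ID) pairs."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.read_pos = 0     # next slot to read
        self.write_pos = 0    # next slot to write
        self.count = 0

    def push(self, ray_id, primitive_id):
        if self.count == len(self.slots):
            return False      # queue full; the caller must retry later
        self.slots[self.write_pos] = (ray_id, primitive_id)
        self.write_pos = (self.write_pos + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None       # queue empty
        item = self.slots[self.read_pos]
        self.read_pos = (self.read_pos + 1) % len(self.slots)
        self.count -= 1
        return item


ring = RingQueue(2)
pushed = [ring.push(1, 10), ring.push(2, 20), ring.push(3, 30)]
first_out = ring.pop()
pushed_again = ring.push(3, 30)
rest = [ring.pop(), ring.pop(), ring.pop()]
```

Both positions wrap around the end of the slot array, so the same memory is reused without moving any stored identifiers.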

For example, a function described above is shading identified intersections between rays and primitives. Such functionality may be implemented by means including computing hardware configured with programming associated with the intersected primitive. The programming allows the computing hardware to determine what information is needed to determine the effect of the ray on the primitive that was hit, and to obtain such data as textures, procedural geometry modifications, and so on. The programming can cause new rays to be generated (e.g., shadow, refraction, and reflection rays) for further intersection testing. The programming can interface with an application programming interface to generate these rays. A ray defined by the shading programming includes origin and direction definition information, and a controller can determine a ray identifier for the ray so defined. Fixed-function hardware can be used to implement some of this functionality. However, it is desirable for programmable shading to be available, as needed, on computing resources that can be configured according to code associated with the primitive and/or other code.

For example, another function described above is storing a master list of rays that are being intersection tested and/or are awaiting intersection testing, with subsets of the master list distributed among the distributed cache memories associated with the means for intersection testing. Such functionality may be implemented by means including a processor or a group of processors that uses an integrated or separate memory controller to interface with a memory that stores the data, under the control of programming that implements such functionality. Such programming may be partially included in a driver associated with or controlling the intersection-testing functionality.

Various aspects of the described and/or claimed functions and methods may be implemented in a special-purpose or general-purpose computer (including computer hardware), as described in detail herein. Such hardware, firmware, and software may also be implemented on a video card, or on other external or internal computer system peripherals. Various functionality can be provided by customized FPGAs or ASICs or other configurable processors, while some functionality may be provided by a management or host processor. Such processing functionality may be used in computers, desktop computers, laptop computers, message processors, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, game consoles, mobile phones, PDAs, pagers, and the like.

Further, the communication links and other data flow arrangements shown in the figures, such as links 112, 121, and 118 of FIG. 1 and similar links in other figures, may be implemented in various ways, depending on the implementation of the identified functionality. For example, when the intersection test unit 109 comprises a plurality of threads executing on one or more CPUs, the link 118 may comprise the physical memory access resources of such CPUs, together with appropriate memory controller hardware/firmware/software, to provide access to the ray data storage 105. According to a further example, the intersection test area 140 is located on a graphics card connected to the host 140 by a PCI Express bus, and the links 121 and 112 may be implemented using the PCI Express bus.

Intersection testing as described in this specification will generally exist in the context of larger systems and system components. For example, the processing may be distributed across a network (e.g., a local or wide area network) and may be implemented using peer-to-peer technology or the like. The division of tasks may be determined based on the desired performance of the product or system, the desired price point, or a combination thereof. In implementations that implement some or all of the described units in software, computer-executable instructions that represent the unit functionality may be stored on a computer-readable medium (e.g., magnetic or optical disk, flash memory, USB device) or in a network of storage devices such as NAS or SAN devices. Other pertinent information (e.g., data for processing) may also be stored on such media.

Where particular terminology is used in this specification, it is used to describe the examples and the various aspects of the invention disclosed herein, and should not be construed as limiting the scope in any way. For example, rays are often referred to as having an origin and a direction, and, for an understanding of various aspects of the disclosure, these items can be represented as a point in three-dimensional space and a direction vector in three-dimensional space, respectively. However, within the scope of the present invention, rays can be represented in various other ways. For example, the ray direction can be expressed in spherical coordinates. In addition, data provided in one format can be transformed or mapped into other formats while retaining the meaning of the information originally represented.
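
As a small illustration of mapping a ray direction between formats without losing information, the sketch below converts a Cartesian direction vector to spherical angles and back. The angle convention (polar angle measured from the +z axis, azimuth measured from the +x axis) and the function names are assumptions of this sketch.

```python
import math


def to_spherical(direction):
    """Return (theta, phi) for a non-zero direction vector."""
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / length)))
    phi = math.atan2(y, x)
    return theta, phi


def to_cartesian(theta, phi):
    """Return the unit direction vector for the given angles."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


round_trip = to_cartesian(*to_spherical((1.0, 2.0, 2.0)))
```

The round trip returns the normalized direction, so two angles are enough to carry what the three-component vector expressed, apart from its length.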

The above-described embodiments of the present invention are for illustrative purposes only and are not intended to limit the present invention to the described embodiments. Accordingly, it will be apparent to those skilled in the art that various changes and modifications can be made. Further, the detailed description of this specification does not limit the scope of the present invention. The scope of the invention is defined by the appended claims.

Claims (24)

A system for intersection testing rays in a three-dimensional scene, the system comprising:
A plurality of intersection testers, each having access to a separate cache memory for storing a respective subset of a master copy of ray definition data defining a plurality of rays;
Control logic that has access to individual identifiers for at least a portion of the plurality of rays defined by the ray definition data stored in the cache memories of the intersection testers, and that controls ray intersection testing; and
An output buffer for receiving output from the plurality of intersection testers,
The output including data indicative of results of the ray intersection testing,
Wherein controlling the ray intersection testing comprises:
Associating a plurality of ray identifiers, which identify rays, with shape information indicating one or more shapes, arranged in the three-dimensional scene, to be intersection tested, and
Making data indicative of the association available to the plurality of intersection testers,
Wherein each intersection tester tests, for intersection with the indicated one or more shapes, the identified rays whose ray definition data is stored in its cache memory.
The system according to claim 1,
Further comprising a plurality of computing resources for executing shader code routines associated with a plurality of primitives,
Wherein the primitives are identified by outputs of the intersection testers that are obtainable from the output buffer.
The system according to claim 2,
Wherein execution of the shader code routines generates new rays to be intersection tested,
The system further comprising an input queue to the plurality of intersection testers for receiving the new rays,
Wherein the control logic starts intersection testing of the new rays as other rays complete intersection testing.
The system according to any one of claims 1 to 3,
Wherein each intersection tester is configured to respond to receipt of an identifier for a ray whose definition data is stored in its respective memory by using the ray identifier to obtain the definition data and testing the identified ray for intersection with the one or more shapes.
The system according to any one of claims 1 to 3,
Wherein the intersection testing is performed between rays and geometric shapes that include acceleration structure elements selected from at least one of a cutting plane of a kD-tree, an axis-aligned bounding box, and a sphere.
The system according to any one of claims 1 to 3,
Wherein the intersection testers are adapted so that the data indicative of the association between the plurality of ray identifiers and the shape information indicating the one or more shapes to be intersection tested is made available for use by the intersection testers.
The system according to claim 6,
Wherein each intersection tester is capable of testing rays for intersection with shapes, the one or more shapes including geometric acceleration elements and primitives,
Each intersection tester passes on results of testing rays for intersection with geometric acceleration elements together with the data indicative of the association,
And stores results of testing rays for intersection with primitives in the cache memory of that intersection tester.
The system according to any one of claims 1 to 3,
Wherein the plurality of intersection testers is implemented as threads of computer-executable instructions executed on one or more computing cores.
The system according to any one of claims 1 to 3,
Further comprising a memory storing a plurality of primitives constituting the three-dimensional scene, wherein the memory storing the primitives functions as a main memory of a computing system including at least one computing core, the at least one computing core executes a plurality of threads, and the plurality of threads are allocated by the computing system, on a time-varying basis, between implementing the intersection testers and implementing shader code routines.
A method for controlling ray tracing of a scene composed of a plurality of primitives in a system having a plurality of computing resources, each computing resource being connected to a respective local memory and to a shared main memory, the method comprising:
Distributing data defining respective subsets of rays to be intersection tested in the scene among the local memories of the plurality of computing resources;
Determining members of a group of rays to be tested for intersection with a geometric shape, the definition data for the group of rays being stored collectively in a plurality of the local memories;
Providing ray identifiers and data for the geometric shape to the computing resources, such that at least each computing resource whose local memory stores definition data for a ray of the group receives the geometric shape data and the ray identifiers; and
Receiving, from the computing resources, identification of intersections detected between rays of the group and the geometric shape,
Wherein the identification results from testing each ray of the group in the one or more computing resources whose local memory stores the definition data for that ray.
The method according to claim 10,
Further comprising fetching data defining the shape from the main memory, wherein providing the data for the geometric shape comprises providing the shape definition data to the computing resources together with the ray identifiers of the group.
The method according to claim 10 or 11,
Wherein the identification results include data for intersections between a geometric acceleration element and rays, and a group of rays is formed by collecting rays determined to intersect the same geometric acceleration element,
The method further comprising deferring further testing of the rays associated with the geometric acceleration element until as many rays as are needed for testing have been collected.
The method according to claim 10 or 11,
Wherein each local memory includes a cache, the method further comprising preventing the definition data for a given ray from being overwritten in the cache until that ray has completed intersection testing.
The method according to claim 10 or 11,
Further comprising maintaining, in the respective local memory for each ray whose ray definition data is stored in that local memory, the current closest detected intersection, and generating each intersection identification result in response to identification of the closest intersection between a primitive and the given ray.
The method according to claim 10 or 11,
Wherein the data for the geometric shape is selected from a set of data defining one or more shapes to be tested and a reference identifying one or more shapes to be tested.
The method according to claim 10 or 11,
Wherein the providing comprises queuing the ray identifiers in a first queue to which the computing resources are connected for reception,
Wherein the receiving comprises receiving the intersection identification results from a second queue.
The method according to claim 10 or 11,
Further comprising storing a master copy of the ray definition data in the main memory.
A computer-readable medium storing computer-executable code for implementing, on one or more computing resources, the method of claim 10 or 11.
A system for rendering a representation of a three-dimensional scene composed of a plurality of primitives using ray tracing, the system comprising:
A memory for storing a plurality of primitives constituting the three-dimensional scene;
One or more memories for storing definition data for a plurality of rays;
A plurality of intersection test resources, each of which tests one or more of the plurality of rays for intersection with one or more of the plurality of primitives and outputs intersection test results;
A plurality of shader operation units for running shading routines for detected ray/primitive intersections, the running of the routines generating new rays to be intersection tested, wherein definition data for the new rays is stored in the one or more memories;
A first communication link for outputting intersection test results from the intersection test resources to the shader operation units; and
A second communication link for transmitting identifiers for the new rays generated by running the shading routines to the intersection test resources,
Wherein the identifier for each new ray is used to retrieve the definition data for that ray from the one or more memories during intersection testing of that ray.
The system according to claim 19,
Further comprising a channel for communicating messages among the plurality of intersection test resources,
Wherein each intersection test resource is configured to interpret the data of a received message as including a plurality of ray identifiers, and to intersection test selected rays identified in the message.
The system according to claim 19,
Wherein the intersection test resources comprise a ring for passing packets of ray identifiers among the intersection test resources.
The system according to claim 19,
Wherein each intersection test resource selects individual rays for testing based on determining whether a cache associated with that intersection test resource stores definition data for a ray identified in a message passed among the intersection test resources.
The system according to claim 19,
Wherein the plurality of intersection test resources is implemented as threads of computer-executable instructions executed on one or more computing cores, each computing core having access to a localized cache memory storing a subset of the rays being intersection tested in the scene.
The system according to claim 19,
Wherein the memory storing the plurality of primitives constituting the three-dimensional scene is implemented as main memory for one or more computing cores, the one or more computing cores collectively execute a plurality of threads concurrently, and the threads are allocated, on a time-varying basis, between implementing intersection test resources and implementing shader operation units.



KR1020107023579A 2008-03-21 2009-03-20 Architectures for parallelized intersection testing and shading for ray-tracing rendering KR101550477B1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US3873108P 2008-03-21 2008-03-21
US61/038,731 2008-03-21
US9589008P 2008-09-10 2008-09-10
US61/095,890 2008-09-10

Publications (2)

Publication Number Publication Date
KR20100128337A KR20100128337A (en) 2010-12-07
KR101550477B1 true KR101550477B1 (en) 2015-09-04

Family

ID=40886951

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020107023579A KR101550477B1 (en) 2008-03-21 2009-03-20 Architectures for parallelized intersection testing and shading for ray-tracing rendering

Country Status (4)

Country Link
JP (2) JP5485257B2 (en)
KR (1) KR101550477B1 (en)
CN (2) CN104112291B (en)
WO (1) WO2009117691A2 (en)

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064291A1 (en) * 2008-09-05 2010-03-11 Nvidia Corporation System and Method for Reducing Execution Divergence in Parallel Processing Architectures
CN101826215B (en) * 2010-04-19 2012-05-09 浙江大学 Real-time secondary ray tracing concurrent rendering method
CN102074039B (en) * 2010-09-29 2012-12-19 深圳市蓝韵网络有限公司 Method for drawing volume rendering cutting surface
KR101845231B1 (en) 2011-06-14 2018-04-04 삼성전자주식회사 Image processing apparatus and method
US20130033507A1 (en) * 2011-08-04 2013-02-07 Nvidia Corporation System, method, and computer program product for constructing an acceleration structure
CN102426710A (en) * 2011-08-22 2012-04-25 浙江大学 Surface area heuristic construction KD (K-dimension) tree parallel method on graphics processing unit
US9595074B2 (en) 2011-09-16 2017-03-14 Imagination Technologies Limited Multistage collector for outputs in multiprocessor systems
KR102042539B1 (en) * 2012-07-24 2019-11-08 삼성전자주식회사 Method and apparatus for ray tracing
CN102855655A (en) * 2012-08-03 2013-01-02 吉林禹硕动漫游戏科技股份有限公司 Parallel ray tracing rendering method based on GPU (Graphic Processing Unit)
KR102080851B1 (en) 2012-09-17 2020-02-24 삼성전자주식회사 Apparatus and method for scheduling of ray tracing
GB2546020B (en) * 2012-11-02 2017-08-30 Imagination Tech Ltd Method of scheduling discrete productions of geometry
GB2541505B (en) * 2013-03-14 2017-08-02 Imagination Tech Ltd Determining closest intersections during ray-tracing
US10970912B2 (en) 2013-03-14 2021-04-06 Imagination Technologies Limited 3-D graphics rendering with implicit geometry
GB2544931B (en) * 2013-03-15 2017-10-18 Imagination Tech Ltd Rendering with point sampling and pre-computed light transport information
CN103279974A (en) * 2013-05-15 2013-09-04 中国科学院软件研究所 High-accuracy high-resolution satellite imaging simulation engine and implementation method
CN104516831B (en) * 2013-09-26 2019-02-22 想象技术有限公司 Atomic memory updating unit and method
US11257271B2 (en) 2013-09-26 2022-02-22 Imagination Technologies Limited Atomic memory update unit and methods
KR102116981B1 (en) * 2013-10-02 2020-05-29 삼성전자 주식회사 Method and Apparatus for accelerating ray tracing
KR102193684B1 (en) * 2013-11-04 2020-12-21 삼성전자주식회사 Apparatus and method for processing ray tracing
US9697640B2 (en) * 2014-04-21 2017-07-04 Qualcomm Incorporated Start node determination for tree traversal in ray tracing applications
KR102219289B1 (en) * 2014-05-27 2021-02-23 삼성전자 주식회사 Apparatus and method for traversing acceleration structure in a ray tracing system
EP3012805A1 (en) * 2014-10-21 2016-04-27 The Procter and Gamble Company Synthesizing an image of fibers
KR102282896B1 (en) 2014-12-23 2021-07-29 삼성전자주식회사 Image processing apparatus and method
KR102493461B1 (en) * 2015-08-31 2023-01-30 삼성전자 주식회사 System and Method of rendering
US10262456B2 (en) 2015-12-19 2019-04-16 Intel Corporation Method and apparatus for extracting and using path shading coherence in a ray tracing architecture
US9892544B2 (en) * 2015-12-22 2018-02-13 Intel Corporation Method and apparatus for load balancing in a ray tracing architecture
US10282890B2 (en) * 2016-09-29 2019-05-07 Intel Corporation Method and apparatus for the proper ordering and enumeration of multiple successive ray-surface intersections within a ray tracing architecture
US10445852B2 (en) * 2016-12-22 2019-10-15 Apple Inc. Local image blocks for graphics processing
KR101826123B1 (en) 2017-07-14 2018-02-07 한국과학기술정보연구원 Unstructured Grid Volume Rendering METHOD AND APPARATUS
US10438397B2 (en) * 2017-09-15 2019-10-08 Imagination Technologies Limited Reduced acceleration structures for ray tracing systems
CN107895400A (en) * 2017-11-09 2018-04-10 深圳赛隆文化科技有限公司 A kind of three-dimensional cell domain object of virtual reality renders analogy method and device
US11138009B2 (en) * 2018-08-10 2021-10-05 Nvidia Corporation Robust, efficient multiprocessor-coprocessor interface
KR102143155B1 (en) * 2018-08-14 2020-08-10 국방과학연구소 Asymptotic high frequency method and device using Grouping of Rays
US10970914B1 (en) * 2019-11-15 2021-04-06 Imagination Technologies Limited Multiple precision level intersection testing in a ray tracing system
CN111105341B (en) * 2019-12-16 2022-04-19 上海大学 Framework method for solving computational fluid dynamics with low power consumption and high operational performance
CN111177014B (en) * 2020-02-24 2023-02-24 重庆长安新能源汽车科技有限公司 Software automatic test method, system and storage medium
US11373358B2 (en) * 2020-06-15 2022-06-28 Nvidia Corporation Ray tracing hardware acceleration for supporting motion blur and moving/deforming geometry
US11508112B2 (en) * 2020-06-18 2022-11-22 Nvidia Corporation Early release of resources in ray tracing hardware
US11367242B2 (en) * 2020-07-30 2022-06-21 Apple Inc. Ray intersect circuitry with parallel ray testing
US11335061B2 (en) 2020-07-30 2022-05-17 Apple Inc. Ray intersection data structure with many-to-many mapping between bounding regions and primitives
CN114331801B (en) * 2020-09-30 2024-04-02 想象技术有限公司 Intersection testing for ray tracing
CN112190937A (en) * 2020-10-10 2021-01-08 网易(杭州)网络有限公司 Illumination processing method, device, equipment and storage medium in game
GB2599182B (en) * 2021-03-23 2022-10-26 Imagination Tech Ltd Intersection testing in a ray tracing system
US11922026B2 (en) 2022-02-16 2024-03-05 T-Mobile Usa, Inc. Preventing data loss in a filesystem by creating duplicates of data in parallel, such as charging data in a wireless telecommunications network
US20240112397A1 (en) * 2022-09-30 2024-04-04 Advanced Micro Devices, Inc. Spatial test of bounding volumes for rasterization
CN115640138B (en) * 2022-11-25 2023-03-21 摩尔线程智能科技(北京)有限责任公司 Method and apparatus for ray tracing scheduling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100300969B1 (en) * 1996-04-25 2001-10-22 윤종용 Method for extracting crossfoot test area in ray tracing and rendering device thereof
WO2009067351A1 (en) 2007-11-19 2009-05-28 Caustic Graphics, Inc. Systems and methods for rendering with ray tracing

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01149183A (en) * 1987-12-05 1989-06-12 Fujitsu Ltd Method for forming
EP0439260B1 (en) * 1990-01-23 1998-08-19 Hewlett-Packard Company Distributed processing apparatus and method for use in global rendering
JPH11353496A (en) * 1998-06-10 1999-12-24 Ken Nishimura Intersection search device for light ray tracing
US6556200B1 (en) * 1999-09-01 2003-04-29 Mitsubishi Electric Research Laboratories, Inc. Temporal and spatial coherent ray tracing for rendering scenes with sampled and geometry data
US6724856B2 (en) * 2002-04-15 2004-04-20 General Electric Company Reprojection and backprojection methods and algorithms for implementation thereof
DE10239672B4 (en) * 2002-08-26 2005-08-11 Universität des Saarlandes Method and device for generating a two-dimensional image of a three-dimensional structure
US7043579B2 (en) * 2002-12-05 2006-05-09 International Business Machines Corporation Ring-topology based multiprocessor data access bus
DE102004007835A1 (en) * 2004-02-17 2005-09-15 Universität des Saarlandes Device for displaying dynamic complex scenes
FR2896895B1 (en) * 2006-02-01 2008-09-26 Redway Soc Par Actions Simplifiee METHOD FOR SYNTHESIZING A VIRTUAL IMAGE BY LAUNCHING BEAMS
WO2007124363A2 (en) * 2006-04-19 2007-11-01 Mental Images Gmbh Instant ray tracing
CN101127126B (en) * 2006-08-16 2012-09-26 腾讯科技(深圳)有限公司 Method and device for emulating secondary surface dispersion effect of non-physical model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lefer, W., ‘An efficient parallel ray tracing scheme for distributed memory parallel computers’, Parallel Rendering Symposium, 1993, pp. 77-80

Also Published As

Publication number Publication date
KR20100128337A (en) 2010-12-07
CN102037497B (en) 2014-06-11
WO2009117691A3 (en) 2009-11-12
JP5485257B2 (en) 2014-05-07
JP5740704B2 (en) 2015-06-24
JP2014089773A (en) 2014-05-15
JP2011515766A (en) 2011-05-19
WO2009117691A2 (en) 2009-09-24
WO2009117691A4 (en) 2009-12-30
CN104112291B (en) 2017-03-29
CN104112291A (en) 2014-10-22
CN102037497A (en) 2011-04-27

Similar Documents

Publication Publication Date Title
KR101550477B1 (en) Architectures for parallelized intersection testing and shading for ray-tracing rendering
US10789758B2 (en) Ray tracing in computer graphics using intersection testing at selective resolution
US9710954B2 (en) Processor with ray test instructions performed by special purpose units
EP2335223B1 (en) Ray tracing system architectures and methods
US8736610B2 (en) Systems and methods for rendering with ray tracing
EP2296117B1 (en) Ray-aggregation for ray-tracing during rendering of imagery
CN113822788B (en) Early release of resources in ray tracing hardware

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
N231 Notification of change of applicant
E701 Decision to grant or registration of patent right