CN115640138B - Method and apparatus for ray tracing scheduling - Google Patents


Publication number
CN115640138B
Authority
CN
China
Prior art keywords: ray, data, information, bank, node
Prior art date
Legal status
Active
Application number
CN202211487199.6A
Other languages
Chinese (zh)
Other versions
CN115640138A (en)
Inventor
Name not disclosed at the inventor's request
Current Assignee
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202211487199.6A priority Critical patent/CN115640138B/en
Publication of CN115640138A publication Critical patent/CN115640138A/en
Application granted granted Critical
Publication of CN115640138B publication Critical patent/CN115640138B/en

Abstract

The present disclosure provides methods, apparatuses, systems, and computer-readable media for ray tracing scheduling. The method includes obtaining data allocation information for a plurality of ray information pairs, wherein the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored. The method further includes obtaining, in real time, bank usage information regarding the plurality of ray information pairs at a current time, wherein each item of the bank usage information indicates whether the corresponding ray information pair involves a bank conflict. The method further comprises scheduling those ray information pairs of the plurality of ray information pairs that do not involve a bank conflict, based on the obtained data allocation information and bank usage information.

Description

Method and apparatus for ray tracing scheduling
Technical Field
The present disclosure relates generally to scheduling of ray tracing. More particularly, the present disclosure relates to methods, apparatuses, systems, and computer-readable media for ray tracing scheduling.
Background
Ray tracing is a technique used when rendering three-dimensional (3D) images on a two-dimensional (2D) screen. It algorithmically simulates the physical behavior of light in the real world and can produce physically accurate shadows, reflections, refractions, and global illumination, making objects in a virtual scene appear more realistic. Accordingly, ray tracing is widely used in fields such as games and film, and is deployed as an emerging technique in devices such as graphics cards and graphics processing units (GPUs).
In operation of the ray tracing technique, intersection tests between rays and nodes (e.g., a BOX or a Triangle) in a scene may be performed by an Arithmetic Logic Unit (ALU) in a GPU. The ALU therefore receives paired node data and ray data from a node cache and a ray cache, respectively, and each pair of node data and ray data is scheduled by a scheduler to be sent to the ALU or a compute unit for the intersection operation.
However, when each pair of node data and ray data is scheduled by the scheduler, a bank conflict may arise in the node cache or the ray cache; that is, one or both of the two banks in which the paired node data and ray data are respectively stored (one bank in the node cache, the other in the ray cache) may be accessed or invoked by a visitor other than the scheduler, so that the data cannot currently be scheduled for the intersection operation. In the conventional ray tracing scheduling method, when a bank conflict is encountered, the scheduler must wait for the conflict to clear before reading the data, and this waiting time reduces the work efficiency of the ALU.
Therefore, there is a need for a technique that can efficiently schedule node data and ray data pairs even when bank conflicts may occur in the node cache and the ray cache, so as to achieve fast pipelining, eliminate waiting latency, and preserve the operating efficiency of the ALU.
Disclosure of Invention
In view of the above, it is an object of the present disclosure to provide a method, apparatus, system, and computer-readable medium for ray tracing scheduling that are expected to overcome the above-mentioned drawbacks.
According to an aspect of the present disclosure, there is provided a method for ray tracing scheduling, including: obtaining data allocation information regarding a plurality of ray information pairs, wherein the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored; obtaining, in real time, bank usage information regarding the plurality of ray information pairs at a current time, wherein each item of the bank usage information indicates whether the corresponding ray information pair involves a bank conflict; and scheduling the ray information pairs among the plurality of ray information pairs that do not involve a bank conflict, based on the obtained data allocation information and bank usage information.
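The three steps of the claimed method can be sketched as a minimal selection pass. The function name, the dictionary shapes, and the example locations below are illustrative assumptions rather than the patent's implementation:

```python
def schedule_conflict_free(data_allocation, bank_usage):
    """Select the ray information pairs that involve no bank conflict.

    data_allocation: dict mapping pair_id -> storage location (opaque here).
    bank_usage:      dict mapping pair_id -> True if that pair involves a
                     bank conflict at the current time, False otherwise.
    Returns the list of pair_ids that can be scheduled this cycle.
    """
    schedulable = []
    for pair_id in data_allocation:
        # Schedule only pairs whose banks are conflict-free right now;
        # a pair with no usage entry is conservatively treated as conflicting.
        if not bank_usage.get(pair_id, True):
            schedulable.append(pair_id)
    return schedulable

# Pairs 0 and 2 are conflict-free; pair 1's bank is being accessed externally.
alloc = {0: ("line1", "bank2"), 1: ("line0", "bank3"), 2: ("line2", "bank0")}
usage = {0: False, 1: True, 2: False}
print(schedule_conflict_free(alloc, usage))  # → [0, 2]
```

Note that the conflicting pair is skipped rather than waited on, which is the core of the claimed mechanism.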
In some embodiments, the ray information pair comprises paired node data and ray data.
In some embodiments, the node data and the ray data in each ray information pair are respectively stored at corresponding banks in the memory, and a bank conflict means that either of the banks storing the node data and the ray data is being called externally.
In some embodiments, the data allocation information further comprises a validity flag for node data, the validity flag, when set, indicating that the node data has been retrieved and stored in the memory.
In some embodiments, scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information comprises: for each ray information pair, scheduling the node data and ray data in the ray information pair only if the validity flag of the node data in the ray information pair is set and no bank conflict is involved.
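Combined with the validity flag, the per-pair scheduling condition reduces to a simple predicate. This is an illustrative reading of the embodiment, not the claimed circuit:

```python
def can_schedule(validity_flag, node_bank_conflict, ray_bank_conflict):
    # A pair is schedulable only when its node data has already been fetched
    # into the node cache (validity flag set) and neither the node-cache bank
    # nor the ray-cache bank holding its data is accessed by another visitor.
    return bool(validity_flag) and not node_bank_conflict and not ray_bank_conflict
```

All three conditions must hold in the same clock cycle; a failure of any one defers the pair without blocking other pairs.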
In some embodiments, scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information comprises: sending a data scheduling instruction to the memory to transmit the ray information pairs not involved in a bank conflict to a computing unit for the intersection operation.
In some embodiments, obtaining data allocation information regarding the plurality of ray information pairs comprises: receiving and caching the data allocation information.
In some embodiments, obtaining bank usage information about the plurality of ray information pairs at the current time in real time comprises: receiving and caching the bank usage information in real time.
In some embodiments, the data allocation information is represented by a bank identifier and a line identifier.
According to another aspect of the present disclosure, there is provided an apparatus for ray tracing scheduling, including: an allocation information acquisition module configured to acquire data allocation information regarding a plurality of ray information pairs, wherein the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored; a usage information acquisition module configured to acquire, in real time, bank usage information regarding the plurality of ray information pairs at a current time, wherein each item of the bank usage information indicates whether the corresponding ray information pair involves a bank conflict; and a scheduling module configured to schedule the ray information pairs of the plurality of ray information pairs that do not involve a bank conflict based on the data allocation information and the bank usage information.
According to another aspect of the present disclosure, there is provided a system for ray tracing scheduling, comprising: a first memory configured to store node data; a second memory configured to store ray data; a computing unit communicatively coupled to the first and second memories and configured to receive node data and ray data from the first and second memories for an intersection operation; and a scheduler communicatively coupled to the first memory and the second memory and configured to perform any of the methods described in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a computer-readable medium having instructions stored thereon, which when executed by a computer, cause the computer to carry out any of the methods described in accordance with the present disclosure.
In general, in the context of the claimed mechanism for ray tracing scheduling, each pair of node data and ray data from each input ray may be stored at a corresponding bank in the node cache and the ray cache, respectively. The scheduler then queries the bank conflict status in the node cache and the ray cache, and schedules only those node and ray data pairs whose two banks (one in the node cache, one in the ray cache) are free of conflicts. Specifically, if either or both of the two banks in which a given pair of node data and ray data is stored are in conflict, the scheduler does not schedule that pair; if neither bank is in conflict, the scheduler schedules the pair, i.e., causes the data to be sent to the ALU to perform the intersection operation.
According to some embodiments of the present disclosure, even if a bank conflict occurs in some banks of the node cache and the ray cache, the scheduler can choose to schedule node and ray data pairs that do not involve the conflict, rather than simply waiting for the current data's bank conflict to end. The scheduling mechanism according to the present disclosure therefore eliminates the delay spent waiting for a bank conflict to end in the prior art and implements fast pipelining, so that the ALU can operate as close to full load as possible, ensuring its operating efficiency.
These and other advantages of the present disclosure will become apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Specific exemplary embodiments of the present disclosure will now be described with reference to the accompanying drawings. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The terminology used in the detailed description of the particular exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the disclosure. In the drawings, like numerals refer to like parts.
Fig. 1 is a diagram illustrating an exemplary application scenario in which technical solutions according to embodiments of the present disclosure may be implemented.
Fig. 2 is an exemplary flow chart illustrating a method for ray tracing scheduling according to one embodiment of the present disclosure.
Fig. 3 is an exemplary block diagram illustrating a system for ray tracing scheduling according to an embodiment of the present disclosure.
FIG. 4 is an exemplary flow chart illustrating the process steps performed by the node cache in the system of FIG. 3 during ray tracing scheduling according to embodiments of the present disclosure.
FIG. 5 is an exemplary flow chart illustrating the process steps performed by the ray cache in the system of FIG. 3 during ray tracing scheduling according to embodiments of the present disclosure.
FIG. 6 is an exemplary flow chart illustrating the process steps performed by the scheduler in the system of FIG. 3 during ray tracing scheduling according to embodiments of the present disclosure.
Fig. 7 illustrates an example block diagram of an apparatus for ray tracing scheduling in accordance with an embodiment of this disclosure.
Fig. 8 illustrates an example system that includes an example computing device that represents one or more systems and/or devices that can implement the various methods described in this disclosure.
Detailed Description
The following description provides specific details of various embodiments of the disclosure so that those skilled in the art can fully understand and practice the various embodiments of the disclosure. It is understood that aspects of the disclosure may be practiced without some of these details. In some instances, well-known structures or functions are not shown or described in detail in this disclosure to avoid obscuring the description of the embodiments of the disclosure by these unnecessary descriptions. The terminology used in the present disclosure should be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a particular embodiment of the present disclosure.
Ray tracing is a method used when rendering three-dimensional (3D) images on a two-dimensional (2D) screen. The ray tracing technology utilizes an algorithm to simulate the physical characteristics of rays in the real world, and can realize physically accurate shadow, reflection and refraction and global illumination, so that objects in a virtual scene have more reality. In particular, ray tracing techniques mainly involve at least four elements: a 3D object in the scene, a light source in the scene, an image plane (as a 2D display screen), and a camera (typically a virtual camera or human eye, as a perspective from which to view the 3D object).
Hereinafter, an exemplary procedure of ray tracing is briefly described, taking backward ray tracing as an example. First, a viewing ray (e.g., a vector) is emitted from the camera through a pixel in the image plane, after which the viewing ray may intersect an object in the scene. When an intersection operation determines that the ray intersects an object and computes the intersection point, a shadow line is emitted from the intersection point toward a light source in the scene, and it is determined whether the shadow line reaches the light source directly or first touches another object. If the shadow line reaches the light source directly, it can be determined that the intersection of the viewing ray and the object is illuminated by the light source, and the color of the pixel in the image plane through which the ray passed is calculated accordingly (e.g., its color is set to a value representing a bright portion). However, if the shadow line hits another object, it can be determined that the intersection point is in shadow (i.e., at the intersection point, the light of the light source is blocked by that object), and the color of the corresponding pixel in the image plane may be set to a value representing a dark portion.
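The shadow-line decision described above can be summarized in a few lines. The function, its argument shapes, and the bright/dark values below are illustrative assumptions, not part of the patent:

```python
def shade_pixel(hit_point, light_pos, occluders):
    """Shadow-line test: if the segment from the intersection point to the
    light source is blocked by any object, the pixel is dark; otherwise it
    is lit.  `occluders` are callables abstracting the geometry test."""
    blocked = any(blocks(hit_point, light_pos) for blocks in occluders)
    return 0.1 if blocked else 1.0  # illustrative dark / bright values
```

In a full renderer each `blocks` callable would be a ray-primitive intersection test against one scene object; here it is abstracted to keep the sketch short.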
By analogy, multiple viewing rays in multiple directions may be emitted from the camera; these rays subsequently intersect multiple objects in the scene and produce many reflected and refracted rays, which may in turn intersect other objects and produce further reflected and refracted rays, similar to a recursive process. The colors of the individual pixels in the image plane, and thus the color of the entire image plane, can then be determined more accurately from the shadow lines established between all of these rays and the light source, to be rendered onto a display. In this way, the 3D scene and the objects therein presented on the display can be more realistic, sufficiently reflecting the interaction of light with various objects in the real world, thereby enhancing the viewing or gaming experience of the user.
It should be understood that the above process is merely exemplary to facilitate understanding of the present disclosure, and that ray tracing techniques include forward ray tracing, distributed ray tracing, path tracing, bi-directional path tracing, random ray tracing, and the like, in addition to backward ray tracing. That is, the methods for data scheduling described in this disclosure are applicable to various types of ray tracing techniques.
From the above description of ray tracing, it can be seen that each time it is determined whether a ray intersects an object in the scene and where the intersection point lies, an ALU or computing unit performs the intersection operation. Thus, for each ray, the computing unit obtains the ray data for that ray and the data of a node (BOX or Triangle) paired with it, and the paired ray and node data are stored in two banks, one in the node cache and one in the ray cache. When either of the two banks is in conflict (i.e., a bank conflict occurs because one or both banks are being accessed or called by other devices), the ray data or node data therein cannot be scheduled and sent to the computing unit for the intersection operation. In the conventional ray tracing scheduling method, the scheduler must wait for the bank conflict to end before reading and scheduling the data, which leaves the computing unit idle and reduces its efficiency.
To solve the above problems, the present disclosure proposes a mechanism for ray tracing scheduling: the scheduler checks the bank conflict status of the node cache and the ray cache, respectively, and schedules ray and node data pairs that involve no bank conflict in either cache, thereby eliminating the time spent waiting for a bank conflict to end and improving the work efficiency of the computing unit.
Fig. 1 is a diagram illustrating an exemplary application scenario 100 in which technical solutions according to embodiments of the present disclosure may be implemented. As shown in FIG. 1, the illustrated application scenario 100 includes a device 120 and a GPU 110 and a display 130 communicatively coupled to the device 120, where the GPU 110 and the display 130 may be disposed internal to the device 120 or external to the device 120. Further, GPU 110 may be communicatively coupled to device 120 as a whole with storage elements and/or other components (e.g., a graphics card).
As an example, when the device 120 generates a computation demand, it sends a computation request to the GPU 110 connected to the device 120 and waits for the GPU 110 to return a computation result. In the context of ray tracing, in one example, GPU 110 generates incoming rays (e.g., the viewing rays, reflected rays, refracted rays, etc., described above) in a 3D scene in response to a request from device 120, obtains and buffers data regarding the incoming rays (e.g., node data and ray data) from the information contained therein, then schedules and computes such data (e.g., intersection or other computations), and finally renders and presents the computation results to display 130. During the above operation, the efficiency with which the computing unit in the GPU 110 processes these data depends on the efficiency of data scheduling: the higher the scheduling efficiency, the closer the computing unit in the GPU 110 can run to full load without wasting its computing power. As shown in fig. 1, by implementing the ray tracing scheduling method according to the present disclosure in the GPU 110, the computational efficiency of the GPU 110 can be improved, the real-time performance of image rendering is enhanced, and image pixels can be rendered onto the display 130 better and faster, thereby improving the user experience.
As an example, when the data scheduling method for ray tracing of the present disclosure is implemented in the GPU 110, first, data allocation information regarding a plurality of ray information pairs is acquired, wherein the data allocation information includes a location where each of the plurality of ray information pairs is stored in a memory; then, bank usage information about the plurality of ray information pairs at a current time is obtained in real time, wherein each item of the bank usage information indicates whether the corresponding ray information pair involves a bank conflict; finally, the ray information pairs among the plurality of ray information pairs that do not involve a bank conflict are scheduled based on the acquired data allocation information and bank usage information.
The scenario 100 described above is only one example in which embodiments of the present disclosure may be implemented and is not limiting. For example, the GPU 110 may be a common graphics processor or graphics processing unit. Device 120 may be a terminal or a server or the like capable of communicatively coupling with the GPU. When the device 120 is a terminal, it may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., without limitation. When the device 120 is a server, it may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. It is noted that when device 120 is a server, display 130 may be a display apparatus communicatively coupled to another device (not shown). For example, the device 120 as a server may transmit its calculation result to the other device through a network (not shown) so that the calculation result may be presented on the display 130 of the other device in real time.
Fig. 2 is an exemplary flow chart illustrating a method 200 for ray tracing scheduling according to one embodiment of the present disclosure. The method 200 may be implemented, for example, on the GPU 110 or the device 120 as shown in fig. 1. As shown in fig. 2, the method 200 includes the following steps.
Step 210, data allocation information is obtained for a plurality of ray information pairs, wherein the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored. By way of example, the memory may be a component of the GPU for caching data (e.g., data from double data rate synchronous dynamic random access memory (DDR SDRAM)), such as a cache, buffer, or other memory device (e.g., a node cache or a ray cache as described in this disclosure), and the data allocation information may be received by the scheduler from the memory. Furthermore, the ray information pairs may refer to the data pairs that are received by a computing unit in the GPU 110 as shown in fig. 1 and subjected to an intersection operation there.
In some examples, the ray information pair may include paired node data and ray data. That is, each ray information pair may include one node data and one ray data paired therewith, and the pair of data may be used for the intersection operation. The node data described in the present disclosure may be coordinates of nodes, where the term "node" refers to a number of "boxes (boxes)" or "triangles (triangles)" obtained after the 3D object is segmented. For example, when a node is a BOX, the node data may be two coordinates corresponding to a diagonal of the BOX. Further, the ray data described in this disclosure may be vectors or coordinates of rays, where the term "ray" generally refers to rays emitted from a camera or a viewing angle, rays reflected or refracted from an object, and the like. A ray can be generally represented by an origin and a vector, indicating the location from which the ray emanates and the direction of travel of the ray. In another example, a ray may also be represented by two three-dimensional coordinates. Therefore, when the ALU or the calculation unit obtains the paired ray data and node data, it is possible to perform an intersection operation, that is, to determine whether the ray and node corresponding to the pair of data intersect, and if so, calculate their intersection point.
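As a concrete illustration of the intersection operation for a BOX node, the standard slab test below checks whether a ray (origin plus direction vector) hits an axis-aligned box given by its two diagonal corners. This is a generic sketch in Python, not the patent's hardware implementation:

```python
def ray_box_intersect(origin, direction, box_min, box_max):
    """Slab test: does the ray origin + t*direction (t >= 0) hit the
    axis-aligned box spanned by diagonal corners box_min and box_max?
    Returns (hit, t_enter) where t_enter is the entry parameter."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False, None  # parallel to this slab and outside it
        else:
            t0, t1 = (lo - o) / d, (hi - o) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far or t_far < 0.0:
                return False, None  # slab intervals do not overlap
    return True, max(t_near, 0.0)

print(ray_box_intersect((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))
# → (True, 4.0)
```

The two diagonal-corner coordinates mentioned in the text map directly onto `box_min` and `box_max` here.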
In some examples, the data allocation information may include node data allocation information and ray data allocation information, and the data allocation information may be represented by a bank identifier and a line identifier. As an example, the node data and the ray data in each ray information pair may be stored at their respective banks in the node cache and the ray cache, respectively. In this case, the node data allocation information indicates the identifier of the bank at which the node data is stored and the identifier of the line in which that bank is located (e.g., a line ID and a bank ID), and the ray data allocation information likewise indicates the identifier of the bank at which the ray data is stored and the identifier of the line in which that bank is located. As used in this disclosure, the terms "bank" and "line" have the same meaning as understood by those skilled in the art, i.e., they refer to memory units in a high-speed memory device such as a cache or a buffer; for example, there may be multiple lines in one cache, where each line may contain four banks (bank 0, bank 1, bank 2, bank 3) for caching data. For example, if the node data allocation information is expressed as a vector (cache_line_ID, cache_bank_ID), the vector (1, 2) may indicate that a certain node data is stored at bank 2 in line 1 of the node cache; if the ray data allocation information is expressed as a vector (ray_line_ID, ray_bank_ID), the vector (2, 3) may indicate that the corresponding ray data is stored at bank 3 in line 2 of the ray cache. The node data and the ray data are stored separately (i.e., node data in the node cache and ray data in the ray cache), and each bank can store only one node data item or one ray data item.
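As a toy model of this addressing scheme (the dict-based cache and helper below are illustrative, not the patent's data layout), each datum is located by a (line_id, bank_id) vector and each bank holds at most one datum:

```python
def store_datum(cache, line_id, bank_id, datum):
    """Place one datum at the given bank; a bank holds at most one datum."""
    key = (line_id, bank_id)
    if key in cache:
        raise ValueError("bank already occupied: each bank stores one datum")
    cache[key] = datum
    return key  # the (line_ID, bank_ID) allocation vector for this datum

node_cache, ray_cache = {}, {}
node_loc = store_datum(node_cache, 1, 2, "node data for RAY1")
ray_loc = store_datum(ray_cache, 2, 3, "ray data for RAY1")
print(node_loc, ray_loc)  # → (1, 2) (2, 3)
```

The returned keys correspond to the (cache_line_ID, cache_bank_ID) and (ray_line_ID, ray_bank_ID) vectors described in the text.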
For example, if the node data of the first incoming ray RAY1 is stored at bank 2 in line 1 of the node cache, then the node data of the second incoming ray RAY2 cannot be stored at bank 2 in line 1 of the node cache, but only at another bank in the node cache. The same applies to ray data. In this way, each datum in the node cache or the ray cache can be located by a line identifier and a bank identifier alone.
In some examples, the node data may be retrieved from storage and stored according to its node address, where the node address represents an address of the node data in the storage. As an example, the storage device may be a component in the GPU for storing graphics information or data to be processed (e.g., DDR SDRAM), which may later be read out (e.g., to a cache) for caching, computing, or other operations. The storage device may include, but is not limited to, volatile storage media (such as Random Access Memory (RAM), static Random Access Memory (SRAM), and Dynamic Random Access Memory (DRAM)) and/or nonvolatile storage media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). As an example, the node cache may first receive a node address from an incoming ray, where the incoming ray contains ray data and a node address corresponding to the paired node data; then, the node cache may allocate a storage location (e.g., a certain bank in a certain line of the node cache) for the node data and send the storage location (i.e., data allocation information) to the scheduler; finally, the node cache looks up the node data in the storage device according to the node address and retrieves and stores the node data at the location. As another example, the node data may also have been stored in the node cache. For example, the node cache first searches according to the node address to determine whether the node data is already stored in the node cache, and if so, the node cache sends a location (for example, an identifier of a line and an identifier of a bank in the node cache) where the node data is stored to the scheduler; if not, the node cache allocates a storage location for the node data and sends the location to the scheduler, and then retrieves the node data from the storage device according to the node address and stores it at the location, as described above.
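The look-up-then-allocate flow of the node cache might be sketched as follows; the index structure, the free-slot list, and the example node addresses are hypothetical:

```python
def lookup_or_allocate(addr_index, free_slots, node_address):
    """Sketch of the node-cache flow described above: if node data for
    node_address is already cached, return its (line, bank) location with
    hit=True; otherwise allocate a free slot, record it in the index so
    the data can be fetched from the storage device later, and return
    the slot with hit=False."""
    if node_address in addr_index:
        return addr_index[node_address], True
    slot = free_slots.pop(0)          # next free (line_id, bank_id)
    addr_index[node_address] = slot   # remember where this address lives
    return slot, False

index, free = {}, [(0, 0), (0, 1)]
loc1, hit1 = lookup_or_allocate(index, free, 0x1000)  # miss: allocate (0, 0)
loc2, hit2 = lookup_or_allocate(index, free, 0x1000)  # hit: same location
```

In either branch, the returned location is what the node cache would send to the scheduler as data allocation information.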
In some examples, the data allocation information may further include a validity flag for node data, the validity flag, when set, indicating that the node data has been retrieved and stored in the memory. Specifically, the data allocation information may include node data allocation information and ray data allocation information, and the node data allocation information may include a validity flag for each node data. In this case, if the node data allocation information is expressed as, for example, a vector (cache_line_ID, cache_bank_ID, cache_line_valid), where "cache_line_valid" is the validity flag of the node data, then the vector (1, 2, 1) (where cache_line_valid = 1) may indicate that the storage location allocated to a certain node data is bank 2 in line 1 of the node cache and that the node data has been retrieved and stored there, while the vector (1, 2, 0) (where cache_line_valid = 0) may indicate that the storage location allocated to the node data is bank 2 in line 1 of the node cache but that the node data has not yet been retrieved and stored there. It is noted that the validity flag described in the present disclosure is not constant; that is, the validity flag for each node data may be updated over time or periodically to indicate whether the node data is valid at the current time, thereby ensuring data timeliness.
For example, a certain node data may have been allocated a corresponding storage location (e.g., a line ID and a bank ID in the node cache), but the node data has not yet been retrieved and stored at that location in the current clock cycle, so its validity flag may be set to 0 to indicate that the node data is invalid in the current clock cycle. In the next clock cycle, however, the node data may have been retrieved and stored at the corresponding location, at which point its validity flag may be set to 1 to indicate that the node data is valid. In this case, in one example, the node cache may notify the scheduler of the change in the validity flag of a given node data in real time in various ways, or may directly transmit the changed validity flag to the scheduler.
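The cycle-by-cycle behavior of the validity flag can be modeled minimally as below; the class and method names are invented for illustration:

```python
class NodeSlot:
    """Toy model of a node-cache slot whose validity flag changes
    across clock cycles."""

    def __init__(self, line_id, bank_id):
        self.location = (line_id, bank_id)
        self.valid = 0  # slot allocated, data not yet fetched from storage

    def on_fetch_complete(self):
        # In a later clock cycle the fetch from the storage device finishes,
        # and the node cache flips the flag so the scheduler may use the pair.
        self.valid = 1

slot = NodeSlot(1, 2)
print(slot.valid)       # → 0  (scheduler must not schedule this pair yet)
slot.on_fetch_complete()
print(slot.valid)       # → 1  (pair becomes schedulable)
```

The flag transition here corresponds to the node cache notifying the scheduler of the change described above.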
In some examples, obtaining data allocation information regarding the plurality of ray information pairs may include: receiving and caching the data allocation information. In particular, the scheduler may receive and buffer multiple pieces of data allocation information, and may thus obtain a list of data allocation information regarding a plurality of ray information pairs (corresponding to a plurality of incoming rays RAY_1-N), the list specifically describing, for each ray information pair, the locations where the node data and the ray data are respectively stored or allocated, and possibly also a validity flag for the node data. Further, in one example, when the data allocation information includes node data allocation information and ray data allocation information and is represented by a line identifier and a bank identifier, the scheduler may distinguish the identifiers representing the storage locations of the node data and the ray data by buffering the line and bank identifiers for the node data and the ray data in different places, or by any other means.
In step 220, bank usage information regarding the plurality of ray information pairs at the current time is obtained in real time, wherein each piece of bank usage information indicates whether the corresponding one of the plurality of ray information pairs relates to a bank conflict. As an example, the bank usage information may be received by a scheduler from a node cache or a ray cache. As used in this disclosure, acquiring the bank usage information in real time means that each piece of bank usage information may represent the bank conflict condition only for the current clock cycle, and the bank usage information may be acquired again in the next clock cycle. Specifically, the bank usage information indicates whether one or both of the two banks at which the paired node data and ray data are respectively stored are in conflict at the current time, that is, whether one or both of the two banks are being accessed or invoked by other visitors at, for example, the current clock cycle. In this way, the bank usage information for each of the plurality of ray information pairs may be updated over time or periodically, for example at each clock cycle, thus ensuring data timeliness, facilitating more efficient scheduling and computation of the ray and node data therein, and avoiding mis-scheduling due to changes in bank occupancy over time. As used in this disclosure, the term "bank conflict" refers to the situation in which a certain cache unit (bank) in a cache or buffer described in the present disclosure is accessed or called by a visitor other than the scheduler, preventing the scheduler from reading or scheduling the data in that bank. Furthermore, as used in this disclosure, the term "other visitor" may refer to any component or element within the GPU, other than the scheduler, that may perform functions such as data access, reading, or calling, without limitation herein.
In some examples, obtaining bank usage information in real time regarding the plurality of ray information pairs at the current time may include: receiving and caching the bank usage information in real time. Specifically, for example, the scheduler may at each clock cycle receive and buffer the bank usage information regarding the current ray RAY_i, and may thus obtain a list of bank usage information regarding a plurality of ray information pairs (corresponding to a plurality of incoming rays RAY_1-N), the list specifically describing whether each ray information pair is involved in a bank conflict at the current time.
In step 230, the ray information pairs among the plurality of ray information pairs that do not relate to a bank conflict are scheduled based on the acquired data allocation information and bank usage information. As an example, assume that the data allocation information obtained by the scheduler contains the locations where the node data and the ray data in five ray information pairs, corresponding to five incoming rays RAY_1-5, are respectively stored (e.g., represented by line IDs and bank IDs with respect to the node cache and the ray cache, respectively), and that the bank usage information acquired by the scheduler indicates that only the ray information pairs corresponding to the incoming rays RAY_1 and RAY_2 do not involve a bank conflict. The scheduler may then schedule the node data and the ray data in those two ray information pairs according to the locations where they are stored, without scheduling the three remaining ray information pairs. Thus, even if a bank conflict occurs in a bank of the node cache or the ray cache, the scheduler selects and schedules node data and ray data pairs that do not involve the bank conflict, rather than simply waiting for the bank conflict to end, thereby eliminating the delay of waiting for the end of a bank conflict as in the prior art, realizing fast flow, and ensuring the working efficiency of the ALU. It is noted that in the present disclosure there is no limitation on the order in which the ray information pairs that do not relate to bank conflicts are scheduled; they may be scheduled according to the time at which the corresponding ray was received, according to a defined priority, or in other ways.
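The selection behavior of step 230 can be sketched as follows. This is a simplified software illustration under assumed names (`select_schedulable` and the flag lists are not the patent's actual interfaces): the scheduler keeps only the pairs whose current-cycle bank-usage flag shows no conflict, rather than waiting for busy banks to free up.

```python
# Sketch of step 230: keep only the ray information pairs whose bank-usage
# flag for the current clock cycle shows no conflict (True = conflict).
def select_schedulable(alloc_info, conflict_flags):
    return [alloc for alloc, in_conflict in zip(alloc_info, conflict_flags)
            if not in_conflict]

# Five pairs have allocated locations, but only RAY_1 and RAY_2 are
# conflict-free in the current cycle.
alloc_info = ["RAY_1", "RAY_2", "RAY_3", "RAY_4", "RAY_5"]
conflict_flags = [False, False, True, True, True]
assert select_schedulable(alloc_info, conflict_flags) == ["RAY_1", "RAY_2"]
```

Because the conflict flags are re-acquired every cycle, the same call next cycle may return a different subset.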
In some examples, scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information may include: sending a data scheduling instruction to the memory to transmit the ray information pair not related to the bank conflict to a computing unit for an intersection operation. As an example, when the scheduler determines from the bank usage information that a certain ray information pair does not involve a bank conflict, it may send a scheduling instruction to the node cache and the ray cache where the node data and the ray data in the data pair are respectively located (e.g., the scheduling instruction may also include the locations where the node data and the ray data are respectively stored, i.e., the acquired data allocation information) so that the node cache and the ray cache can transmit the pair of node and ray data to the ALU or computing unit for an intersection operation.
In some examples, a bank conflict means that either of the banks storing the node data and the ray data is being externally called. As used in this disclosure, the term "externally called" may refer to either or both of the banks storing the node data and the ray data being accessed or invoked by other visitors. In other words, a ray information pair that does not involve a bank conflict is one in which neither of the banks where the node data and the ray data are respectively stored is being accessed or invoked by another visitor. As an example, assuming that the node data in a certain ray information pair is stored at bank 2 in line 1 of the node cache and the corresponding ray data is stored at bank 3 in line 2 of the ray cache, if no bank conflict occurs at either bank, i.e., neither bank is being accessed or invoked by other visitors, the ray information pair does not involve a bank conflict and thus can be scheduled; otherwise (i.e., if a bank conflict occurs at either or both banks), the ray information pair relates to a bank conflict and therefore cannot be scheduled.
In some examples, scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information may include: for each ray information pair, scheduling the node data and the ray data in the ray information pair only if the validity flag of the node data in the ray information pair is valid and no bank conflict is involved. As an example, in the case where the node data allocation information includes a validity flag for each node data, if the node data in a certain ray information pair has been allocated bank 2 in line 1 of the node cache before being stored, and its corresponding ray data is stored at bank 3 in line 2 of the ray cache, the scheduler may schedule the node data and the ray data in that ray information pair once the node data has been retrieved and stored at bank 2 in line 1 of the node cache and neither the bank of the node cache nor the bank of the ray cache is being accessed or invoked by other visitors.
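The combined condition above reduces to a single predicate, sketched here under assumed names (the function and its boolean inputs are illustrative, not the patent's signals): a pair is dispatched only when its node data validity flag is set and neither of its two banks is externally called.

```python
# Minimal sketch of the combined scheduling condition: node data must be valid
# AND neither the node-cache bank nor the ray-cache bank may be in conflict.
def pair_schedulable(cache_line_valid, node_bank_busy, ray_bank_busy):
    return bool(cache_line_valid) and not (node_bank_busy or ray_bank_busy)

assert pair_schedulable(1, False, False)        # valid and conflict-free
assert not pair_schedulable(0, False, False)    # node data not yet fetched
assert not pair_schedulable(1, True, False)     # node-cache bank conflict
assert not pair_schedulable(1, False, True)     # ray-cache bank conflict
```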
In general, the method 200 first obtains data allocation information about a plurality of node and ray data pairs from the node cache and the ray cache, respectively, then obtains bank usage information about them in real time, and finally schedules those node and ray data pairs that do not involve bank conflicts according to the obtained data allocation information and bank usage information. In this way, if, before scheduling a certain pair of node and ray data, the scheduler finds from the data allocation information and the bank usage information acquired in real time that the pair may involve a bank conflict, it may instead schedule other node and ray data pairs not involving a bank conflict, without waiting for the bank conflict of the current data pair to end, thereby eliminating the waiting time and improving the efficiency of the computing unit or ALU in the GPU.
Fig. 3 is a block diagram illustrating an exemplary architecture of a system 300 for ray tracing scheduling in accordance with an embodiment of the present disclosure. As shown in fig. 3, the system 300 may include a scheduler 310, a node cache 320, a ray buffer 330, and a computing unit 340. The system 300 may be included in or be part of the GPU 110 as shown in fig. 1 to perform a method for ray tracing scheduling described in this disclosure, such as the method 200. In fig. 3, the node cache 320 may be configured to store node data and the ray buffer 330 may be configured to store ray data. The computing unit 340 may be communicatively coupled to the node cache 320 and the ray buffer 330, and may be configured to receive node data and ray data from them for an intersection operation. Further, the scheduler 310 may be communicatively coupled to the node cache 320 and the ray buffer 330, and may be configured to perform the scheduling methods for ray tracing described in this disclosure, such as the method 200. Specifically, the node cache 320 and the ray buffer 330 may store the node data and the ray data of the ray information pairs, respectively, and transmit their data allocation information and bank usage information to the scheduler 310; after receiving the data allocation information and the bank usage information, the scheduler 310 may schedule those ray information pairs that do not involve a bank conflict based on this information, i.e., transmit scheduling instructions to the node cache 320 and the ray buffer 330, respectively, to cause them to transmit the node data and the ray data in these ray information pairs to the computing unit 340 for the intersection operation.
FIG. 4 is an exemplary flowchart illustrating process steps 400 of the node cache 320 in the system 300 of FIG. 3 during execution of a ray tracing schedule according to embodiments of the present disclosure. FIG. 5 is an exemplary flowchart illustrating process steps 500 of the ray buffer 330 in the system 300 of FIG. 3 during execution of a ray tracing schedule according to embodiments of the present disclosure. FIG. 6 is an exemplary flowchart illustrating process steps 600 of the scheduler 310 in the system 300 of FIG. 3 during execution of ray tracing scheduling according to embodiments of the present disclosure. With reference to FIGS. 3-6, a detailed description is given of how the scheduling system 300 according to the present disclosure performs data scheduling for ray tracing.
Referring now to figs. 3-5, the node cache 320 and the ray buffer 330 in the system 300 will be described.
In step 410, the node cache 320 receives a node_data_alloc_request (node data allocation request) for an incoming ray RAY_IN_i (where i = 1..N, N > 1). In step 510, the ray buffer 330 receives a ray_data_alloc_request (ray data allocation request) for RAY_IN_i. Here, both the node_data_alloc_request and the ray_data_alloc_request come from a component or logic (not shown) external to the system 300, which may be any software, hardware, or firmware, or any combination thereof, that sends requests to the node cache 320 and the ray buffer 330 in response to RAY_IN (the incoming ray), without limitation.
In step 420, in response to the node_data_alloc_request, the node cache 320 receives the node_address contained in RAY_IN_i and allocates, for the node_data (node data) corresponding to that node_address, its storage location in the node cache 320, represented, for example, by a line ID and a bank ID with respect to the node cache 320. Here, RAY_IN contains the node_address of the node_data and the paired ray_data, where the node_data and the ray_data paired with each other may constitute a ray information pair, and the node_address represents the address of the node_data in the storage device. As an example, the storage device may be a component in the GPU for storing graphics information or data to be processed (e.g., DDR SDRAM), which may later be read out (e.g., read into a cache) for caching or for computation and display on a display. The storage device may include, but is not limited to, volatile storage media (such as random access memory (RAM), static random access memory (SRAM), and dynamic random access memory (DRAM)) and/or nonvolatile storage media (such as read only memory (ROM), flash memory, optical disks, magnetic disks, and so forth). In step 425, the node cache 320 looks up the node_data in the storage device according to the node_address in order to retrieve it and store it at the allocated location in the node cache 320.
In step 520, in response to the ray_data_alloc_request, the ray buffer 330 receives the ray_data contained in RAY_IN_i and stores the ray_data in the ray buffer 330, where the location at which the ray_data is stored in the ray buffer 330 is represented by a line ID and a bank ID.
In step 430, the node cache 320 sends to the scheduler 310 a tag_lookup_response (node data allocation information) regarding RAY_IN_i, where the tag_lookup_response includes the storage location allocated in the node cache 320 to the node_data corresponding to RAY_IN_i and information on whether the node_data is valid (i.e., has been fetched and stored), represented by, for example, the vector (cache_line_ID, cache_bank_ID, cache_line_valid). Here, "cache_line_ID" and "cache_bank_ID" indicate in which bank of which line in the node cache 320 the storage location of the node_data is allocated, and "cache_line_valid" indicates whether the node_data has been retrieved and stored therein.
In step 530, the ray buffer 330 sends to the scheduler 310 a ray_buffer_alloc_info (ray data allocation information) regarding RAY_IN_i, where the ray_buffer_alloc_info includes the location at which the ray_data corresponding to RAY_IN_i is stored in the ray buffer 330, represented by, for example, the vector (ray_line_ID, ray_bank_ID). Here, "ray_line_ID" and "ray_bank_ID" indicate in which bank of which line in the ray buffer 330 the ray_data is stored. Since the ray buffer 330 stores the ray_data directly upon receiving it, no validity flag for ray_data is needed to indicate whether the ray_data has already been stored; that is, ray_data is always valid in the ray buffer 330.
It is to be understood that, although not shown in fig. 3, the node_data may also already be stored in the node cache 320 before a storage location is allocated. Thus, in some examples, the node cache 320 may first search by node_address to determine whether the node_data is already stored in the node cache 320. If so, the node cache 320 sends the scheduler 310 the location where the node_data is stored (e.g., the line identifier and the bank identifier in the node cache 320), in which case the node_data is always valid in the node cache 320, just as ray_data is in the ray buffer 330. If not, the node cache 320 may first allocate the node_data its storage location in the node cache 320 and then look up the node_data in the storage device for retrieval and storage according to the node_address, as described above with respect to steps 420 and 425.
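The hit/miss behavior just described can be sketched as a toy software model. All names and the 4-banks-per-line geometry are assumptions for illustration; only the hit/miss and validity semantics come from the present disclosure.

```python
# Hedged sketch: the node cache first searches by node_address; on a hit it
# reports the existing (always valid) location, on a miss it allocates a slot
# and marks the entry invalid until the fetch from the storage device completes.
BANKS_PER_LINE = 4  # assumed geometry, purely illustrative

class NodeCacheModel:
    def __init__(self):
        self.entries = {}   # node_address -> (line_id, bank_id, valid)
        self.next_slot = 0

    def lookup_or_allocate(self, node_address):
        if node_address in self.entries:          # hit: already stored
            return self.entries[node_address]
        line_id, bank_id = divmod(self.next_slot, BANKS_PER_LINE)
        self.next_slot += 1
        self.entries[node_address] = (line_id, bank_id, False)  # fetch pending
        return self.entries[node_address]

    def fetch_complete(self, node_address):       # storage lookup finished
        line_id, bank_id, _ = self.entries[node_address]
        self.entries[node_address] = (line_id, bank_id, True)

cache = NodeCacheModel()
assert cache.lookup_or_allocate(0x100) == (0, 0, False)  # miss: not yet valid
cache.fetch_complete(0x100)
assert cache.lookup_or_allocate(0x100) == (0, 0, True)   # hit: always valid
```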
In step 440, the node cache 320 sends to the scheduler 310, according to the tag_lookup_response, the cache_bank_use_info regarding RAY_IN_i at the current time (i.e., bank usage information of the node cache), where the cache_bank_use_info indicates whether the node_data corresponding to RAY_IN_i is currently involved in a bank conflict. In one example, the cache_bank_use_info may specifically indicate whether a bank conflict currently occurs at the storage location of the node_data corresponding to RAY_IN_i (e.g., the location allocated for storage, or the location where it is already stored, which may be represented by the cache_line_ID and the cache_bank_ID), i.e., whether the bank corresponding to that location is currently being externally called, that is, accessed or called by visitors other than the scheduler 310. For example, for the current incoming ray RAY_IN_i, if its corresponding node_data is involved in a bank conflict in the current clock cycle, the node cache 320 may transmit a flag indicating "1" to the scheduler 310, and if its corresponding node_data is not involved in a bank conflict in the current clock cycle, the node cache 320 may transmit a flag indicating "0" to the scheduler 310.
In step 540, the ray buffer 330 sends to the scheduler 310, according to the ray_buffer_alloc_info, the ray_bank_use_info regarding RAY_IN_i at the current time (i.e., bank usage information of the ray buffer 330), where the ray_bank_use_info indicates whether the ray_data corresponding to RAY_IN_i is currently involved in a bank conflict. In one example, the ray_bank_use_info may specifically indicate whether a bank conflict currently occurs at the location where the ray_data corresponding to RAY_IN_i is stored in the ray buffer 330 (which may be represented by the ray_line_ID and ray_bank_ID), i.e., whether the bank corresponding to that location is currently being externally called, that is, accessed or called by a visitor other than the scheduler 310. For example, for the current incoming ray RAY_IN_i, if its corresponding ray_data is involved in a bank conflict in the current clock cycle, the ray buffer 330 may transmit a flag indicating "1" to the scheduler 310, and if its corresponding ray_data is not involved in a bank conflict in the current clock cycle, the ray buffer 330 may transmit a flag indicating "0" to the scheduler 310.
Here, since the cache_bank_use_info and the ray_bank_use_info refer only to bank usage information at the current time (e.g., the current clock cycle), the node cache 320 and the ray buffer 330 may transmit the bank usage information to the scheduler 310 in real time to ensure timeliness of the data. In this way, when a bank that was being accessed or invoked by another visitor at a previous time is no longer in conflict at the current time, the node cache 320 or the ray buffer 330 corresponding to that bank may send new cache_bank_use_info or ray_bank_use_info to the scheduler 310 for the data in the bank, or feed the updated information back to the scheduler 310. Doing so allows the scheduler 310 to learn of the latest bank conflict situation, avoiding data or information lag.
After performing step 440, the node cache 320 may return to step 410 and perform the above-described steps 410-440 for the next ray RAY_IN_i+1. It is noted that, although not shown, when performing these steps for the next ray RAY_IN_i+1, the node cache 320 may simultaneously update and send the validity flag and the bank conflict information at the current time for the node_data corresponding to the previous ray RAY_IN_i, and may also perform step 450, described later, in parallel.
Similarly, after performing step 540, the ray buffer 330 may return to step 510 and perform the above-described steps 510-540 for the next ray RAY_IN_i+1. It is noted that, although not shown, when performing these steps for the next ray RAY_IN_i+1, the ray buffer 330 may simultaneously update and send the bank usage information at the current time for the ray_data corresponding to the previous ray RAY_IN_i, and may also perform step 550, described later, in parallel.
In step 450, in response to receiving a node_data_request (node data request), the node cache 320 sends the node_data corresponding to those RAY_INs whose data is valid and not involved in a bank conflict to the computing unit 340 for the intersection operation. Here, the node_data_request is transmitted by the scheduler 310; specifically, it may be transmitted when the scheduler 310 determines, based on the tag_lookup_response and the cache_bank_use_info, that the node_data corresponding to these RAY_INs is valid and that the node_data and ray_data pair does not involve a bank conflict. In one example, the node_data_request may be based on the tag_lookup_response, i.e., in addition to a request for node_data it may include the location where the node_data is stored in the node cache 320, such as the cache_line_ID and cache_bank_ID, so that the node cache 320 can find and send the node_data.
In step 550, in response to receiving a ray_data_request (ray data request), the ray buffer 330 sends the ray_data corresponding to those RAY_INs whose data is valid and not involved in a bank conflict to the computing unit 340 for the intersection operation. Here, the ray_data_request is transmitted by the scheduler 310; specifically, it may be transmitted when the scheduler 310 determines, based on the ray_buffer_alloc_info and the ray_bank_use_info, that the node_data and ray_data pair corresponding to these RAY_INs does not involve a bank conflict. In one example, the ray_data_request may be based on the ray_buffer_alloc_info, i.e., in addition to a request for ray_data it may include the location where the ray_data is stored in the ray buffer 330, such as the ray_line_ID and ray_bank_ID, so that the ray buffer 330 can find and transmit the ray_data.
It is noted that the above-described operations and processes are merely exemplary for illustrative purposes and are not limiting herein. Further, the processes described above may include more or fewer steps, and the steps may be performed iteratively or in a pipelined manner, or may be performed in parallel or simultaneously, or in other manners.
Referring now to figs. 3 and 6, the scheduler 310 in the system 300 will be described.
In step 610, the scheduler 310 obtains, from the node cache 320 and the ray buffer 330 respectively, the tag_lookup_response and the ray_buffer_alloc_info regarding a plurality of rays RAY_IN_1-N (where N > 1), where the tag_lookup_response and the ray_buffer_alloc_info respectively represent the locations at which the node_data and ray_data corresponding to the plurality of RAY_IN_1-N are stored in the node cache 320 and the ray buffer 330. In one example, the scheduler 310 may receive and buffer the tag_lookup_response and the ray_buffer_alloc_info, and may thus obtain a list of data allocation information regarding the plurality of RAY_IN_1-N. In the case where the tag_lookup_response further includes a validity flag for each node_data, the scheduler 310 may update the validity flag in real time or acquire the updated validity flag in real time. Since step 610 corresponds to steps 430 and 530 described above, it will not be described in detail here.
In step 620, the scheduler 310 obtains in real time, from the node cache 320 and the ray buffer 330 respectively, the cache_bank_use_info and the ray_bank_use_info regarding the plurality of rays RAY_IN_1-N at the current time, where the cache_bank_use_info and the ray_bank_use_info respectively indicate whether each node_data and ray_data pair of the plurality of ray information pairs corresponding to the plurality of RAY_IN_1-N is currently involved in a bank conflict. In one example, the scheduler 310 may receive and buffer the cache_bank_use_info and the ray_bank_use_info, and may thus obtain a list of bank usage information regarding the plurality of RAY_IN_1-N. In addition, since the cache_bank_use_info and the ray_bank_use_info represent only the bank usage information at the current time and are acquired in real time, the bank usage information for each node_data and ray_data pair can be updated at any time or periodically, thereby ensuring data timeliness and avoiding mis-scheduling due to changes in bank occupancy over time. Since step 620 corresponds to steps 440 and 540 described above, it will not be described in detail here.
In step 630, based on the received tag_lookup_response, ray_buffer_alloc_info, cache_bank_use_info, and ray_bank_use_info, the scheduler 310 determines, for each incoming ray (e.g., RAY_IN_i), whether the node_data corresponding to it is valid and whether the node_data and ray_data pair does not involve a bank conflict. As an example, the validity of the node_data may be represented by the "cache_line_valid" in the vector (cache_line_ID, cache_bank_ID, cache_line_valid). Furthermore, that a certain node_data and ray_data pair does not involve a bank conflict means that neither the node_data nor the ray_data is involved in a bank conflict, that is, no bank conflict exists at either of the two locations where they are respectively stored in the node cache 320 and the ray buffer 330.
In step 640, after determining the validity and bank conflict conditions for each incoming ray, the scheduler 310 selects and schedules those node_data and ray_data pairs that are valid and not involved in a bank conflict. As an example, if the scheduler 310 obtains data allocation information and bank usage information regarding five incoming rays RAY_IN_1-5, and only the ray information pairs (i.e., the paired node_data and ray_data) corresponding to the two rays RAY_IN_1 and RAY_IN_2 are valid and do not involve a bank conflict, the scheduler 310 may schedule those ray information pairs without scheduling the ray information pairs corresponding to the remaining rays RAY_IN_3-5. In one example, in performing the scheduling operation, the scheduler 310 may transmit the node_data_request and the ray_data_request to the node cache 320 and the ray buffer 330, so that the node cache 320 and the ray buffer 330 transmit those node_data and ray_data pairs satisfying the above conditions to the computing unit 340, at which point the computing unit 340 may perform an intersection operation on each node_data and ray_data pair and output a result RAY_OUT. Further, in the above example, if none of the pairs corresponding to the incoming rays RAY_IN_1-5 satisfies the above conditions, the scheduler 310 may wait until the next clock cycle and decide again, until an incoming ray satisfying the above conditions is found (possibly one or more of RAY_IN_1-5, or other incoming rays, or both).
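A toy per-clock-cycle model of this behavior can be sketched as follows. Each cycle the bank-usage flags are re-sampled, and every pair that is valid and conflict-free is dispatched; if none qualifies, the scheduler simply waits for the next cycle. The function name, the 50% conflict odds, and the dict-based bookkeeping are assumptions for the sketch, not the patent's hardware implementation.

```python
import random

# Illustrative per-cycle scheduler: dispatch valid, conflict-free pairs;
# otherwise wait for the next cycle and re-sample the bank-usage flags.
def run_scheduler(pairs_valid, max_cycles, rng):
    dispatched = {}                                 # pair name -> dispatch cycle
    for cycle in range(max_cycles):
        for name, valid in pairs_valid.items():
            if name in dispatched or not valid:
                continue                            # invalid pairs are never sent
            bank_conflict = rng.random() < 0.5      # usage changes every cycle
            if not bank_conflict:
                dispatched[name] = cycle            # node/ray data go to the ALU
    return dispatched

rng = random.Random(0)
pairs_valid = {"RAY_IN_1": True, "RAY_IN_2": True, "RAY_IN_3": False}
result = run_scheduler(pairs_valid, max_cycles=100, rng=rng)
assert "RAY_IN_3" not in result                     # invalid node data: skipped
assert set(result) == {"RAY_IN_1", "RAY_IN_2"}      # scheduled once conflict-free
```

The model reflects the point made below: a pair blocked this cycle is typically dispatched only a few cycles later, rather than stalling the whole pipeline until its banks free up.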
Therefore, compared with the conventional scheduling method (in which data can be read and scheduled only after the bank conflict of the current data ends), the method according to the present disclosure can greatly reduce the waiting time and improve the efficiency of the computing unit in the GPU. In the method for ray tracing scheduling described above, even if the scheduler needs to wait until the next clock cycle, this time is much shorter than waiting for the end of a bank conflict on the current data as in the conventional method, because the scheduler continuously receives many incoming rays in each clock cycle and the conflict situation of each bank in the node cache or the ray buffer changes rapidly over time; incoming rays satisfying the above conditions (i.e., not involving a bank conflict) can therefore be found quickly by the scheduler, and the node data and ray data pairs corresponding to them can then be scheduled.
It is noted that the above-described operations and processes are merely exemplary for illustrative purposes and are not limiting herein. Further, the processes described above may include more or fewer steps, and the steps may be performed iteratively or in a pipelined manner, or may be performed in parallel or simultaneously, or in other manners.
Fig. 7 illustrates an example block diagram of an apparatus 700 for ray tracing scheduling in accordance with an embodiment of this disclosure. As shown in fig. 7, the apparatus 700 includes an allocation information acquisition module 710, a usage information acquisition module 720, and a scheduling module 730. The apparatus 700 may have a similar configuration as the scheduler 310 in the system 300 of fig. 3 and be configured to perform any of the methods for ray tracing scheduling according to the present disclosure, such as the method 200 of fig. 2, or the method steps 400-600 in fig. 4-6.
The allocation information acquisition module 710 is configured to acquire data allocation information regarding a plurality of ray information pairs, wherein the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored.
The usage information acquiring module 720 is configured to acquire bank usage information regarding the plurality of ray information pairs at a current time in real time, wherein each of the bank usage information indicates whether each of the plurality of ray information pairs relates to a bank conflict.
The scheduling module 730 is configured to schedule the light information pairs of the plurality of light information pairs that do not involve bank conflicts based on the data allocation information and the bank usage information.
It is noted that the above-described structure and configuration of the apparatus 700 is merely exemplary and not limiting.
Fig. 8 illustrates an example system 800 that includes an example computing device 810 representative of one or more systems and/or devices that can implement the various methods described in this disclosure. The computing device 810 may be, for example, a server of a service provider, a device associated with a server, a system on a chip, and/or any other suitable computing device or computing system. The apparatus 700 described above with reference to fig. 7 and the system 300 described above with reference to fig. 3 may each take the form of, or be included in, the computing device 810. Alternatively, the apparatus 700 or the system 300 may be implemented as a computer program in the form of an application 816.
The illustrated example computing device 810 includes a processing system 811, one or more computer-readable media 812, and one or more I/O interfaces 813 communicatively coupled to each other. Although not shown, computing device 810 may also include a system bus or other data and command transfer system that couples the various components, one to another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.
The processing system 811 is representative of functionality to perform one or more operations using hardware. Thus, the processing system 811 is illustrated as including hardware elements 814 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application-specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 814 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, a processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In this context, the processor-executable instructions may be electronically-executable instructions.
The computer-readable medium 812 is illustrated as including memory/storage 815. Memory/storage 815 represents memory/storage associated with one or more computer-readable media. The memory/storage 815 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). The memory/storage 815 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 812 may be configured in various other ways as further described below.
One or more I/O interfaces 813 represent functionality that allows a user to enter commands and information to computing device 810 using various input devices and optionally also allows information to be presented to the user and/or other components or devices using various output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), a camera (e.g., which may employ visible or invisible wavelengths such as infrared frequencies to detect touch-free motion as gestures), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a haptic response device, and so forth. Accordingly, the computing device 810 may be configured in various ways to support user interaction, as described further below.
Computing device 810 also includes applications 816. An application 816 may be, for example, a software instance of the apparatus 700 for ray tracing scheduling and, in combination with other elements in the computing device 810, may implement the techniques described in this disclosure.
In this disclosure, various techniques may be described in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used in this disclosure generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described in this disclosure are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that can be accessed by computing device 810. By way of example, and not limitation, computer-readable media may comprise "computer-readable storage media" and "computer-readable signal media".
"computer-readable storage medium" refers to a medium and/or device, and/or a tangible storage apparatus, capable of persistently storing information, as opposed to mere signal transmission, carrier wave, or signal per se. Accordingly, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or an article of manufacture suitable for storing the desired information and accessible by a computer.
"computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to the hardware of computing device 810, such as via a network. Signal media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism. Signal media also includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As previously described, the hardware elements 814 and the computer-readable medium 812 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form that may be used in some embodiments to implement at least some aspects of the techniques described in this disclosure. The hardware elements may include integrated circuits or systems-on-chips, Application-Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), and other implementations in silicon or components of other hardware devices. In such a context, the hardware elements may act as processing devices to perform program tasks defined by the instructions, modules, and/or logic embodied by the hardware elements, as well as hardware devices to store instructions for execution, such as the computer-readable storage media previously described.
Combinations of the foregoing may also be used to implement the various techniques and modules described in this disclosure. Thus, software, hardware, or program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 814. Computing device 810 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, modules implemented as software executable by computing device 810 may be realized at least partially in hardware, for example, using computer-readable storage media and/or the hardware elements 814 of a processing system. The instructions and/or functions may be executed/operated by one or more articles of manufacture (e.g., one or more computing devices 810 and/or processing systems 811) to implement the techniques, modules, and examples described in this disclosure.
In various implementations, computing device 810 may assume a variety of different configurations. For example, computing device 810 may be implemented as a computer-class device, including personal computers, desktop computers, multi-screen computers, laptop computers, netbooks, and so on. Computing device 810 may also be implemented as a mobile-device-class device, including devices such as mobile telephones, portable music players, portable gaming devices, tablet computers, multi-screen computers, and the like. Computing device 810 may also be implemented as a television-class device, including devices having, or connected to, a generally larger screen in a casual viewing environment, such as televisions, set-top boxes, game consoles, and the like.
The techniques described in this disclosure may be supported by these various configurations of computing device 810 and are not limited to specific examples of the techniques described in this disclosure. Functionality may also be implemented in whole or in part on the "cloud" 820 through the use of a distributed system, such as through a platform 822 as described below.
Cloud 820 includes and/or represents a platform 822 for resources 824. The platform 822 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 820. Resources 824 may include applications and/or data that may be used when computer processing is performed on a server remote from computing device 810. The resources 824 may also include services provided over the internet and/or over a subscriber network such as a cellular or Wi-Fi network.
The platform 822 may abstract resources and functions to connect the computing device 810 with other computing devices. The platform 822 may also serve to abstract the scaling of resources so as to provide a corresponding level of scale for the demand encountered for the resources 824 implemented via the platform 822. Thus, in interconnected device embodiments, implementation of the functionality described in the present disclosure may be distributed throughout the system 800. For example, the functionality may be implemented in part on the computing device 810 and in part through the platform 822 that abstracts the functionality of the cloud 820.
A computer-readable storage medium having computer instructions stored thereon is also provided. A processor of a computing device reads the computer instructions from the computer-readable storage medium and executes them, causing the computing device to perform the methods or techniques for ray tracing scheduling provided in the various alternative implementations described above.
It will be appreciated that for clarity, embodiments of the disclosure have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in a plurality of units or as part of other functional units without deviating from the present disclosure. For example, functionality illustrated to be performed by a single unit may be performed by a plurality of different units. Thus, references to specific functional units are only to be seen as references to suitable units for providing the described functionality rather than indicative of a strict logical or physical structure or organization. Thus, the present disclosure may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
It will be understood that, although the terms first, second, third, etc. may be used in this disclosure to describe various devices, elements, components or sections, these devices, elements, components or sections should not be limited by these terms. These terms are only used to distinguish one device, element, component or section from another device, element, component or section.
Although the present disclosure has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present disclosure is limited only by the accompanying claims. Additionally, although individual features may be included in different claims, these may possibly advantageously be combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The order of features in the claims does not imply any specific order in which the features must be worked. Furthermore, in the claims, the word "comprising" does not exclude other elements, and the terms "a" and "an" do not exclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims (18)

1. A method for ray tracing scheduling, the method comprising:
obtaining data allocation information regarding a plurality of ray information pairs, wherein the ray information pairs comprise paired node data and ray data, and the data allocation information comprises a location in memory where each of the plurality of ray information pairs is stored;
obtaining, in real time, bank usage information at a current time instant regarding the plurality of ray information pairs, wherein the bank usage information indicates, for each of the plurality of ray information pairs, whether that pair is involved in a bank conflict; and
scheduling the ray information pairs, among the plurality of ray information pairs, that are not involved in a bank conflict, based on the acquired data allocation information and bank usage information.
2. The method of claim 1, wherein the node data and the ray data in each ray information pair are each stored in a respective corresponding bank in the memory, and a bank conflict means that any one of the banks storing the node data and the ray data is being accessed externally.
3. The method of claim 1, wherein the data allocation information further comprises a validity flag for node data, the validity flag, when valid, indicating that the node data has been retrieved and stored in the memory.
4. The method of claim 3, wherein scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information comprises:
for each ray information pair, scheduling node data and ray data in the ray information pair only if the validity flag of the node data in the ray information pair is valid and no bank conflict is involved.
5. The method of claim 1, wherein scheduling the ray information pairs of the plurality of ray information pairs that do not involve bank conflicts based on the obtained data allocation information and bank usage information comprises:
sending a data scheduling instruction to the memory to transmit the ray information pairs not involved in a bank conflict to a computing unit for an intersection operation.
6. The method of claim 1, wherein obtaining data allocation information for the plurality of ray information pairs comprises: receiving and caching the data allocation information.
7. The method of claim 1, wherein obtaining bank usage information about the plurality of ray information pairs at a current time in real time comprises: receiving and caching the bank usage information in real time.
8. The method of claim 1, wherein the data allocation information is represented by a bank identifier and a line identifier.
9. An apparatus for ray tracing scheduling, the apparatus comprising:
an allocation information acquisition module configured to acquire data allocation information regarding a plurality of ray information pairs, wherein the ray information pairs include paired node data and ray data, and the data allocation information includes a location in memory where each of the plurality of ray information pairs is stored;
a usage information acquisition module configured to acquire, in real time, bank usage information regarding the plurality of ray information pairs at a current time, wherein the bank usage information indicates, for each of the plurality of ray information pairs, whether that pair is involved in a bank conflict; and
a scheduling module configured to schedule the ray information pairs of the plurality of ray information pairs that are not involved in bank conflicts based on the data allocation information and the bank usage information.
10. The apparatus of claim 9, wherein the node data and the ray data in each ray information pair are stored in respective banks in the memory, and wherein a bank conflict is an external access to any one of the banks storing the node data and the ray data.
11. The apparatus of claim 9, wherein the data allocation information further comprises a validity flag for node data, the validity flag, when valid, indicating that the node data has been retrieved and stored in the memory.
12. The apparatus of claim 11, wherein the scheduling module is further configured to:
for each ray information pair, scheduling node data and ray data in the ray information pair only if the validity flag of the node data in the ray information pair is valid and no bank conflict is involved.
13. The apparatus of claim 9, wherein the scheduling module is further configured to:
and sending a data scheduling instruction to the memory to transmit the light ray information pair not related to the bank conflict to a computing unit for intersection operation.
14. The apparatus of claim 9, wherein the allocation information acquisition module is further configured to: receive and cache the data allocation information.
15. The apparatus of claim 9, wherein the usage information acquisition module is further configured to: receive and cache the bank usage information in real time.
16. The apparatus of claim 9, wherein the data allocation information is represented by a bank identifier and a line identifier.
17. A system for ray tracing scheduling, the system comprising:
a first memory configured to store node data;
a second memory configured to store ray data;
a computing unit communicatively coupled to the first memory and the second memory and configured to receive node data and ray data from the first memory and the second memory for an intersection operation; and
a scheduler communicatively coupled to the first memory and the second memory and configured to perform the method of any of claims 1-8.
18. A computer-readable medium having instructions stored thereon, which, when executed by a processor of a computing device, cause the computing device to carry out the method according to any one of claims 1-8.
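As a rough, non-authoritative sketch of the system of claim 17, the fragment below models the two banked memories, a scheduler that issues reads only for conflict-free ray information pairs (as in claims 5 and 13), and a compute-unit callback standing in for the intersection operation. All names and data structures are hypothetical illustrations, not the claimed hardware.

```python
# Hypothetical sketch; names and structure are assumptions for illustration.
class BankedMemory:
    """Memory organized as numbered banks of rows; tracks which banks are
    currently being accessed externally (the source of bank conflicts)."""
    def __init__(self, num_banks):
        self.rows = {b: {} for b in range(num_banks)}
        self.busy_banks = set()

    def write(self, bank, row, value):
        self.rows[bank][row] = value

    def read(self, bank, row):
        return self.rows[bank][row]

def run_scheduler(node_mem, ray_mem, pairs, compute):
    """pairs: (node_bank, node_row, ray_bank, ray_row) locations, i.e. the
    data allocation information. Conflict-free pairs are read out and handed
    to the compute unit; conflicting pairs are deferred to a later cycle."""
    deferred = []
    for node_bank, node_row, ray_bank, ray_row in pairs:
        if node_bank in node_mem.busy_banks or ray_bank in ray_mem.busy_banks:
            deferred.append((node_bank, node_row, ray_bank, ray_row))
            continue
        compute(node_mem.read(node_bank, node_row),
                ray_mem.read(ray_bank, ray_row))
    return deferred

node_mem, ray_mem = BankedMemory(4), BankedMemory(4)
node_mem.write(0, 0, "node-A"); node_mem.write(1, 0, "node-B")
ray_mem.write(0, 0, "ray-A");   ray_mem.write(1, 0, "ray-B")
node_mem.busy_banks.add(1)      # bank 1 of the node memory is busy

results = []
deferred = run_scheduler(node_mem, ray_mem,
                         [(0, 0, 0, 0), (1, 0, 1, 0)],
                         lambda n, r: results.append((n, r)))
print(results)   # → [('node-A', 'ray-A')]
print(deferred)  # → [(1, 0, 1, 0)]
```

The first pair reaches the compute unit because neither of its banks is busy; the second is deferred until the external access to bank 1 completes.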
CN202211487199.6A 2022-11-25 2022-11-25 Method and apparatus for ray tracing scheduling Active CN115640138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211487199.6A CN115640138B (en) 2022-11-25 2022-11-25 Method and apparatus for ray tracing scheduling

Publications (2)

Publication Number Publication Date
CN115640138A (en) 2023-01-24
CN115640138B (en) 2023-03-21

Family

ID=84947864


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049032B * 2023-03-30 2023-06-23 Moore Threads Intelligent Technology (Beijing) Co., Ltd. Data scheduling method, device and equipment based on ray tracing and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JP5485257B2 * 2008-03-21 2014-05-07 Caustic Graphics, Inc. Parallelized cross-test and shading architecture for ray-trace rendering
KR102193684B1 * 2013-11-04 2020-12-21 Samsung Electronics Co., Ltd. Apparatus and method for processing ray tracing
CN111857831B * 2020-06-11 2021-07-20 Chengdu Haiguang Microelectronics Technology Co., Ltd. Memory bank conflict optimization method, parallel processor and electronic equipment
CN113344766B * 2021-06-07 2022-09-06 Zhongtian Hengxing (Shanghai) Technology Co., Ltd. Ray tracing processor, processor chip, equipment terminal and ray tracing method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant