CN113344766A - Ray tracing processor, processor chip, equipment terminal and ray tracing method - Google Patents

Info

Publication number
CN113344766A
Authority
CN
China
Prior art keywords
ray
ray tracing
task
unit
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110629352.3A
Other languages
Chinese (zh)
Other versions
CN113344766B (en
Inventor
Not disclosed (inventor requested non-publication)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongtian Xingxing Shanghai Technology Co ltd
Original Assignee
Zhongtian Xingxing Shanghai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongtian Xingxing Shanghai Technology Co ltd filed Critical Zhongtian Xingxing Shanghai Technology Co ltd
Priority to CN202110629352.3A priority Critical patent/CN113344766B/en
Publication of CN113344766A publication Critical patent/CN113344766A/en
Application granted granted Critical
Publication of CN113344766B publication Critical patent/CN113344766B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The application discloses a ray tracing processor, a processor chip, a device terminal and a ray tracing method. The ray tracing processor comprises a global task allocation unit, a plurality of ray processing controllers, a global cache unit and a task return unit. The global task allocation unit receives ray tracing tasks and allocates them according to the working states of the plurality of ray processing controllers; the ray tracing tasks comprise ray intersection calculation tasks. Each ray processing controller comprises a local task allocation unit and a plurality of types of computing units; the local task allocation unit receives the ray tracing tasks from the global task allocation unit and, according to the ray intersection calculation task, allocates each ray tracing task to an adapted computing unit to perform the ray intersection calculation. The global cache unit caches the processing results produced by each ray processing controller for the ray tracing tasks. The task return unit obtains the processing result corresponding to each ray tracing task and returns it to the initiating end of the ray tracing task. The ray tracing processor improves the operation efficiency of ray tracing tasks.

Description

Ray tracing processor, processor chip, equipment terminal and ray tracing method
Technical Field
The present application relates to the field of computer graphics technologies, and in particular, to a ray tracing processor, a processor chip, an apparatus terminal, and a ray tracing method.
Background
Ray tracing is an algorithm that renders realistic images by simulating how light propagates in the real world. Based on the material properties of each object in a scene, it traces the rays that interact with the objects to obtain the reflection and scattering paths produced at their surfaces, and thereby simulates a realistic virtual scene. Because every ray in the scene must be traced throughout the simulation, the computational scale of ray tracing is huge. For a long time it was widely used only in fields such as the film industry, while real-time ray tracing could not be developed because of hardware limitations. In recent years, with the continuous development of general-purpose computing on graphics processing units (GPUs), various solutions to this problem have been proposed. One is to run the entire ray tracing computation on the GPU, from ray generation through acceleration-structure traversal to final shading. However, this greatly increases the burden on the GPU, and when the core algorithm is very complex the overhead of logic control can become very large, significantly reducing overall computing performance. Some manufacturers instead provide an independent ray tracing processor, for example the RT Core of NVIDIA Corporation, a hardware unit dedicated to accelerating real-time ray tracing computation. Its greatest problem is that it does not handle ray divergence well: because a scene usually contains many different materials, the shader identity code (Shader ID) of the final shader group (Hit Group) of each ray after traversal diverges widely. Hardware solutions currently on the market cannot handle this well, so computing efficiency can drop by a factor of tens when divergence is severe.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide a ray tracing processor, a processor chip, a device terminal and a ray tracing method, which improve the operation efficiency of ray tracing and solve the problems of the prior art.
To achieve the above and other related objects, a first aspect of the present application provides a ray tracing processor comprising a global task allocation unit, a plurality of ray processing controllers, a global cache unit and a task return unit. The global task allocation unit is coupled to each ray processing controller and is used for receiving ray tracing tasks and allocating them according to the working states of the plurality of ray processing controllers; the ray tracing tasks comprise ray intersection calculation tasks. Each ray processing controller includes a local task allocation unit and a plurality of types of computing units. The local task allocation unit is used for receiving the ray tracing tasks from the global task allocation unit and, according to the ray intersection calculation task, allocating each ray tracing task to an adapted computing unit to perform the ray intersection calculation; here, "adapted" means that the type of the intersection object of the ray intersection calculation matches the computing unit.
The global cache unit is coupled to each ray processing controller and is used for caching the processing results produced by each ray processing controller for the ray tracing tasks. The task return unit is coupled to the global task allocation unit and the global cache unit, and is configured to obtain the processing result corresponding to each ray tracing task and return it to the initiating end of the ray tracing task.
A second aspect of the present application provides a processor chip comprising: at least one main processor; and the ray tracing processor of claims 1-6, coupled to the at least one main processor, for receiving and processing ray tracing tasks issued by the at least one main processor and returning the computation results.
A third aspect of the application provides a device terminal comprising the processor chip provided in the second aspect.
A fourth aspect of the present application provides a ray tracing method applied to the ray tracing processor, the ray tracing method comprising:
receiving ray tracing tasks and allocating them according to the working states of the ray processing controllers; allocating the allocated ray tracing tasks to the adapted computing units for ray intersection calculation according to the ray intersection calculation tasks; and obtaining the processing result corresponding to each ray tracing task and returning it to the initiating end of the ray tracing task.
In an embodiment of the first aspect, the ray processing controller further includes a local cache unit, and the local cache unit is configured to cache a calculation result of the ray tracing task to the global cache unit.
In an embodiment of the first aspect, the plurality of types of computing units comprises a general purpose computing unit and a ray acceleration unit.
In an embodiment of the first aspect, the ray tracing processor further includes a preprocessing unit, coupled to the global task allocation unit, for performing data parsing and data conversion on the received ray tracing task.
In an embodiment of the first aspect, the preprocessing unit further comprises a reorganization unit for reorganizing the ray tracing data.
In an embodiment of the first aspect, the global task allocation unit is further coupled to the task return unit.
In an embodiment of the second aspect, the main processor comprises a graphics processor.
In an embodiment of the second aspect, the processor chip further includes a ring buffer unit (circular cache unit), coupled to the main processor and the ray tracing processor, for caching the ray tracing tasks issued by the main processor and providing them to the global task allocation unit for allocation.
Compared with the prior art, the technical scheme of the embodiment of the application has the following beneficial effects:
On the one hand, different computing units are used according to different scene requirements, which reduces the loss of efficiency caused by divergence of ray direction and material.
On the other hand, power, performance and area (PPA) can be greatly optimized at the same calculation speed, and the probability of repeated reading of the computing units of the ray processing controllers is reduced.
Drawings
FIG. 1 is a block diagram of an exemplary ray tracing processor according to the present application;
FIG. 2 is a block diagram of a ray tracing processor according to an embodiment of the present application;
FIG. 3 is a block diagram of a ray tracing processor according to the third embodiment of the present application;
FIG. 4 is a flowchart of a ray tracing method according to a fourth embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.
In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when a device is referred to as being "connected" to another device, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a device "includes" a certain component, unless otherwise stated, the device does not exclude other components, but may include other components.
When a device is said to be "on" another device, this may be directly on the other device, but may also be accompanied by other devices in between. When a device is said to be "directly on" another device, there are no other devices in between.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another, for example a first interface from a second interface. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Terms expressing relative spatial position, such as "lower", "upper", and the like, may be used to more readily describe the relationship of one element to another as illustrated in the figures. Such terms are intended to cover not only the orientation shown in the drawings but also other orientations of the device in use or operation. For example, if the device in the figures is turned over, elements described as "below" other elements would then be oriented "above" the other elements. Thus, the exemplary terms "under" and "beneath" can encompass both above and below. The device may be rotated by 90 degrees or by other angles, and the terms expressing relative spatial position are to be interpreted accordingly.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with the related technical literature and the present disclosure, and must not be interpreted in an idealized or overly formal sense unless so defined.
The ray tracing principle employed in the embodiments of the present application is, in brief, as follows: a ray is emitted into the scene from the position of the camera through a pixel position (or a sampling position) on the image plane, the closest intersection point between the ray and the scene geometry is found, and the shading of that intersection point is then computed. If the material at the intersection point is reflective, tracing can continue from the intersection point in the reflection direction. If there is no intersection point, the ray leaves the scene and tracing ends. In addition to readily supporting some global illumination effects, ray tracing is not limited to triangles as the geometric primitive: any geometry for which the intersection point with a ray can be calculated can be supported. With the recent rise of general-purpose computing on GPUs, Microsoft DXR (DirectX Raytracing) has standardized the ray tracing process to a great extent, making large-scale hardware acceleration possible. However, existing schemes do not handle ray divergence well: because a scene contains a large number of objects made of different materials, the rays diverge widely after traversal, which reduces operation efficiency.
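The closest-intersection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the application: spheres stand in for "any geometry that can calculate the intersection point with a ray", and the names `Sphere`, `hit_distance` and `nearest_hit` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass(frozen=True)
class Sphere:
    """Hypothetical stand-in geometry; any shape with a ray test would do."""
    center: Vec3
    radius: float


def hit_distance(origin: Vec3, direction: Vec3, sphere: Sphere) -> Optional[float]:
    """Return the smallest positive distance t at which the ray hits the sphere."""
    oc = sub(origin, sphere.center)
    a = dot(direction, direction)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - sphere.radius ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:  # ignore hits behind or at the ray origin
            return t
    return None


def nearest_hit(origin: Vec3, direction: Vec3,
                spheres: Sequence[Sphere]) -> Optional[Tuple[int, float]]:
    """Return (index, distance) of the closest object hit, or None on a miss."""
    best: Optional[Tuple[int, float]] = None
    for index, sphere in enumerate(spheres):
        t = hit_distance(origin, direction, sphere)
        if t is not None and (best is None or t < best[1]):
            best = (index, t)
    return best
```

A result of `None` corresponds to the case where the ray finds no intersection and tracing ends; otherwise the returned index identifies the object whose material would be shaded next.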
The ray tracing processor provided by the first embodiment of the present application can adopt different computing units to perform operations according to a large number of different materials existing in a scene, so as to reduce the problem of efficiency reduction caused by divergence of ray directions and materials.
Embodiments of the present application are described below with reference to the drawings. As shown in FIG. 1, the ray tracing processor 1 according to the embodiment of the present application includes a global task allocation unit 10, a plurality of ray processing controllers 20, a global cache unit 30, and a task return unit 40. The global task allocation unit 10 is coupled to each of the ray processing controllers 20 and is configured to receive the ray tracing tasks sent by the main processor 2 and allocate them according to the working states of the plurality of ray processing controllers 20. Specifically, the global task allocation unit 10 may allocate the ray tracing tasks to the ray processing controllers 20 in a balanced manner according to the workload state of each ray processing controller 20. The number of ray processing controllers 20 can be two, as shown in FIG. 1, and can of course be increased or decreased to meet different application requirements. The main processor 2 may be a graphics processing unit (GPU), a central processing unit (CPU) or another type of processor, and is not specifically limited here; the embodiments of the present application use a GPU only as an example. Additionally, ray tracing tasks include, but are not limited to, ray intersection calculation tasks.
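As a rough illustration of workload-based allocation, the sketch below sends each incoming task to the controller with the fewest pending tasks. The application does not state which workload metric or policy the global task allocation unit uses, so the queue-length metric, the tie-breaking rule and the class names `RayProcessingController` and `GlobalTaskAllocator` are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class RayProcessingController:
    """Hypothetical software model of one ray processing controller."""
    name: str
    pending: List[str] = field(default_factory=list)

    @property
    def load(self) -> int:
        # Assumed workload metric: number of tasks waiting on this controller.
        return len(self.pending)


class GlobalTaskAllocator:
    """Assumed policy: always pick the least-loaded controller."""

    def __init__(self, controllers: Sequence[RayProcessingController]) -> None:
        if not controllers:
            raise ValueError("at least one controller is required")
        self.controllers = list(controllers)

    def allocate(self, task: str) -> RayProcessingController:
        # min() returns the first controller on a tie, which keeps the
        # behaviour deterministic.
        target = min(self.controllers, key=lambda c: c.load)
        target.pending.append(task)
        return target
```

With two controllers, one already holding three tasks and one idle, five new tasks end up leaving both controllers with four tasks each, which is the balanced outcome the paragraph above describes.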
The ray processing controller 20 comprises a local task allocation unit 21, a plurality of types of computing units, including a general-purpose computing unit 22 and a ray acceleration unit 23, and a local cache unit 24. The local task allocation unit 21 is configured to receive the ray tracing task sent by the global task allocation unit 10 and, according to the ray intersection calculation task contained in it, allocate the ray tracing task to an adapted computing unit to perform the ray intersection calculation. Specifically, ray intersection calculation tasks are of various types, such as ray and triangle intersection calculation (Ray Triangle Test), intersection calculation between a ray and a transparent object, and ray and non-triangle intersection calculation (Ray Non-Triangle Test). "Adapted" means that the type of the intersection object of the ray intersection calculation matches the computing unit: the intersection objects include triangles, transparent objects and non-triangles, and the computing units include the general-purpose computing unit 22 and the ray acceleration unit 23. The ray acceleration unit 23 includes, but is not limited to, a Ray-Box Intersection Accelerator (RBIA) and a Watertight Ray-Triangle Intersection Accelerator (WTRTIA). Specifically, tasks that require a further triangle intersection calculation may be submitted to the WTRTIA, while the intersection calculation between a ray and a transparent object and between a ray and a non-triangle may be submitted to the general-purpose computing unit 22. When the ray processing controller 20 completes the calculation of the ray tracing task, the local cache unit 24 caches the calculation result of the ray tracing task to the global cache unit 30.
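The type-based adaptation described above amounts to a lookup from intersection type to computing unit. The sketch below models it with a dictionary; the enum `IntersectionKind`, the table `ADAPTED_UNIT` and the function `dispatch` are hypothetical names, and routing bounding-box tests to the RBIA is an assumption inferred from the accelerator's name rather than something this paragraph states explicitly.

```python
from enum import Enum, auto
from typing import Dict, Iterable, List, Tuple


class IntersectionKind(Enum):
    BOUNDING_BOX = auto()   # assumed to be handled by the RBIA
    TRIANGLE = auto()
    TRANSPARENT = auto()
    NON_TRIANGLE = auto()


# Which computing unit each kind of intersection test is adapted to.
ADAPTED_UNIT: Dict[IntersectionKind, str] = {
    IntersectionKind.BOUNDING_BOX: "RBIA",
    IntersectionKind.TRIANGLE: "WTRTIA",
    IntersectionKind.TRANSPARENT: "general-purpose unit",
    IntersectionKind.NON_TRIANGLE: "general-purpose unit",
}


def dispatch(tasks: Iterable[Tuple[str, IntersectionKind]]) -> Dict[str, List[str]]:
    """Place each (task id, kind) pair in the queue of its adapted unit."""
    queues: Dict[str, List[str]] = {unit: [] for unit in ADAPTED_UNIT.values()}
    for task_id, kind in tasks:
        queues[ADAPTED_UNIT[kind]].append(task_id)
    return queues
```

Keeping the mapping in one table makes the design choice visible: adding a new accelerator type only means adding an entry, while everything without a dedicated accelerator falls to the general-purpose unit.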
The global cache unit 30 is coupled to each of the ray processing controllers 20 for caching the processing results of the ray tracing tasks processed by each ray processing controller. The ray tracing processor 1 further comprises a task return unit 40, which is coupled to the global cache unit 30. The task return unit 40 sends the cached processing results to the initiating end of the ray tracing task, i.e. the main processor 2. In some embodiments, the task return unit 40 is further coupled to the global task allocation unit 10, and the global task allocation unit 10 can return a ray tracing task to the main processor 2 for processing through the task return unit 40. That is, the global task allocation unit 10 can either allocate a task to the ray tracing processor 1 for processing or send it back to the main processor 2 for processing, which increases the flexibility of the processing manner, improves the overall efficiency of ray tracing, and reduces the ray tracing cost.
In some embodiments, the ray tracing processor further comprises a pre-processing unit. As shown in fig. 2, the preprocessing unit 50 is coupled to the global task allocation unit 10 for performing data parsing and data conversion on the received ray tracing task, and converting the data into a form that is convenient for the ray processing controller 20 to process. In some embodiments, the preprocessing unit 50 further includes a reorganizing subunit (not shown) for reorganizing the ray tracing data to reduce resource waste caused by unbalanced ray tracing task calculation.
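One way to picture the reorganization is as grouping rays that need the same kind of work and cutting each group into fixed-size batches, so that every batch is uniform. The application does not specify the sort key or the batching scheme, so the function `reorganize`, its `key` parameter and the batch size below are purely illustrative assumptions.

```python
from itertools import groupby
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def reorganize(rays: Sequence[T], key: Callable[[T], Hashable],
               batch_size: int) -> List[List[T]]:
    """Group rays by `key`, then cut every group into batches of `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # sorted() is stable, so rays keep their arrival order inside each group.
    ordered = sorted(rays, key=key)
    batches: List[List[T]] = []
    for _, group in groupby(ordered, key=key):
        members = list(group)
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])
    return batches
```

For example, with rays tagged `"tri"` or `"box"` and a batch size of 2, interleaved input is turned into batches that each contain a single tag, so no batch mixes work for different computing units.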
A second embodiment of the present application further provides a processor chip, including at least one main processor and the ray tracing processor of the first embodiment of the present application. Specifically, as shown in FIG. 1 and FIG. 2, the ray tracing processor 1 is coupled to a main processor 2 for receiving and processing the ray tracing tasks issued by the main processor 2 and returning the calculation results. In this embodiment there are a plurality of main processors 2, all coupled to the ray tracing processor 1. In some embodiments, the processor chip further includes a ring buffer unit (circular cache unit), coupled to the main processor and the ray tracing processor, for caching the ray tracing tasks issued by the main processor and providing them to the global task allocation unit for allocation. In the second embodiment of the present application, a plurality of main processors share one ray tracing processor for ray tracing, so the scene rendering cost can be greatly reduced.
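The buffering role of the ring buffer unit can be sketched as a fixed-capacity first-in, first-out queue between a producer (the main processor) and a consumer (the global task allocation unit). The class `RingBuffer` below is a hypothetical software model; the application does not describe the capacity, the overflow behaviour or the hardware interface, and refusing a write when the buffer is full is an assumption.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class RingBuffer(Generic[T]):
    """Fixed-capacity FIFO that reuses its slots in a circular fashion."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[T]] = [None] * capacity
        self._head = 0    # index of the oldest stored task
        self._count = 0   # number of tasks currently stored

    def push(self, task: T) -> bool:
        """Store a task; return False (assumed policy) when the buffer is full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = task
        self._count += 1
        return True

    def pop(self) -> Optional[T]:
        """Remove and return the oldest task, or None when the buffer is empty."""
        if self._count == 0:
            return None
        task = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return task
```

The wrap-around index arithmetic is what lets the producer keep writing into slots the consumer has already drained, without moving any stored data.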
A third embodiment of the present application provides a specific embodiment of a processor chip. As shown in FIG. 3, the processor chip includes two main processors 2, which are graphics processing units (GPU M and GPU N in this embodiment). Each GPU is composed of N graphics processing clusters (GPCs) 21 (GPC 0, GPC 1 … GPC n, respectively), each of which is a sub-unit of the GPU. Each GPU is configured with a global cache 22 (Global Cache), a ray task manager 23 (Ray Task Monitor), and a ray tracing task group and task repacking unit 24 (Ray tracing Task high group Task Repack). After a ray tracing task is repacked by the ray tracing task group and task repacking unit 24, it is sent to the ring buffer unit 51 (Ring Buffer A and Ring Buffer B). The ray tracing command processor 52 (Ray Tracing Command Processor, RTCP) then parses the data in Ring Buffer A and Ring Buffer B and converts it into a form that is convenient for the ray tracing processor to process. After the conversion, the data of the ray tracing task is sent to a reorganizing subunit 53 (Ray Sorter), which receives the data from the ray tracing command processor 52 and reorganizes it to reduce the resource waste caused by unbalanced ray tracing task calculation. After the data has been reorganized, it is sent to the global task distributor 10 (GTCAD), which distributes the ray tracing tasks to the ray processing controllers 20 in a balanced manner according to the workload state of each ray processing controller 20.
In this embodiment, there are two ray processing controllers 20 (ray processing controller M and ray processing controller N). Each ray processing controller 20 includes a local task allocation unit 21 (Local Task Distributor) and a plurality of types of computing units, including a general-purpose computing unit 22 (GPCU) and a ray acceleration unit 23, which comprises four Ray-Box Intersection Accelerators 230 (RBIA) and a watertight ray-triangle intersection accelerator 231 (WTRTIA). The ray processing controller 20 further includes a local cache unit 24 (Local Cache); when the ray processing controller 20 completes the calculation of a ray tracing task, the local cache unit 24 caches the calculation result to the global cache unit 30 (Global Cache). The global cache unit 30 is coupled to the memory hub unit 60 (Memory Hub Unit) and caches the processing results produced by each ray processing controller for the ray tracing tasks. The task return unit 40 (Ray Task Sync and Return Copy) is coupled to the memory hub unit 60; it repackages the cached processing results and sends them to the ray task manager 23 (Ray Task Monitor) of the graphics processor 2 through a high-speed serial computer expansion bus (PCIe) or on-chip SRAM. The task return unit 40 is further coupled to the global task distributor 10 (Global Task manager and Distributor, GTCAD), and the global task distributor 10 can return ray tracing tasks to the GPU for processing through the task return unit 40. That is, the global task distributor 10 can either distribute a task to the ray tracing processor 1 for processing or hand it to the GPU for processing, which increases the flexibility of the processing manner and improves the overall efficiency of ray tracing.
The ray task manager 23 then repackages the shaders according to their shader IDs to complete scene rendering. When shader rendering has finished and no new ray tracing intersection request occurs, the ray tracing task is complete. The ray tracing processor provided in the third embodiment of the present application can use different computing units according to the large number of different materials in a scene, so as to reduce the loss of efficiency caused by divergence of ray direction and material.
A fourth embodiment of the present application provides a ray tracing method applied to the ray tracing processor of the above embodiments, including the steps of: receiving ray tracing tasks and allocating them according to the working states of the ray processing controllers; allocating the allocated ray tracing tasks to the adapted computing units for ray intersection calculation according to the ray intersection calculation tasks; and obtaining the processing result corresponding to each ray tracing task and returning it to the initiating end of the ray tracing task.
As shown in fig. 4, the method specifically includes the following steps:
Step 401: receiving ray tracing tasks and allocating them according to the working states of the ray processing controllers.
Specifically, the ray tracing process can be divided into several stages, which together are generally referred to as a pipeline. The pipeline includes a fixed pipeline and a programmable pipeline. The fixed pipeline performs fixed tasks such as vertex fetching or culling and vertex data interpolation, and these modules can be controlled only through configuration parameters. The programmable pipeline allows a user to process data (vertices, faces or pixels) with custom shaders. The application programming interface (API) of DXR defines many different types of shader. For example, the ray generation shader is responsible for initializing rays and is the entry point of the whole DXR programmable pipeline; by calling the TraceRay() function it emits a ray into the scene and starts the whole ray tracing process. More precisely, the shader submits a ray intersection request through the TraceRay() function. The ray is a data structure containing an origin, a direction and a distance interval, where the origin is the viewpoint and the interval defines the range within which the ray may intersect the surface of an object in the scene; for example, intersections that are too far away can be ignored. The ray is defined by the user, so this data structure can be modified. The ray information may be sent to the ray task manager (Ray Task Monitor) of the GPU via lightscape (ls); based on the ray information, the ray task manager generates the relevant instructions for restoring the corresponding shader editing entry and for activating the ray tracing coprocessor.
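The ray data structure described above (origin, direction and a distance interval) can be modelled as a small record. The Python class below is a hypothetical sketch for illustration; the field names `t_min` and `t_max` are assumptions chosen to echo the `TMin` and `TMax` fields of the `RayDesc` structure used with `TraceRay()` in DXR shaders.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Ray:
    """A ray with the three ingredients named in the text."""
    origin: Vec3                # where the ray starts (the viewpoint)
    direction: Vec3             # where the ray is heading
    t_min: float = 0.0          # start of the accepted distance interval
    t_max: float = math.inf     # end of the accepted distance interval

    def point_at(self, t: float) -> Vec3:
        """Position reached after travelling a parametric distance t."""
        return tuple(o + t * d for o, d in zip(self.origin, self.direction))

    def accepts(self, t: float) -> bool:
        """True if a hit at distance t lies inside the distance interval."""
        return self.t_min <= t <= self.t_max
```

The interval check is what lets intersections that are too far away be ignored: a candidate hit at a distance beyond `t_max` simply fails `accepts()`. The class is left mutable on purpose, since the text notes that the user may modify the ray.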
Specifically, the ray information is converted into the corresponding ray tracing data by a preprocessing unit, which includes a Ray Tracing Command Processor (RTCP). After the ray information is written into a ring buffer, the RTCP parses the data in the ring buffer and converts it into a format that is convenient for the ray tracing processor to process, thereby obtaining the data of the ray tracing task for the scene to be rendered. In some embodiments, the preprocessing unit further includes a reorganization subunit, which reorganizes the parsed and converted data to reduce the loss of system efficiency caused by unbalanced ray tests. For the ray tracing task data obtained by the RTCP, the global task director and distributor (GTCAD) determines how the tasks are distributed according to the actual state of the ray processing controllers.
Step 402: allocating the allocated ray tracing tasks to the adapted computing units for ray intersection calculation according to the ray intersection calculation tasks.
Specifically, a rendering scene contains many objects of different materials, and different shaders are needed to render them, so a method is needed to select the corresponding shader for an object after a ray hits it. One path from the global distribution unit (GTCAD) leads to the local distribution unit (local distributor) of the ray tracing processor, which distributes computing tasks to the corresponding Ray-and-Bounding-Box Intersection Accelerator (RBIA); the ray tracing data may then go on to a further triangle intersection test. For a transparent object or a non-triangle intersection test, the local distribution unit submits the task to a general-purpose computing unit (General Purpose Compute Unit) for processing. When all ray tracing intersection tests are completed, the final result is written into the global cache unit (global cache) for the next stage of processing.
Step 403: acquiring the processing result corresponding to each ray tracing task and returning it to the initiating end of the ray tracing task.
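To make the dispatch rule above concrete, here is a small sketch in which a local distributor picks a unit type from the kind of intersection object, together with a standard slab test of the sort a ray and bounding box accelerator would evaluate. The unit labels, the `select_unit` function, and the primitive kinds are hypothetical; the slab test is a well-known textbook method and is not claimed to be the accelerator's actual implementation.

```python
def ray_hits_box(origin, direction, box_min, box_max, t_min, t_max):
    """Slab test: clip the ray interval against each pair of axis-aligned planes."""
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must already lie between the planes.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_min, t_max = max(t_min, t0), min(t_max, t1)
        if t_min > t_max:
            return False  # the interval became empty, so the ray misses the box
    return True


def select_unit(primitive_kind, transparent=False):
    """Pick the unit type that matches the intersection object (assumed labels)."""
    if transparent:
        return "general_purpose_unit"
    if primitive_kind in ("box", "triangle"):
        return "ray_acceleration_unit"
    return "general_purpose_unit"


hit = ray_hits_box((0.0, 0.0, -5.0), (0.0, 0.0, 1.0),
                   (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), 0.0, 100.0)
miss = ray_hits_box((3.0, 0.0, -5.0), (0.0, 0.0, 1.0),
                    (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), 0.0, 100.0)
unit_for_box = select_unit("box")
unit_for_glass = select_unit("triangle", transparent=True)
unit_for_sphere = select_unit("sphere")
```

Routing only fixed-function cases to the accelerator keeps it busy with uniform work, while irregular cases (transparency, procedural shapes) fall back to programmable hardware.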
After the ray tracing intersection calculation is completed, the data written into the global cache unit is packaged and sent to the ray task manager through the task return unit (Ray Task Sync and Return Copy). The ray task manager reconstructs, according to the identification codes of the shaders, a shader table representing the correspondence between shaders and objects of different materials, and scene rendering is completed according to the shader table.
Specifically, the ray task manager reconstructs a shader table (ShaderTable) according to the identification codes of the shaders. The shader table records the shaders of the objects in the scene, so that when a ray hits a geometric object in the scene, the corresponding shader can be looked up through the shader table, which improves rendering efficiency. Ray tracing ends when no new ray intersections occur. The embodiments of the present application can greatly reduce the efficiency loss caused by divergence in ray direction and material, optimize power, performance, and area (PPA) at the same ray test rate, reduce repeated reads by the accelerator, improve the overall efficiency of ray tracing, and reduce cost.
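The shader table lookup described above can be illustrated with a plain dictionary keyed by material. This is a sketch only: the function names, the numeric shader identification codes, and the string "colors" are assumptions chosen to keep the example short, and a real shader table would hold shader records rather than Python callables.

```python
def build_shader_table(bindings):
    """Rebuild the material -> shader identification code mapping."""
    table = {}
    for material, shader_id in bindings:
        table[material] = shader_id
    return table


def shade(hits, shader_table, shaders, miss="background"):
    """Look up and run the shader for each hit; rays that hit nothing get `miss`."""
    results = []
    for material in hits:
        if material is None:
            results.append(miss)
            continue
        shader = shaders[shader_table[material]]
        results.append(shader(material))
    return results


# Hypothetical shaders, addressed by identification code.
shaders = {
    101: lambda m: f"diffuse({m})",
    102: lambda m: f"reflective({m})",
}

table = build_shader_table([("wood", 101), ("metal", 102), ("stone", 101)])
colors = shade(["metal", None, "wood"], table, shaders)
```

Several materials can share one shader code, so the table stays small even when the scene contains many objects.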
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A ray tracing processor, comprising: a global task allocation unit, a plurality of ray processing controllers, a global cache unit, and a task return unit;
the global task allocation unit is coupled to each ray processing controller and is configured to receive ray tracing tasks and allocate the ray tracing tasks according to the working states of the ray processing controllers; the ray tracing task comprises a ray intersection calculation task; each of the ray processing controllers comprises a local task allocation unit and a plurality of types of computing units; the local task allocation unit is configured to receive the ray tracing task from the global task allocation unit and, according to the ray intersection calculation task, allocate the ray tracing task to the matching computing unit for ray intersection calculation; wherein matching means that the type of the intersection object of the ray intersection calculation matches the type of the computing unit;
the global cache unit is coupled to each ray processing controller and is configured to cache the processing result produced by each ray processing controller in processing the ray tracing task;
the task return unit is coupled to the global task allocation unit and the global cache unit, and is configured to obtain the processing result corresponding to each ray tracing task and return the processing result to the initiating end of the ray tracing task.
2. The ray tracing processor of claim 1, wherein the ray processing controller further comprises a local cache unit, the local cache unit being configured to cache the calculation results of the ray tracing task to the global cache unit.
3. The ray tracing processor of claim 1, wherein the plurality of types of computational units comprise a general purpose computational unit and a ray acceleration unit.
4. The ray tracing processor of claim 1, further comprising a preprocessing unit coupled to the global task allocation unit for performing data parsing and data conversion on the received ray tracing task.
5. The ray tracing processor of claim 4, wherein the preprocessing unit further comprises a reorganization subunit configured to reorganize the ray tracing data.
6. The ray tracing processor of claim 1, wherein the global task allocation unit is further coupled to the task return unit.
7. A processor chip, comprising: at least one main processor; and the ray tracing processor of any one of claims 1-6, coupled to the at least one main processor, for receiving and processing ray tracing tasks issued by the at least one main processor and returning the computation results.
8. The processor chip of claim 7, wherein the main processor comprises a graphics processor.
9. A device terminal, comprising the processor chip of claim 7 or 8.
10. A ray tracing method applied to the ray tracing processor according to any one of claims 1-6, the ray tracing method comprising:
receiving the ray tracing tasks and allocating the ray tracing tasks according to the working states of the ray processing controllers;
distributing the allocated ray tracing tasks to the matching computing units for ray intersection calculation, according to the ray intersection calculation tasks;
acquiring the processing result corresponding to each ray tracing task and returning the processing result to the initiating end of the ray tracing task.
CN202110629352.3A 2021-06-07 2021-06-07 Ray tracing processor, processor chip, equipment terminal and ray tracing method Active CN113344766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110629352.3A CN113344766B (en) 2021-06-07 2021-06-07 Ray tracing processor, processor chip, equipment terminal and ray tracing method

Publications (2)

Publication Number Publication Date
CN113344766A true CN113344766A (en) 2021-09-03
CN113344766B CN113344766B (en) 2022-09-06

Family

ID=77474306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629352.3A Active CN113344766B (en) 2021-06-07 2021-06-07 Ray tracing processor, processor chip, equipment terminal and ray tracing method

Country Status (1)

Country Link
CN (1) CN113344766B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942825A (en) * 2008-09-10 2014-07-23 柯斯提克绘图公司 Ray tracing system architectures and methods
CN102074038A (en) * 2010-12-28 2011-05-25 长春理工大学 Method for drawing surface caustic effect of 3D virtual scene generated by smooth surface refraction
CN103049927A (en) * 2013-01-17 2013-04-17 浙江大学 Real-time ray tracing and rendering method based on GPU (Graphics Processing Unit) aggregate
CN104200508A (en) * 2014-08-19 2014-12-10 山东大学 Light ray tracing acceleration method based on Intel multiple core framework peer mode
US20190197761A1 (en) * 2017-12-22 2019-06-27 Advanced Micro Devices, Inc. Texture processor based ray tracing acceleration method and system
CN110796588A (en) * 2018-08-02 2020-02-14 辉达公司 Simultaneous computation and graph scheduling
CN109389666A (en) * 2018-09-29 2019-02-26 吉林动画学院 Distributed Real-time Rendering device and method
US20200174829A1 (en) * 2018-12-04 2020-06-04 Imagination Technologies Limited Buffer Checker for Task Processing Fault Detection
CN111402388A (en) * 2020-04-03 2020-07-10 山东大学 Light parallel intersection method based on many-core processor and light path tracking system
CN112734892A (en) * 2021-01-12 2021-04-30 北京卓越电力建设有限公司 Real-time global illumination rendering method for virtual cable tunnel scene model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
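ISAAC D. SCHERSON et al.: "Multiprocessing for ray tracing: a hierarchical self-balancing approach", The Visual Computer *
YI Faling et al.: "Research on parallel processing of ray tracing based on BSP", Journal of Yangtze University *
HUANG Tao: "Research on OpenCL-accelerated implementation of ray tracing", Computer and Modernization *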

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529444A (en) * 2022-04-22 2022-05-24 南京砺算科技有限公司 Graphics processing module, graphics processor and graphics processing method
CN114529444B (en) * 2022-04-22 2023-08-11 南京砺算科技有限公司 Graphics processing module, graphics processor, and graphics processing method
CN115640138A (en) * 2022-11-25 2023-01-24 摩尔线程智能科技(北京)有限责任公司 Method and apparatus for ray tracing scheduling

Also Published As

Publication number Publication date
CN113344766B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
US11715251B2 (en) Neural network model trained using generated synthetic images
US20200211252A1 (en) Cluster of scalar engines to accelerate intersection in leaf node
CN113344766B (en) Ray tracing processor, processor chip, equipment terminal and ray tracing method
US20230084570A1 (en) Accelerating triangle visibility tests for real-time ray tracing
US20200211264A1 (en) Apparatus and method for ray tracing with grid primitives
CN110956687A (en) Apparatus and method for cross-instance, front-to-back traversal of ray-traced heavily instantiated scenes
JP2021149941A (en) Motion blur using dynamic quantization grid
CN111402389A (en) Early termination in bottom-to-top accelerated data structure trimming
US11847733B2 (en) Performance of ray-traced shadow creation within a scene
US20200211259A1 (en) Apparatus and method for acceleration data structure refit
US20220198746A1 (en) Reservoir-based spatiotemporal importance resampling utilizing a global illumination data structure
US11010963B2 (en) Realism of scenes involving water surfaces during rendering
US11069095B1 (en) Techniques for efficiently sampling an image
US20240095993A1 (en) Reducing false positive ray traversal in a bounding volume hierarchy
US20240095995A1 (en) Reducing false positive ray traversal using ray clipping
US11908064B2 (en) Accelerated processing via a physically based rendering engine
US20200211267A1 (en) Cell primitive for unstructured volume rendering
US20240095994A1 (en) Reducing false positive ray traversal using point degenerate culling
US11853764B2 (en) Accelerated processing via a physically based rendering engine
US11704860B2 (en) Accelerated processing via a physically based rendering engine
US11830123B2 (en) Accelerated processing via a physically based rendering engine
US11875444B2 (en) Accelerated processing via a physically based rendering engine
Barladyan et al. The use of coherent ray tracing for physically accurate rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant