CN101826215B

CN101826215B - Real-time secondary ray tracing concurrent rendering method

Info

Publication number: CN101826215B
Application number: CN2010101505664A
Authority: CN
Inventors: 许端清; 杨鑫; 赵磊; 葛蓉
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2010-04-19
Filing date: 2010-04-19
Publication date: 2012-05-09
Anticipated expiration: 2030-04-19
Also published as: CN101826215A

Abstract

The invention discloses a real-time secondary ray tracing concurrent rendering method, which comprises the following steps: (1) by constructing an octree, partitioning the scene data of the model space to be rendered; (2) when the proportion of the effective rays in a ray packet is higher than the threshold, carrying out step (3), otherwise, carrying out step (5); (3) traversing the ray packet along the three coordinate axes of the model space; (4) orderly carrying out intersection testing on ray subpackets and facets in the leaf node to obtain a ray having an intersection relationship with the facets; (5) carrying out intersection testing on each ray in the ray packet to obtain a triangular facet which is nearest to the starting point of the ray and intersects with the ray; and (6) according to the material type of the model to be rendered in the model space, classifying the rays in the ray subpackets, and orderly rendering. The concurrent rendering method can reduce the ray traversal time and the time required by intersection by effectively utilizing the concurrent computation capability of hardware.

Description

Real-time secondary ray tracing concurrent rendering method

Technical field

The present invention relates to the technical field of real-time graphics rendering, and in particular to a concurrent rendering method for secondary ray tracing based on a multi-core architecture.

Background art

Virtual reality technology has become an increasingly important tool in industrial design. In large high-end engineering projects in particular, such as automobile and aircraft manufacturing, building real physical models is very costly, so these projects have begun to use digital techniques for design work. Some effects appear only during real-time interactive design; in headlight design, for example, certain highlight effects appear only at specific viewpoints or under specific lighting conditions, so real-time performance is a basic requirement of industrial design. To achieve it, current virtual reality systems mostly use rasterization for rendering. Owing to the limitations of that algorithm, however, it is difficult to give industrial designers highly realistic images: rasterization cannot accurately render multiple reflections and refractions of light, self-reflection, or curved reflection. Design errors may therefore occur, leading to high manufacturing costs and lower design efficiency.

Ray tracing is widely used in graphics rendering. Compared with rasterization it has many advantages, including automatic computation of object visibility, a time complexity that is sublinear in the number of primitives in the scene, and better suitability for parallel architectures. Because ray tracing simulates the physical paths of light in the real world, its greatest advantage is that it produces high-quality images, rendering highly realistic global illumination effects such as smooth reflection, refraction and soft shadows. Whitted was the first to use ray tracing to simulate global illumination. Because ray tracing is computationally very expensive, it could in the past be applied only to non-real-time rendering.

At present, the exponential growth in the computing power of computer hardware has made interactive real-time ray tracing possible. Modern high-performance parallel hardware has two important characteristics: parallel computation across multiple cores, and SIMD (single instruction, multiple data) operation within each core. Ray tracing is inherently suited to parallel processing, so it can make good use of these characteristics. On the one hand, the NVIDIA G80/G92 architecture has greatly improved the performance and programmability of the GPU, and its programming platform CUDA (Compute Unified Device Architecture) essentially turns the GPU into a highly parallel general-purpose processor. The G80/G92 architecture has multiple cores, each containing multiple stream processors, and these stream processors execute many threads simultaneously in SIMD form (more precisely, SPMD form, i.e. single program, multiple data). On the other hand, modern CPU design no longer simply pursues higher clock frequencies but integrates more processing cores on one chip. Each core can also process data in parallel through SIMD instructions, such as Intel's SSE and the AltiVec instruction set of IBM/Motorola; Intel's latest SSE4 instruction set has brought the processing width of SIMD instructions to 8, and the forthcoming Intel Larrabee architecture will adopt more processing cores while raising the SIMD width to 16.

However, mapping an algorithm onto SIMD data processing is clearly much more complicated than dividing it into threads and assigning them to multiple cores, and the complexity grows with the SIMD width. At present, most algorithms are designed for a SIMD width of 4 and execute sequentially on a single core, so algorithm design is relatively easy. The trend in parallel hardware, however, is to integrate more cores while widening the SIMD units. How to exploit these new hardware features effectively in ray tracing algorithms, so that the computing power of the hardware is used as fully as possible, is a problem the ray tracing field urgently needs to solve.

Designing a method that constructs a high-quality acceleration structure quickly, and that fully exploits the parallel processing power of multi-core processors so that construction is fast enough for real-time use, is key to running ray tracing efficiently. Construction of hierarchical structures cannot make good use of the parallel processing power of multiple cores. The main problem is that the top-down recursive construction of a hierarchy usually produces a binary tree, so only a few nodes are generated in the early construction stage. It is then difficult to exploit the hardware's parallel computing capability fully, which leads to inefficient hardware use, and memory access latency makes construction harder still.

The traditional octree divides the scene at the midpoint of the objects along the three perpendicular coordinate axes x, y and z. Although this division is fast and simple, its poor quality causes many useless traversal and intersection operations and many empty nodes that waste storage, so the octree has gradually been replaced by the kd-tree, which yields a higher-quality structure. The kd-tree usually uses the SAH strategy to determine the optimal split point, but the heavy computation makes kd-tree construction slow, and it is hard to meet the requirements of real-time ray tracing of dynamic scenes. Moreover, current GPU architectures contain many multi-core processors and need tens of thousands of threads running at once to use their computing power fully, whereas acceleration structures such as the kd-tree offer only a few nodes for the cores to work on at the start of construction, which wastes GPU resources and slows construction.

Summary of the invention

The present invention provides a real-time secondary ray tracing concurrent rendering method that makes effective use of the parallel computing capability of the hardware and improves the rendering efficiency of the algorithm.

A real-time secondary ray tracing concurrent rendering method comprises:

(1) partitioning the scene data in the model space by constructing an octree;

On each of the three coordinate axes of the model space to be rendered (the mutually perpendicular X, Y and Z axes), several sampling split points are chosen. The SAH (surface area heuristic) cost of each sampling split point is computed, and the split point with the lowest SAH cost on each axis is selected. The SAH, shown in formula (a), is the method for evaluating the best split point of an acceleration structure proposed by MacDonald et al. in "Heuristics for ray tracing using space subdivision", Visual Computer, 1990. The three selected split points define the splitting planes that divide the scene data in the model space.

The bounding box of the whole scene is the root node. The first division yields eight child nodes, and each child node is then divided in the same way. Construction proceeds recursively in breadth-first order, level by level, until each node contains no more than 10 facets.
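The breadth-first construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each facet is stood in for by its centroid point, `midpoint_split` is a hypothetical placeholder for the SAH-based choice of the three split points, and `max_depth` is an added safeguard that the text does not mention.

```python
from collections import deque

MAX_FACETS = 10  # leaf threshold stated in the text


def midpoint_split(points, lo, hi):
    """Hypothetical placeholder: box midpoint per axis, NOT the SAH choice."""
    return tuple((lo[a] + hi[a]) / 2.0 for a in range(3))


def build_octree(points, lo, hi, choose_split=midpoint_split, max_depth=16):
    """Build an octree level by level; facets are approximated by centroids."""
    root = {"lo": lo, "hi": hi, "points": list(points), "children": [], "depth": 0}
    queue = deque([root])  # FIFO queue gives breadth-first order
    while queue:
        node = queue.popleft()
        if len(node["points"]) <= MAX_FACETS or node["depth"] >= max_depth:
            continue  # node stays a leaf
        split = choose_split(node["points"], node["lo"], node["hi"])
        buckets = [[] for _ in range(8)]
        for p in node["points"]:
            # bit a of t is set when the point lies on the high side of axis a
            t = sum(1 << a for a in range(3) if p[a] >= split[a])
            buckets[t].append(p)
        for t in range(8):
            c_lo = tuple(split[a] if (t >> a) & 1 else node["lo"][a] for a in range(3))
            c_hi = tuple(node["hi"][a] if (t >> a) & 1 else split[a] for a in range(3))
            child = {"lo": c_lo, "hi": c_hi, "points": buckets[t],
                     "children": [], "depth": node["depth"] + 1}
            node["children"].append(child)
            queue.append(child)
        node["points"] = []  # interior nodes keep no facets
    return root
```

Passing a different `choose_split` callable is where an SAH-based selection would plug in; the queue-driven loop itself does not depend on how the split points are chosen.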

This construction quickly produces a large amount of data for thousands of GPU threads to work on, keeping them fully loaded throughout. Furthermore, because the SAH-based split selection is carried out on all three dimensions simultaneously, the improved octree is of higher quality. In addition, the improved octree is shallower, which significantly reduces useless traversal and intersection operations and makes it especially suitable for traversal by large ray packets.

Backed by the hardware's parallel computing power, the octree acceleration structure adopted by the invention combines the advantages of traditional acceleration structures. First, compared with the BVH, traversal intersects rays with splitting planes rather than bounding boxes, which reduces the number of intersection computations. Second, because the SAH strategy is used, traversal is guaranteed to be ordered: once an intersection point is produced during an intersection operation, it is regarded as the first intersection of the ray, and traversal can stop immediately.

To evaluate each candidate split point with formula (a), the number of facets in each child node and its surface area must also be known. Wald et al. compute these quantities by sorting. To avoid the expensive sort, we use the binning method (POPOV S. et al.: Experiences with Streaming Construction of SAH KD-Trees. In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing (Sept. 2006), pp. 89-94) to reduce bandwidth usage. As the construction level deepens, the data each processing core has to handle decreases significantly, so computing the SAH takes less time and construction becomes faster.
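The idea of counting facets with bins instead of sorting can be sketched as below for a single axis. This is an illustrative simplification, not the cited authors' algorithm: it assumes equal-width bins, assigns each facet to a bin by its centroid coordinate, and uses the hypothetical name `binned_counts`.

```python
def binned_counts(centroids, lo, hi, num_bins):
    """Return (split_position, n_left, n_right) for every interior bin boundary.

    Assumption: each facet is represented by one centroid coordinate on this axis.
    """
    if hi <= lo or num_bins < 2:
        raise ValueError("need hi > lo and at least two bins")
    width = (hi - lo) / num_bins
    hist = [0] * num_bins
    for c in centroids:
        b = int((c - lo) / width)
        hist[min(max(b, 0), num_bins - 1)] += 1  # clamp values on the boundary
    total = len(centroids)
    result, left = [], 0
    for b in range(num_bins - 1):
        left += hist[b]  # running prefix sum replaces the sort
        result.append((lo + (b + 1) * width, left, total - left))
    return result
```

One pass fills the histogram and one prefix sum yields the left and right counts for every sampled split position, so no facet list has to be sorted.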

The SAH cost of a sampling split point is:

$$C_P = K_T + \frac{K_I}{SA(N)} \left[ n_l \, SA(N_l) + n_r \, SA(N_r) \right] \qquad (a)$$

Here, $n_l$ and $n_r$ are the numbers of facets contained in the left and right child nodes adjacent to the current sampling split point; $SA(N_l)$ and $SA(N_r)$ are the surface areas of those left and right child nodes; $SA(N)$ is the surface area of the parent node of the current sampling split point (taken as 0 when there is no parent node); $K_T$ is the cost of traversing the parent node of the current sampling split point (taken as 0 when there is no parent node); and $K_I$ is the cost of performing an intersection operation at the current sampling split point.

$K_T$ and $K_I$ stand for the time the hardware spends on a traversal or an intersection operation. Their absolute values can be set by hand, for example $K_T = 10$ and $K_I = 20$, which states that an intersection operation costs more than a traversal operation; the values 10 and 20 need not represent real processing times, only the relation between the two.

We choose the optimal split point so that the resulting SAH cost $C_P$ at that point is minimal, or so that the cost $K_I n$ is smaller, where $n = n_l + n_r$ is the number of facets contained in the current node.
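Formula (a) and the comparison with the cost K_I·n can be worked through numerically as follows. The sketch assumes a positive parent surface area SA(N) and uses the example constants K_T = 10 and K_I = 20 from the text; the function names are illustrative only.

```python
def box_area(extent):
    """Surface area of an axis-aligned box with edge lengths (dx, dy, dz)."""
    dx, dy, dz = extent
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(n_l, sa_l, n_r, sa_r, sa_parent, k_t=10.0, k_i=20.0):
    """SAH cost C_P of one sampling split point, following formula (a)."""
    if sa_parent <= 0.0:
        raise ValueError("this sketch assumes SA(N) > 0")
    return k_t + k_i / sa_parent * (n_l * sa_l + n_r * sa_r)


def unsplit_cost(n, k_i=20.0):
    """Cost K_I * n of intersecting all n facets of the node directly."""
    return k_i * n


# Worked example: a 2 x 1 x 1 parent box cut into two unit cubes holding 3 and 5 facets.
parent = box_area((2.0, 1.0, 1.0))       # 10.0
child = box_area((1.0, 1.0, 1.0))        # 6.0
c_p = sah_cost(3, child, 5, child, parent)
```

With these numbers C_P = 10 + (20 / 10) · (3 · 6 + 5 · 6) = 106, while K_I·n = 20 · 8 = 160, so the split is the cheaper option in this example.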

To use the hardware's parallel computing capability efficiently, the implementation extends parallelism in two respects.

First, in top-down construction each node is divided by rule into eight new nodes. Because these splitting tasks are independent of one another, they can be handed to multiple cores and processed simultaneously, which speeds up construction, and switching between splitting tasks hides memory access latency. A queue can hold the nodes waiting to be split: when a core finishes splitting one node it immediately takes new work from the queue, and it places the newly produced nodes that still need splitting into the queue.

Because current GPU architectures do not yet support memory coherence, and to avoid the synchronization overhead of locks, we use two queues to record these positions: one temporarily holds parent node information and the other holds child node information. Using the parent-child mapping, the child nodes of parent node k occupy positions 8*k+t (t = 0, 1, ..., 7) in the child node queue; the specific value of t can be computed in the faster shared memory within a thread block. After all child nodes of the current level have been computed, a compaction operation removes the empty nodes and forms the new parent node queue.
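The two-queue scheme with the 8*k+t mapping can be illustrated with a short sequential sketch. The names `expand_level` and `split_by_mod8` are hypothetical, and the toy split rule merely stands in for a real node split; on a GPU each parent would be handled by its own thread block rather than a Python loop.

```python
def expand_level(parents, split_fn):
    """Write the children of parent k into slots 8*k + t, then compact."""
    children = [None] * (8 * len(parents))  # child queue, one slot per octant
    for k, node in enumerate(parents):
        octants = split_fn(node)
        if len(octants) != 8:
            raise ValueError("split_fn must return exactly eight entries")
        for t, child in enumerate(octants):
            children[8 * k + t] = child  # slots never collide, so no lock is needed
    # Compaction: drop the empty slots; the result is the next parent queue.
    return [c for c in children if c is not None]


def split_by_mod8(values):
    """Toy stand-in for a node split: bucket integers by value % 8."""
    buckets = [[] for _ in range(8)]
    for v in values:
        buckets[v % 8].append(v)
    return [b if b else None for b in buckets]
```

Because every (k, t) pair maps to a distinct slot, writers never contend for the same position, which is the property that lets the text avoid locks.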

Second, when a node is split using the SAH strategy, the operation is parallelized with multiple threads. Suppose p sampling split points are taken on each coordinate axis; finding the minimum-cost split points then requires 3p cost evaluations over the three dimensions. Since these evaluations apply the same operation to different data, they can be processed in parallel using the SIMD capability of the processing cores. Finally, once all split points have been evaluated, a reduction operation finds the split point with the lowest cost.
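A reduction over the 3p candidate costs might look like the sketch below. It performs a pairwise (tree-shaped) minimum reduction, written sequentially here, and applies it once per axis so that one split point per axis is returned, which is one reading of step (1). The function names and the flat cost layout (axis-major, p entries per axis) are assumptions.

```python
def argmin_reduce(costs):
    """Pairwise tree reduction returning (min_cost, index)."""
    items = [(c, i) for i, c in enumerate(costs)]
    if not items:
        raise ValueError("no candidates")
    while len(items) > 1:
        half = (len(items) + 1) // 2
        # Each step halves the array, as a parallel reduction would.
        items = [min(items[i], items[i + half]) if i + half < len(items) else items[i]
                 for i in range(half)]
    return items[0]


def best_split_per_axis(costs, p):
    """costs holds 3*p values laid out as [x0..x(p-1), y0.., z0..]."""
    if len(costs) != 3 * p:
        raise ValueError("expected 3*p candidate costs")
    result = []
    for axis in range(3):
        cost, sample = argmin_reduce(costs[axis * p:(axis + 1) * p])
        result.append((sample, cost))
    return result
```

Each reduction step depends only on pairs of elements, so on SIMD hardware all comparisons within a step can run in parallel.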

(2) The three main stages of ray tracing, namely traversal of the acceleration structure, primitive intersection and shading, can be regarded as a sequence of conditional operations: only rays that pass the test of one stage enter the next stage for further computation. We apply this condition sequence to the whole secondary ray packet. At each stage the tests let us remove the rays that fail, and a hardware-supported compaction operation (compact) then moves the passing rays together, quickly producing a ray packet that can enter the next test stage. This keeps the rays within a packet as similar as possible, so that multi-threaded parallel execution remains efficient.

When rendering begins, the proportion of effective rays in the ray packet being used is determined first; when the proportion is higher than the set threshold, step (3) is carried out, otherwise step (5) is carried out;

The effective rays in the ray packet are the rays with the same execution behaviour, where execution behaviour means performing a traversal, intersection or shading operation on the same node data.
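The decision in step (2) can be sketched as a small function. The sketch makes two assumptions that the text does not spell out: each ray's execution behaviour is encoded as an (operation, node) pair, and the effective rays are taken to be those sharing the most common pair. The default threshold of 0.5 follows the 50% example given later in the text.

```python
from collections import Counter


def effective_ratio(behaviours):
    """Fraction of rays that share the most common (operation, node) pair."""
    if not behaviours:
        return 0.0
    most_common_count = Counter(behaviours).most_common(1)[0][1]
    return most_common_count / len(behaviours)


def choose_step(behaviours, threshold=0.5):
    """Return 3 (packet traversal) or 5 (per-ray processing)."""
    return 3 if effective_ratio(behaviours) > threshold else 5
```

For instance, a packet in which three of four rays traverse the same node gives a ratio of 0.75 and takes the packet path, while a packet whose rays all diverge falls back to per-ray processing.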

(3) Based on the octree built in step (1), the ray packet is traversed along the three coordinate axes of the model space;

a) Traversing the ray packet along the X axis of the model space:

All child nodes lying on the same side of the X-axis splitting plane are defined as one X-axis child node, which gives two X-axis child nodes: for example, all child nodes on one side of the X-axis splitting plane are defined as X-axis child node A, and all child nodes on the other side as X-axis child node B. The X-axis splitting plane is the plane perpendicular to the X axis of the model space, i.e. the plane that split the X axis when the octree was built.

The ray packet is tested for intersection against the bounding boxes of the two X-axis child nodes in turn, in the order given by the direction of ray travel. If rays of the packet intersect the bounding box of the first X-axis child node, the intersecting rays form a ray sub-packet; otherwise the packet is tested against the bounding box of the second X-axis child node, and the intersecting rays form a ray sub-packet.

Because the ray packet has rays intersecting the bounding box of at least one of the X-axis child nodes, a ray sub-packet is always obtained; if rays intersect the bounding box of the first X-axis child node, no intersection test is performed against the bounding box of the second.

b) Traversing the ray packet along the Y axis of the model space:

Within the X-axis child node found in step a) to have rays intersecting the ray packet, all child nodes lying on the same side of the Y-axis splitting plane are defined as one Y-axis child node, which gives two Y-axis child nodes: for example, all child nodes on one side of the Y-axis splitting plane are defined as Y-axis child node A, and all child nodes on the other side as Y-axis child node B. The Y-axis splitting plane is the plane perpendicular to the Y axis of the model space, i.e. the plane that split the Y axis when the octree was built.

The ray sub-packet obtained in step a) is tested for intersection against the bounding boxes of the two Y-axis child nodes in turn, in the order given by the direction of ray travel. If rays of the sub-packet intersect the bounding box of the first Y-axis child node, the intersecting rays form a ray sub-packet; otherwise the sub-packet is tested against the bounding box of the second Y-axis child node, and the intersecting rays form a ray sub-packet.

Because the ray packet has rays intersecting the bounding box of at least one of the Y-axis child nodes, a ray sub-packet is always obtained; if rays intersect the bounding box of the first Y-axis child node, no intersection test is performed against the bounding box of the second.

c) Traversing the ray packet along the Z axis of the model space:

Within the Y-axis child node found in step b) to have rays intersecting the ray packet, all child nodes lying on the same side of the Z-axis splitting plane are defined as one Z-axis child node, which gives two Z-axis child nodes: for example, all child nodes on one side of the Z-axis splitting plane are defined as Z-axis child node A, and all child nodes on the other side as Z-axis child node B. The Z-axis splitting plane is the plane perpendicular to the Z axis of the model space, i.e. the plane that split the Z axis when the octree was built.

The ray sub-packet obtained in step b) is tested for intersection against the bounding boxes of the two Z-axis child nodes in turn, in the order given by the direction of ray travel. If rays of the sub-packet intersect the bounding box of the first Z-axis child node, the intersecting rays form a ray sub-packet; otherwise the sub-packet is tested against the bounding box of the second Z-axis child node, and the intersecting rays form a ray sub-packet. Because the ray packet has rays intersecting the bounding box of at least one of the Z-axis child nodes, a ray sub-packet is always obtained; if rays intersect the bounding box of the first Z-axis child node, no intersection test is performed against the bounding box of the second.

At this point, among the eight child nodes obtained by the first division of the model space in step (1), one child node has been identified whose bounding box intersects rays of the ray packet or ray sub-packet in each of steps a), b) and c). Steps a), b) and c) are repeated on this child node until the ray sub-packet reaches a leaf node. If, before the ray sub-packet reaches a leaf node, the proportion of effective rays in the sub-packet becomes less than or equal to the threshold, step (5) is carried out;
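One level of the axis-by-axis descent in steps a) to c) can be sketched as follows. The sketch uses a standard slab test for the ray/bounding-box check, which the text does not prescribe, and it lets the first ray's direction decide which half is visited first, which is an assumption about how "direction of ray travel" applies to a whole packet. Rays that only reach the second half are not followed here, mirroring the early exit described above.

```python
def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            if not lo[a] <= origin[a] <= hi[a]:
                return False
        else:
            t0 = (lo[a] - origin[a]) / direction[a]
            t1 = (hi[a] - origin[a]) / direction[a]
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:
                return False
    return True


def descend_one_level(packet, lo, hi, split):
    """packet: list of (origin, direction). Returns (child_lo, child_hi, sub_packet)."""
    lo, hi = list(lo), list(hi)
    for a in range(3):  # a = 0, 1, 2 for the X, Y and Z axes
        low_half = (lo[:], hi[:a] + [split[a]] + hi[a + 1:])
        high_half = (lo[:a] + [split[a]] + lo[a + 1:], hi[:])
        # Assumption: the first ray's direction sets the visiting order.
        halves = (low_half, high_half) if packet[0][1][a] >= 0 else (high_half, low_half)
        for c_lo, c_hi in halves:
            sub = [ray for ray in packet if hits_box(ray[0], ray[1], c_lo, c_hi)]
            if sub:  # first half with hits wins; the second is not tested
                packet, lo, hi = sub, c_lo, c_hi
                break
        else:
            return None  # no ray hits either half
    return tuple(lo), tuple(hi), packet
```

After the three passes the remaining box is exactly one of the eight octants, and the returned sub-packet holds the rays that intersected the chosen half on every axis.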

(4) The ray sub-packet that has reached a leaf node is tested for intersection against the facets in that leaf node one after another (in the order in which the facets are stored in memory), yielding the rays that intersect the facets;

Boolean values can be used to record which rays have passed the intersection test, i.e. intersect a facet, and the intersection information is recorded at the same time.

(5) Each ray in the ray packet is processed separately as follows:

In breadth-first order, from the lowest node level upward (starting at the root node), the ray is tested for intersection against the bounding boxes of N nodes at a time. When a leaf node is reached, the ray is tested for intersection against N triangular facets of the leaf node at a time until all triangular facets in the leaf node have been tested, yielding the triangular facet that intersects the ray and is nearest to the ray origin.

N is the SIMD width, i.e. the number of parallel computing units supported by the computer hardware, which is the maximum width of data-parallel processing.
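The per-ray traversal of step (5) can be sketched as below, with batches of N nodes standing in for one SIMD-wide step. Python executes the batch sequentially, so this only illustrates the data layout and control flow; the node dictionaries with `lo`, `hi` and `children` keys, the slab test, and the function names are assumptions made for the example.

```python
def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            if not lo[a] <= origin[a] <= hi[a]:
                return False
        else:
            t0 = (lo[a] - origin[a]) / direction[a]
            t1 = (hi[a] - origin[a]) / direction[a]
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:
                return False
    return True


def traverse_single_ray(origin, direction, root, n):
    """Level-order traversal testing n node boxes per step; returns hit leaves."""
    level, leaves = [root], []
    while level:
        next_level = []
        for start in range(0, len(level), n):
            group = level[start:start + n]  # one SIMD-wide batch of nodes
            flags = [hits_box(origin, direction, nd["lo"], nd["hi"]) for nd in group]
            for nd, hit in zip(group, flags):
                if not hit:
                    continue
                if nd["children"]:
                    next_level.extend(nd["children"])
                else:
                    leaves.append(nd)
        level = next_level
    return leaves
```

The same batching pattern would apply inside a leaf, with N triangular facets per step in place of N node bounding boxes.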

(6) According to the material type of the model to be rendered in the model space:

The rays that passed the intersection test in step (4) are assigned the corresponding shading code according to the material type of the model and are rendered;

The rays in the ray sub-packets that passed the intersection test in step (5) are classified according to the material type of the model, assigned the corresponding shading code, and rendered in turn.

That is, the parts of the model with the same material are rendered together, and the parts with other materials are rendered afterwards; no particular rendering order is required among the material types.
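Grouping rays by material before shading can be sketched as follows. The `hits` list of (ray ID, material) pairs, the `shaders` table and the function name are hypothetical; the point is only that each shader runs once over all rays of its material, in no particular material order.

```python
def shade_by_material(hits, shaders):
    """hits: list of (ray_id, material). Returns ({ray_id: colour}, groups)."""
    groups = {}
    for ray_id, material in hits:
        groups.setdefault(material, []).append(ray_id)  # classify rays by material
    colours = {}
    for material, ray_ids in groups.items():
        shader = shaders[material]  # one shading routine per material type
        for ray_id in ray_ids:  # all rays in this group run the same code
            colours[ray_id] = shader(ray_id)
    return colours, groups
```

A call such as `shade_by_material([(0, "glass"), (1, "metal"), (2, "glass")], shaders)` places rays 0 and 2 in one group so that they execute the same shading code together.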

Although the ray packet technique greatly reduces computation and bandwidth usage, it presupposes efficient SIMD operation. If a packet contains only one effective ray, i.e. only one ray passes the test, running the packet technique takes longer than tracing a single ray without it. Step (2) therefore divides the ray tracing process into two phases according to the proportion of effective rays in the whole ray packet: when the proportion is higher than a certain value (for example 50%), the ray packet technique is used; when it is lower than that value, another efficient processing mode, parallel over facet data, is used.

The method of the invention is adaptive. It requires no expensive pre-sorting, no strategy for assessing the similarity of rays within a packet, and no particular order of the initial rays; ray packets are generated automatically during ray tracing according to whether the rays at each stage perform the same operation on the acceleration structure and the scene geometry. We make the number of rays in the initial packet much larger than the SIMD width of the hardware, so that as far as possible every operation finds enough rays to fill the SIMD width and SIMD operation stays efficient.

Description of drawings

Fig. 1 compares the effective GPU utilization of the method of the invention and of the SIMD ray packet method as the octree traversal depth increases.

Embodiment

The algorithm is implemented on a PC equipped with a quad-core Intel Xeon 3.7 GHz CPU and an Nvidia GTX 285 (1 GB of video memory). Nvidia's CUDA programming framework is used; it provides a general-purpose C programming interface for GPU computing and makes new hardware features convenient for programmers to use.

To produce enough data early in acceleration structure construction for multiple processing cores to work on efficiently in parallel, the invention uses an improved octree construction method. Working breadth-first, it computes the SAH cost of the sampling split points on each of the three coordinate axes, selects the lowest-cost split point on each axis, and uses the three selected split points as splitting planes, generating eight nodes at once. Computing the SAH cost requires the number of primitives on each side of a splitting plane, and the invention uses the binning method to reduce bandwidth usage. As the construction level deepens, the data each processing core has to handle decreases significantly, so computing the SAH takes less time and construction becomes faster.

To use the hardware's parallel computing capability efficiently, the implementation extends parallelism in two respects:

1) In top-down construction, each node is divided by rule into eight new nodes. Because these splitting tasks are independent of one another, they can be handed to multiple cores and processed simultaneously, which speeds up construction, and switching between splitting tasks hides memory access latency. A queue holds the nodes waiting to be split: when a core finishes splitting one node it immediately takes new work from the queue, and it places the newly produced nodes that still need splitting into the queue;

2) When a node is split using the SAH strategy, the operation is parallelized in SIMD fashion. Suppose k candidate split points are generated on each coordinate axis; finding the optimal split points then requires 3k cost evaluations in total. Since these evaluations apply the same operation to different data, they can be processed in parallel using the SIMD capability of the processing cores. Finally, once all split points have been evaluated, a hardware-supported reduction operation finds the split point with the lowest cost.

The three main stages of ray tracing, namely traversal of the acceleration structure, primitive intersection and shading, can be regarded as a sequence of conditional operations: only rays that pass the test of one stage enter the next stage for further computation. We apply this condition sequence to the whole secondary ray packet. At each stage the tests let us remove the rays that fail, and a hardware-supported compaction operation (compact) then moves the passing rays together, quickly producing a ray packet that can enter the next test stage. This keeps the rays within a packet as similar as possible, so that multi-threaded parallel execution remains efficient.

Although the ray packet technique greatly reduces computation and bandwidth usage, it presupposes efficient SIMD operation. If a packet contains only one effective ray, i.e. only one ray passes the test, running the packet technique takes longer than tracing a single ray without it. We therefore divide the ray tracing process into two phases according to the proportion of effective rays in the whole ray packet: when the proportion is higher than a certain value, the ray packet technique is used; when it is lower than that value, the invention uses another efficient data-parallel processing mode.

1) Ray tracing based on ray packets: to save storage space, only the IDs of the rays are recorded; when ray data is needed, the hardware-supported random read operation (gather) is used to access it.

1. Traversal: to keep the hardware computing efficiently, the number of rays in the initial packet is made much larger than the SIMD width supported by the hardware. The ray packet is tested against the bounding boxes of the acceleration structure nodes; all rays that pass the test form a new ray sub-packet through the compact operation, which then continues with traversal or intersection.

2. Intersection: the Boolean values produced by testing the ray packet against the primitives show which rays have passed the intersection test, and the intersection information is written back to the buffer that stores ray information using the hardware-supported random write operation (scatter).

3. Shading: the material type is used as the test condition, and the rays in the resulting ray sub-packet execute the same shading code (shader). Rays that fail this test are tested against the other material types in turn until all rays have executed shading code.
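The compact and gather operations referred to in this list can be sketched in sequential Python as follows. On a GPU the prefix sum and the scatter would run in parallel; here the loops only show what each operation computes. The function names are illustrative.

```python
def compact(ray_ids, passed):
    """Keep the IDs whose flag is True, preserving order (stream compaction)."""
    offsets, count = [], 0
    for flag in passed:  # exclusive prefix sum over the Boolean flags
        offsets.append(count)
        count += 1 if flag else 0
    out = [None] * count
    for i, flag in enumerate(passed):
        if flag:
            out[offsets[i]] = ray_ids[i]  # scatter each survivor to its packed slot
    return out


def gather(ray_buffer, ray_ids):
    """Random read: fetch the full ray records named by a packet of IDs."""
    return [ray_buffer[i] for i in ray_ids]
```

Since a packet stores only ray IDs, compaction moves small integers around, and the full ray records are fetched with `gather` only when a stage actually needs them.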

2) node/tri patch data SIMD parallel computation: different with the ray tracing technology based on the light bag, we only follow the trail of single ray, and this light and N (establishing N is the SIMD width) individual different nodes bounding box or tri patch are intersected.Although this method is than light bag method poor efficiency, we think that it can obtain good performance for the extremely low secondary light of similarity.

1. travel through: the light in the light bag is carried out traversing operation respectively; Adopt level traversal mode to postpone to reduce memory access; Every each while of light is done test with the bounding box of N node; All nodes up to this layer have all been accomplished test, and the Boolean that returns has shown that whether test is successful, is provided with this layer of list records all node through test simultaneously.

2. Intersection: when a leaf node is reached, the ray is tested against N triangular facets at a time, until all triangular facets in the node have been tested. The relevant information of the facets that pass the test is recorded, so that the triangular facet nearest to the ray origin can be determined.
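The single-ray scheme above can be sketched as follows, under stated assumptions. The `Node` tuple, `SIMD_WIDTH = 4`, and all function names are hypothetical. The ray/triangle test used here is the standard Möller–Trumbore test, which the source does not name. Batches of N items are processed in a Python loop, whereas on real hardware each batch would map to one SIMD operation.

```python
from collections import namedtuple

SIMD_WIDTH = 4  # assumed value of N
Node = namedtuple("Node", "box_min box_max children triangles")

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def hit_box(o, d, box_min, box_max):
    """Slab test of one ray against one axis-aligned bounding box."""
    t_near, t_far = 0.0, float("inf")
    for oc, dc, lo, hi in zip(o, d, box_min, box_max):
        if dc == 0.0:
            if oc < lo or oc > hi:
                return False
            continue
        t0, t1 = (lo - oc) / dc, (hi - oc) / dc
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def hit_triangle(o, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test; returns the hit distance t, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(o, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def batches(items, n):
    for k in range(0, len(items), n):
        yield items[k:k + n]

def trace_single_ray(o, d, root):
    """Level-order traversal of one ray, N boxes or facets per batch."""
    nearest_t, nearest_tri = float("inf"), None
    level = [root]
    while level:
        passed = []  # list of nodes of this level that pass the box test
        for group in batches(level, SIMD_WIDTH):
            flags = [hit_box(o, d, n.box_min, n.box_max) for n in group]
            passed.extend(n for n, ok in zip(group, flags) if ok)
        next_level = []
        for node in passed:
            if node.children:
                next_level.extend(node.children)
                continue
            for group in batches(node.triangles, SIMD_WIDTH):
                for tri in group:
                    t = hit_triangle(o, d, *tri)
                    if t is not None and t < nearest_t:
                        nearest_t, nearest_tri = t, tri
        level = next_level
    return nearest_t, nearest_tri
```

The list `passed` plays the role of the per-level list of nodes that passed the test, and the running minimum over `t` selects the facet nearest to the ray origin.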

Test scenes with different geometric complexity, namely Bunny, Toys and Conference, were selected as the test model files, and each test scene was rendered at a resolution of 1024*1024. In the experiment, to bring out fully the dissimilarity between the rays within a ray packet, we forced every ray to reflect 5 times. Because the Bunny and Toys scenes are not closed, some rays may leave the scene before they have reflected 5 times, which in a sense also becomes a kind of ray similarity, as shown in Table 1.

Table 1

GPU hardware utilization of the method of the invention and of the SIMD ray packet method under different numbers of ray reflections.

The GPU utilization may rise rather than fall over the last few reflections, which illustrates exactly this point. The comparison in the figure also shows that the method of the invention yields a relatively small performance gain on simple models. The reason is that the facets of these simple models are all fairly large, so the similarity of the rays within a ray packet is high and data regrouping is less necessary.

Table 2 illustrates the influence of the ray packet size on the method of the invention.

For the Toys scene, Table 2 lists the frame rates obtained with our adaptive ray packet generation method when the number of threads in a CUDA thread block is set to 64, 256 and 512 and the ray packet size is 16*16, 32*32 and 64*64. As Table 2 shows, the method of the invention achieves its best performance when the thread count per block is set to 256 and the ray packet size is 64*64. Resources such as registers and shared memory are limited under the CUDA architecture, so if the thread count per block is set too large, the threads cannot obtain enough resources and cannot be launched. On the other hand, if the number of rays in a ray packet is set too small, there may not be enough similar rays to generate a new ray packet, which lowers the effective GPU utilization. If the number of rays in a ray packet is set too large, organizing and generating ray packets of similar rays causes too many GPU operations such as compaction, scatter and gather. These operations are time-consuming and therefore reduce the execution performance of the algorithm.

To verify further how well the method of the invention uses the hardware in parallel, we recorded the utilization of the scalar processors during traversal and intersection at each level of the octree. This utilization directly reflects whether our adaptive ray packet generation method and adaptive ray tracing process exploit the parallel execution capability of the hardware to the greatest extent. Note that we do not use ALU usage as our test criterion, because even when a thread slot is occupied, the ALU may not be fully used owing to memory access latency or SIMD inefficiency. In the early stage of octree traversal, the advantage of the method of the invention is not yet obvious. As shown in Fig. 1, compared with the SIMD ray packet method, the method of the invention uses the adaptive ray packet generation method to improve the similarity of the rays in a ray packet, and thus maintains a higher GPU utilization throughout the octree traversal. As the traversal depth increases, a ray packet may no longer contain enough similar rays to form a new ray packet. At that point the adaptive ray tracing process we designed switches to another efficient data-parallel processing mode: it traces only a single ray and intersects that ray with N (where N is the SIMD width) different node bounding boxes or triangular facets. As Fig. 1 shows, this allows our method to keep the effective utilization of the GPU computing units high in the later stage of octree traversal as well.
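The switch between the two processing modes described above can be sketched as a simple ratio test. This is an illustration only: the threshold value 0.5 is an assumption (this section does not state one), and "effective" is taken here to mean rays still flagged as active in the packet, which is also an assumption. The function names are hypothetical.

```python
def effective_ratio(active_flags):
    """Fraction of rays in the packet that are still effective."""
    if not active_flags:
        return 0.0
    return sum(1 for f in active_flags if f) / len(active_flags)

def choose_mode(active_flags, threshold=0.5):
    """Packet traversal while the ratio is above the threshold,
    otherwise per-ray tracing against N boxes or facets at a time."""
    if effective_ratio(active_flags) > threshold:
        return "packet"
    return "single_ray"

mode_early = choose_mode([True, True, True, False])   # ratio 0.75
mode_late = choose_mode([True, False, False, False])  # ratio 0.25
```

A ratio exactly equal to the threshold selects the single-ray mode, matching the "less than or equal to" wording of the claims.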

Claims

1. A real-time secondary ray tracing concurrent rendering method, characterized by comprising:

(1) dividing the scene data of the model space to be rendered by building an octree: on the three coordinate axes of the model space to be rendered, namely the mutually perpendicular X axis, Y axis and Z axis, selecting several sampling split points respectively, computing the SAH cost of each sampling split point, and then selecting the split point with the minimum SAH cost on each coordinate axis; using the three selected split points as split planes to divide the scene data in the model space; taking the bounding box formed by the whole scene as the root node, the first division yielding eight child nodes; then dividing the child nodes by the same method, recursing this construction process in breadth-first order and dividing level by level until each node contains no more than 10 facets;

(2) determining the proportion of effective rays in the ray packet used for rendering; when the proportion is higher than a threshold, performing step (3), otherwise performing step (5);

(3) traversing the ray packet through the octree built in step (1) along the three coordinate axes of the model space;

a) traversing the ray packet along the X axis of the model space:

defining all child nodes located on the same side of the X-axis split plane as one X-axis child node, thereby obtaining two X-axis child nodes located on the two sides of the X-axis split plane; testing the ray packet for intersection against the bounding boxes of the two X-axis child nodes in turn, in the order given by the ray travel direction; when the ray packet has rays that intersect the bounding box of the first X-axis child node, generating a ray subpacket from the intersecting rays; otherwise continuing to test the ray packet against the bounding box of the second X-axis child node, and generating a ray subpacket from the intersecting rays;

b) traversing the ray packet along the Y axis of the model space:

in the X-axis child node of step a) that has rays intersecting the ray packet, defining all child nodes located on the same side of the Y-axis split plane as one Y-axis child node, thereby obtaining two Y-axis child nodes located on the two sides of the Y-axis split plane; testing the ray subpacket obtained in step a) for intersection against the bounding boxes of the two Y-axis child nodes in turn, in the order given by the ray travel direction; when the ray subpacket has rays that intersect the bounding box of the first Y-axis child node, generating a ray subpacket from the intersecting rays; otherwise continuing to test the ray subpacket against the bounding box of the second Y-axis child node, and generating a ray subpacket from the intersecting rays;

c) traversing the ray packet along the Z axis of the model space:

in the Y-axis child node of step b) that has rays intersecting the ray packet, defining all child nodes located on the same side of the Z-axis split plane as one Z-axis child node, thereby obtaining two Z-axis child nodes located on the two sides of the Z-axis split plane; testing the ray subpacket obtained in step b) for intersection against the bounding boxes of the two Z-axis child nodes in turn, in the order given by the ray travel direction; when the ray subpacket has rays that intersect the bounding box of the first Z-axis child node, generating a ray subpacket from the intersecting rays; otherwise continuing to test the ray subpacket against the bounding box of the second Z-axis child node, and generating a ray subpacket from the intersecting rays;

among the eight child nodes obtained by the first division of the model space in step (1), determining the one child node that has rays intersecting the ray packet or ray subpacket in all of step a), step b) and step c); repeating step a), step b) and step c) on this child node until the ray subpacket reaches a leaf node; if, before the ray subpacket reaches a leaf node, the proportion of effective rays in the ray subpacket is less than or equal to the threshold, performing step (5);

(4) testing the ray subpacket that has reached a leaf node for intersection against the facets in that leaf node in turn, to obtain the rays that intersect the facets;

(5) processing each ray in the ray packet separately as follows:

in breadth-first order, from low node level to high, testing the ray for intersection against the bounding boxes of several nodes at a time; when a leaf node is reached, testing the ray for intersection against several triangular facets in the leaf node at a time, until all triangular facets in the leaf node have completed the intersection test, to obtain the triangular facet that is nearest to the ray origin and intersects the ray;

(6) according to the material type of the model to be rendered in the model space, assigning the corresponding shading code, by material type of the model, to the rays that passed the intersection test in step (4), and rendering; classifying, by material type of the model, the rays in the ray subpacket that passed the intersection test in step (5), assigning the corresponding shading code to each class, and rendering them in turn.

2. The concurrent rendering method as claimed in claim 1, characterized in that, in step (1), the SAH cost of the sampling split points is computed on each of the mutually perpendicular X axis, Y axis and Z axis of the model space to be rendered, and the split point with the minimum cost on each coordinate axis is then selected; the three selected split points are used as split planes to divide the scene data in the model space.

3. The concurrent rendering method as claimed in claim 2, characterized in that, in step (3), the X-axis split plane is the plane that splits the X axis when the octree is built; the Y-axis split plane is the plane that splits the Y axis when the octree is built; and the Z-axis split plane is the plane that splits the Z axis when the octree is built.