CN109543358A - Ray tracing acceleration system based on a KD tree on a GPU, and KD tree output method

Info

Publication number
CN109543358A
Authority
CN
China
Prior art keywords
array
module
node
tree
submodule
Prior art date
Legal status
Granted
Application number
CN201910025229.3A
Other languages
Chinese (zh)
Other versions
CN109543358B (en)
Inventor
吴宪云
王康
李云松
赵罡
苏丽雪
孙力
司鹏辉
郑为
申珅
雷杰
王柯俨
吕维
孙乃葳
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910025229.3A
Publication of CN109543358A
Application granted
Publication of CN109543358B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a ray tracing acceleration system based on a KD tree on a GPU, and a KD tree output method, which solve the problem of fast prediction of the sound field intensity of complex underwater targets. In signal-flow order, the system comprises a video memory pre-allocation module, a data preprocessing module, a KD-tree acceleration-search generation module, a virtual aperture plane generation module, a ray tracing module, an integration module, and an interface display module. The video memory pre-allocation module is newly added, and the node splitting process inside the KD-tree acceleration-search generation module is accelerated. The KD tree output method comprises the steps of: inputting data; judging whether to terminate splitting; computing the split plane; assigning and scanning the auxiliary arrays; splitting the current node; and forming the output array to complete tree construction. The invention can compute the sound field intensity of any triangular facet model. When the invention is used to simulate the propagation of sound rays, no occluded-facet judgment is needed; computation is fast, precision is high, and the approach adapts well to different targets. It is used for fast prediction and simulation of underwater target strength.

Description

Ray tracing acceleration system based on a KD tree on a GPU, and KD tree output method
Technical field
The invention belongs to the technical field of acoustics and relates to simulation techniques for fast computation of sound field intensity in marine acoustics, specifically to a ray tracing acceleration system based on a KD tree on a GPU and a KD tree output method, used for fast prediction of underwater targets.
Background art
At present, accurately calculating the sound field characteristics of underwater targets in various attitudes is of great significance for target detection by underwater weapons such as torpedoes and for national defense construction. With the development of torpedo and sonar technology, higher precision is required in calculating the sound field intensity characteristics of targets. The main methods currently used to analyze underwater target echo characteristics are the planar element method based on the Kirchhoff approximation, and the improved planar element method that introduces the Gordon integration method. However, when facets on the target surface occlude one another, both methods must perform complicated occlusion judgments, which leads to a large amount of computation and low efficiency.
In the non-patent document "Planar element method for predicting the echo characteristics of sonar targets", published in 2012 in a publication on fluid and structural acoustics, Fan Jun, Tang Weilin, et al. proposed introducing the planar element method used for computing radar cross sections into the computation of sonar target strength. When the Kirchhoff approximation is applied to compute the sound field scattered by a target in water, the curved surface of a complex-shaped target is approximated by a set of planar elements, and the sum of the scattered fields of all elements approximates the total scattered field. By converting the integral over a single element into an algebraic sum, surface integration is avoided, so the planar element method is many times faster than the ordinary facet integration method. For the sonar case, the method was extended to near-field targets and to echo prediction for non-rigid surfaces. However, with this method the denominator of the integral can be zero in the division involving the planar element spatial coordinates, so the target strength calculated from the facets may contain singular values, which makes the calculation unstable.
In the non-patent document "Simulation of submarine target strength prediction based on the improved planar element method", published in Torpedo Technology in 2016, Sun Naiwei, Li Jianchen, et al. proposed a method that avoids singular values when calculating target strength. To address the problem that the integral denominator may be zero when the planar element method calculates submarine target strength, which makes the result unstable, the method applies the Gordon integration algorithm to the simulation of submarine target strength prediction and simplifies the facet occlusion judgment process for complex targets. The results of this method therefore contain no singular values and are more stable. However, because of how occluded facets are judged, the results are inaccurate for targets with many occlusions.
In the prior art, the precision of prediction calculations for underwater targets is not high enough, and the calculation time cannot meet real-time requirements.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a ray tracing acceleration system based on a KD tree on a GPU, and a KD tree output method, with high computational precision and fast computation.
The invention is first a GPU-based ray tracing acceleration system using a KD tree. In signal-flow order it comprises, connected in sequence, a data preprocessing module, an acceleration-search generation module, a virtual aperture plane generation module, a ray tracing module, an integration module, and an interface display module. It is characterized in that a video memory pre-allocation module is placed at the front of the system, and the acceleration-search generation module is a KD-tree acceleration-search generation module. The modules are interconnected to realize GPU-based ray tracing accelerated by KD-tree search. Each module is described below:
Video memory pre-allocation module: the input of this module is the number of triangular facets in the triangular facet model. Based on this number, the module calculates the amount of video memory occupied by the bounding-box model array of the triangular facets, and requests the video memory on the GPU once, in advance, from the CPU side. The requested space is at least large enough to build the KD tree without conflicts. The output of the module is the start address of the pre-allocated video memory;
Data preprocessing module: on the GPU, using the video memory requested by the video memory pre-allocation module, this module builds a bounding box in parallel for each triangular facet of the input triangular facet model, initializes the current incident angle, and outputs the bounding-box model array;
KD-tree acceleration-search generation module: this module builds the KD tree in parallel on the GPU; the KD tree accelerates the ray tracing process. It uses the video memory requested by the pre-allocation module dynamically, by address access, which speeds up KD tree generation. The bounding-box model array output by the data preprocessing module is its input. The module has two synchronous outputs: one is the KD tree corresponding to the bounding-box model array, and the other is the reorganized bounding-box model array. Both outputs feed directly into the ray tracing module;
Virtual aperture plane generation module: the inputs of this module are the bounding-box model array output by the data preprocessing module and the current incident angle. Inside the module, the boundary values of the bounding-box model are calculated and ray tubes are generated in parallel on the GPU. The output is a number of ray tubes of identical size;
Ray tracing module: the inputs of this module are the ray tubes output by the virtual aperture plane generation module and the two synchronous outputs of the acceleration-search generation module; the inputs are synchronized. The module traces the ray tubes in parallel on the GPU, and tracing is carried out by traversing the KD tree. The output is the reflection information of the ray tubes after traversing the KD tree;
Gordon integration module: this module receives the reflection information output by the ray tracing module, integrates over the ray tubes in parallel on the GPU using the Gordon integral formula, and sums the integration results of all ray tubes in parallel with a reduction operation. The output is the algebraic sum of the integral values of all ray tubes at the current angle;
Interface display module: displays the integral values output by the Gordon integration module, together with related information, on the interface for inspection and debugging.
The invention is also a method for unified output of the bounding-box arrays inside KD-tree leaf nodes, characterized in that it comprises the following steps:
1) Input data and initialize: input the bounding-box model array of the triangular facets and two auxiliary arrays. Each auxiliary array has the same length as the bounding-box array of each KD-tree level and is used to help partition the bounding-box array of the current node. Initialize the root node as the current node;
2) Termination judgment: judge whether the current node satisfies the termination condition, namely whether the number of bounding boxes stored at the current node's address is less than the termination threshold. If so, go to step 7); otherwise go to step 3);
3) Compute the best split plane position: use the median-split method or the SAH method to compute the best split plane position and the split axis;
4) Assign the auxiliary arrays: the two auxiliary arrays assist the partition. According to the best split plane position and the split axis, for a bounding box on the left of the best split plane, set the corresponding position in the first auxiliary array to 1; for a bounding box on the right, set the corresponding position in the second auxiliary array to 1; for a bounding box cut by the best split plane, set the corresponding position in both auxiliary arrays to 1;
5) Scan the auxiliary arrays: compute the start storage addresses to which the current node's array is copied in the left and right child nodes, and obtain the position index of each array element within the corresponding child node's address range;
6) Split the current node: split the current node into left and right child nodes. According to the position indices, copy the current node's bounding-box array to the corresponding positions in the storage addresses of the left and right child nodes, update the related information of the child nodes, and return to step 2) with each child node as the current node, re-entering the KD tree construction loop;
7) Form the output array: mark the current node as a leaf node, append the array stored at the leaf node's address to the tail of the output array, and record in the leaf node the address of that array within the output array, for use by the ray tracing module;
8) Finish splitting: after all nodes have terminated splitting, a fully built, high-quality KD tree and the reorganized bounding-box model array are obtained.
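The control flow of steps 1) to 8) can be sketched sequentially in Python. This is only an illustration under stated assumptions, not the patented GPU implementation: the function names, the `max_depth` guard, the longest-axis midpoint split standing in for step 3), and the plain list filters standing in for the flag-and-scan partition of steps 4) to 6) are all hypothetical. The part it is meant to show is step 7): each leaf stores only an offset and a count into one shared output array.

```python
def bounds_of(boxes):
    """Axis-aligned bounds enclosing a non-empty list of (lo, hi) boxes."""
    lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
    hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
    return lo, hi


def build_kd_tree(boxes, leaf_threshold=2, max_depth=16):
    """Return (nodes, output); leaves reference slices of the output array."""
    nodes = []    # flat node list, children referenced by index
    output = []   # unified output array holding the contents of all leaves
    work = [(list(boxes), 0, None, None)]  # (boxes, depth, parent index, side)
    while work:
        node_boxes, depth, parent, side = work.pop()
        index = len(nodes)
        if parent is not None:
            nodes[parent][side] = index
        # Step 2: terminate when the node holds fewer boxes than the threshold.
        # The depth guard is an added assumption to bound degenerate inputs.
        if (not node_boxes or len(node_boxes) < leaf_threshold
                or depth >= max_depth):
            # Step 7: append the leaf contents and record where they landed.
            nodes.append({"leaf": True, "offset": len(output),
                          "count": len(node_boxes)})
            output.extend(node_boxes)
            continue
        # Step 3 (stand-in): midpoint of the longest axis of the node bounds.
        lo, hi = bounds_of(node_boxes)
        axis = max(range(3), key=lambda k: hi[k] - lo[k])
        pos = 0.5 * (lo[axis] + hi[axis])
        # Steps 4-6 (stand-in): boxes cut by the plane go to both children.
        left = [b for b in node_boxes
                if b[0][axis] < pos or b[1][axis] <= pos]
        right = [b for b in node_boxes if b[1][axis] > pos]
        nodes.append({"leaf": False, "axis": axis, "pos": pos,
                      "left": None, "right": None})
        work.append((right, depth + 1, index, "right"))
        work.append((left, depth + 1, index, "left"))
    return nodes, output
```

A leaf's boxes are then read back as `output[leaf["offset"]:leaf["offset"] + leaf["count"]]`, which is the contiguous layout that the description credits with enabling merged accesses during ray tracing.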
The invention solves the following problem of the prior art: for concave targets with many surface occlusions, complicated occlusion relations must be computed for the occluded facets, which greatly increases the amount of computation and lowers efficiency. It thereby provides a technical means for fast and accurate calculation of the sound field intensity of complex underwater targets.
Compared with the prior art, the invention has the following technical advantages:
High computational precision: the traditional planar element and improved planar element methods discard occluded facets, which reduces precision. For concave targets with many surface occlusions, the invention instead uses ray tracing to capture the sound waves reflected by occluded facets, so the system of the invention achieves higher computational precision;
Fast KD tree construction: the prior art accelerates only the SAH computation of KD tree construction on the GPU and does not achieve full GPU acceleration. The invention also accelerates on the GPU the process of splitting a parent node into child nodes, reducing the KD tree construction time on the GPU. In addition, before the KD tree is built, the invention requests once enough video memory to build the tree without conflicts, and address access replaces the more time-consuming dynamic allocation during construction. The tree construction method of the invention therefore speeds up KD tree construction;
Small video memory footprint of the KD tree structure: KD tree nodes do not store the bounding-box arrays themselves, only the start address in video memory of the bounding-box array contained in the node, which is accessed by address when needed. The KD tree built by the invention therefore occupies less video memory.
Description of the drawings
Fig. 1 is a structural diagram of the GPU-based KD-tree ray tracing acceleration system of the invention;
Fig. 2 is a flowchart of the method for unified output of the bounding-box arrays inside KD-tree leaf nodes;
Fig. 3 is a diagram of the internal structure of the KD-tree acceleration-search generation module;
Fig. 4 is a diagram showing how the bounding-box array of the current node is partitioned into the left and right child nodes and how splitting terminates;
Fig. 5 is a diagram of the triangular facet model used in Embodiment 7;
Fig. 6 is the sound field intensity curve obtained by simulating the model of Fig. 5 with the invention;
Fig. 7 shows curves of the computation time of the ray tracing module and the number of ray tubes generated when simulating the model of Fig. 5 with this system.
Detailed description of embodiments
The invention is described in detail below with reference to the accompanying drawings.
Embodiment 1:
Ray tracing is widely used in fields such as image rendering and electromagnetic field intensity calculation, and it can also be applied to the fast and accurate prediction of the sound field intensity of complex underwater targets. Traditional underwater target sound field prediction and simulation systems all use the planar element or improved planar element method. For targets with many surface occlusions, these traditional techniques require a large number of facet occlusion judgments and discard occluded facets, which causes some loss of computational precision. With the continuing development of GPU technology, faster algorithms have become feasible on the GPU. Against this background, the invention proposes a GPU-based ray tracing acceleration system using a KD tree. In signal-flow order it comprises, connected in sequence, a data preprocessing module, an acceleration-search generation module, a virtual aperture plane generation module, a ray tracing module, an integration module, and an interface display module; see Fig. 1. The invention places a video memory pre-allocation module before the data preprocessing module, and the acceleration-search generation module in the invention is a KD-tree acceleration-search generation module. The modules of the system are interconnected to realize GPU-based ray tracing accelerated by KD-tree search. Each module is described below:
Video memory pre-allocation module: the input of this module is the number of triangular facets in the triangular facet model. Based on this number, the module calculates the amount of video memory occupied by the bounding-box model array of the triangular facets, and requests the video memory on the GPU once, in advance, from the CPU side. The requested space is at least large enough to build the KD tree without conflicts. The output of the module is the start address of the pre-allocated video memory.
Data preprocessing module: on the GPU, using the video memory requested by the video memory pre-allocation module, this module builds a bounding box in parallel for each triangular facet of the input triangular facet model, initializes the current incident angle, and outputs the bounding-box model array.
KD-tree acceleration-search generation module: in the invention this module builds the KD tree in parallel on the GPU; the KD tree accelerates the ray tracing process. It uses the video memory requested by the pre-allocation module dynamically, by address access, which speeds up KD tree generation. The bounding-box model array output by the data preprocessing module is its input. The module has two synchronous outputs: one is the KD tree corresponding to the bounding-box model array, and the other is the reorganized bounding-box model array. Both outputs feed directly into the ray tracing module.
Virtual aperture plane generation module: the inputs of this module are the bounding-box model array output by the data preprocessing module and the current incident angle. Inside the module, the boundary values of the bounding-box model are calculated and ray tubes are generated in parallel on the GPU. The output is a number of ray tubes of identical size.
Ray tracing module: the tracing module in the invention is a ray tracing module. Its inputs are the ray tubes output by the virtual aperture plane generation module and the two synchronous outputs of the acceleration-search generation module; these three inputs are synchronized. The module traces the ray tubes in parallel on the GPU. Tracing is carried out by having the ray tubes generated by the virtual aperture plane generation module traverse the KD tree generated by the KD-tree acceleration-search generation module. The output is the reflection information of the ray tubes after traversing the KD tree model.
Gordon integration module: this module receives the reflection information output by the ray tracing module, integrates over the ray tubes in parallel on the GPU using the Gordon integral formula, and sums the integration results of all ray tubes in parallel with a reduction operation. The output is the algebraic sum of the integral values of all ray tubes at the current angle.
Interface display module: displays the integral values output by the Gordon integration module, together with related information, on the interface for inspection and debugging.
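The bounding-box construction performed by the data preprocessing module can be illustrated with a short Python sketch. It assumes, as is common but not stated in the text, that each bounding box is axis-aligned and stored as a pair of minimum and maximum corner points; the function names are hypothetical. On the GPU each facet would be handled by its own thread, which the list comprehension only imitates sequentially.

```python
def triangle_aabb(tri):
    """Axis-aligned bounding box of one triangle.

    tri is a sequence of three (x, y, z) vertices; the result is a
    (min_corner, max_corner) pair.
    """
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def build_bbox_array(triangles):
    """One independent box per facet, so the work parallelizes trivially."""
    return [triangle_aabb(t) for t in triangles]
```

Because no box depends on any other, the step needs no synchronization between threads.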
The invention changes the incident angle and repeatedly calls the virtual aperture plane generation module, the ray tracing module, the integration module, and the interface display module, completing the integral calculation and display for the triangular facet model over 0 to 180 degrees.
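For each incident angle, the Gordon integration module sums the per-tube integral values with a reduction operation. The Python sketch below shows the pairwise (tree) reduction pattern typically used for such sums on a GPU; it is a sequential illustration with a hypothetical function name, and the per-tube values themselves (which would come from the Gordon integral formula, not reproduced here) are simply taken as inputs.

```python
def tree_reduce_sum(values):
    """Pairwise reduction: each pass combines partial sums `stride` apart.

    On a GPU all additions within one pass are independent and run in
    parallel, so n values are summed in about log2(n) passes.
    """
    data = list(values)
    n = len(data)
    if n == 0:
        return 0.0
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]
```

The same routine works for complex-valued contributions, since it relies only on addition.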
Underwater target sound field intensity simulation and prediction systems place high demands on both speed and precision. The invention uses ray tracing to improve on the prediction precision of conventional methods, uses the KD tree structure to accelerate the search performed during ray tracing, and runs both KD tree construction and ray tracing on the GPU, which further increases ray tracing speed. Both the speed and the precision of the simulated prediction of underwater target sound field intensity are thereby improved. During KD tree construction, the arrays in the leaf nodes are organized uniformly into the output array, which both enables merged data access during ray tracing and reduces the video memory occupied by the KD tree.
Embodiment 2:
The GPU-based KD-tree ray tracing acceleration system is the same as in Embodiment 1. The size of the video memory requested in advance by the video memory pre-allocation module is (D·P·I) + 2F + (P·I),
where D is the estimated maximum tree depth, P is the expansion coefficient, I is the video memory occupied by the bounding-box array in the root node, and F is the video memory occupied by one auxiliary array. This allocates a block of size (P·I) to every level of the KD tree and also requests a block of size (P·I) for the output array. When splitting a parent node, the invention copies the parent's bounding-box model array, according to the split plane and split axis, into the video memory block belonging to the child nodes, so no video memory access conflicts occur during the parallel copy. In addition, the expansion coefficient ensures that triangular facet bounding boxes cut by the split plane can safely be placed into both child nodes.
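As a worked instance of the size formula (D·P·I) + 2F + (P·I), the Python sketch below evaluates it for made-up numbers. The function name and every concrete value (facet count, 24 bytes per box, 4-byte flags, depth 20, expansion coefficient 2) are assumptions chosen only to make the arithmetic concrete; none of them come from the patent.

```python
def preallocated_bytes(depth, expansion, root_bytes, aux_bytes):
    """Total pre-allocated video memory: D*P*I + 2*F + P*I."""
    per_level = expansion * root_bytes   # P*I: one block per tree level
    return depth * per_level + 2 * aux_bytes + per_level


# Hypothetical scene: 100,000 facets, each box stored as 6 floats (24 bytes).
num_boxes = 100_000
expansion = 2
root_bytes = num_boxes * 24                 # I
aux_bytes = expansion * num_boxes * 4       # F: one 4-byte flag per slot
total = preallocated_bytes(depth=20, expansion=expansion,
                           root_bytes=root_bytes, aux_bytes=aux_bytes)
```

With these numbers the per-level block is 4.8 MB, so the twenty levels, the two auxiliary arrays, and the output array together come to 102.4 MB, all requested in a single call before construction starts.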
In the prior art, video memory is allocated dynamically during KD tree construction, which is time-consuming. The video memory pre-allocation module of the invention requests, once and before construction, enough video memory to build the KD tree without conflicts; address access replaces the more time-consuming dynamic allocation during construction, so the invention speeds up KD tree construction.
Embodiment 3:
The GPU-based KD-tree ray tracing acceleration system is the same as in Embodiments 1-2.
In the prior art, after KD tree construction completes, the leaf nodes store indices of the data. Because the SAH algorithm may stop splitting early while a node still contains more bounding boxes than the splitting threshold, the memory needed for the indices inside a node is not known in advance. In the invention, a leaf node stores only the address range of its array within the output array, and the array of each leaf node is contiguous in the output array, so data accesses can be merged during ray tracing, which speeds up ray tracing.
The KD-tree acceleration-search generation module of the invention builds the KD tree in parallel on the GPU, using the method for unified output of the bounding-box arrays inside KD-tree leaf nodes; see Fig. 3. The submodules of the KD-tree acceleration-search generation module are described below:
Condition judgment submodule: this submodule first receives the array output by the data preprocessing module and, based on the number of elements, decides whether the current node's data enters the node splitting submodule or the splitting termination submodule. If it enters the splitting termination submodule, splitting of the current node stops. If it enters the node splitting submodule, the data then passes in turn through the child-node bounding-box array generation submodule and the child-node information update submodule, and finally returns to the condition judgment submodule, where the child nodes are judged again. The submodule then repeatedly receives the data fed back by the node splitting submodule and repeats the judgment, building the KD tree structure level by level until all nodes have entered the splitting termination submodule.
Node split-plane computation submodule: the input of this submodule is the output of the condition judgment submodule. From the data received, it computes the split plane position and split axis of the current node and splits one parent node into two child nodes, each of which contains part of the parent's information. It outputs the computed result, together with the data it received, to the child-node bounding-box array generation submodule.
Large-node splitting unit: according to the input of the node splitting submodule, computes the split plane position and split axis of the current node using the median-split method.
Small-node splitting unit: according to the input of the node splitting submodule, computes the split plane position and split axis of the current node using the SAH method.
Child-node bounding-box array generation submodule: the input of this submodule is the output of the node split-plane computation submodule. According to the input, it partitions the current node's bounding-box array on the GPU into the two child nodes of the current node, and outputs the two resulting child nodes in parallel, one to each of the two child-node information update submodules.
Child-node information update submodule: this consists of two structurally identical submodules in parallel, each taking as input one sub-array output by the child-node bounding-box array generation submodule. A child node first inherits the hint values of the current node, then updates its own related information according to the sub-array received. The child node, now treated as the current node, and its sub-array are output to the condition judgment submodule to be judged again, re-entering the KD tree construction loop.
Splitting termination submodule: the input of this submodule is the data output by the condition judgment submodule. The submodule terminates the splitting of the current node's bounding-box array and writes the sub-array contained in the current node into the output array, which is output all at once after all nodes have stopped splitting.
In the invention, the input of the KD-tree acceleration-search generation module is the bounding-box model array output by the data preprocessing module. This input first enters the module's condition judgment submodule, which decides whether to pass it to the node splitting submodule or to the splitting termination submodule. If the output of the condition judgment submodule enters the splitting termination submodule, splitting of the current node stops. If it enters the node splitting submodule, the computation is performed there first, and the output of the node splitting submodule then becomes the input of the child-node bounding-box array generation submodule. That submodule accelerates the generation of the sub-arrays on the GPU and outputs the two sub-arrays, each as a separate input to the condition judgment submodule. Node splitting proceeds in parallel until every node has entered the splitting termination submodule.
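The text names the SAH method for the small-node splitting unit but does not spell out the cost function. The Python sketch below uses the standard surface area heuristic, cost = C_trav + C_isect · (SA_left · N_left + SA_right · N_right) / SA_parent, evaluated at the box faces as candidate planes. The function names, the cost constants, and the rule that a flat box lying exactly in the plane counts as left are assumptions made for illustration, not details from the patent.

```python
def surface_area(lo, hi):
    """Surface area of the axis-aligned box spanned by corners lo and hi."""
    dx, dy, dz = (hi[k] - lo[k] for k in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_best_split(boxes, node_lo, node_hi, c_trav=1.0, c_isect=1.0):
    """Return (cost, axis, position) of the cheapest candidate split plane."""
    best = (float("inf"), None, None)
    parent_area = surface_area(node_lo, node_hi)
    if parent_area == 0.0:
        return best                      # degenerate node: nothing to split
    for axis in range(3):
        # Candidate planes are the faces of the boxes along this axis.
        candidates = sorted({b[0][axis] for b in boxes}
                            | {b[1][axis] for b in boxes})
        for pos in candidates:
            if not node_lo[axis] < pos < node_hi[axis]:
                continue
            left_hi = list(node_hi)
            left_hi[axis] = pos
            right_lo = list(node_lo)
            right_lo[axis] = pos
            # A box cut by the plane is counted on both sides.
            n_left = sum(1 for lo, hi in boxes
                         if lo[axis] < pos or hi[axis] <= pos)
            n_right = sum(1 for lo, hi in boxes if hi[axis] > pos)
            cost = c_trav + c_isect * (
                surface_area(node_lo, left_hi) * n_left
                + surface_area(right_lo, node_hi) * n_right) / parent_area
            if cost < best[0]:
                best = (cost, axis, pos)
    return best
```

Each candidate plane is evaluated independently of the others, which is why this part of the construction maps well onto GPU threads.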
Embodiment 4:
The GPU-based KD-tree ray tracing acceleration system is the same as in Embodiments 1-3.
In the prior art, when a KD tree parent node is split into child nodes, the array inside the parent node must first be sorted along the split axis, and after sorting the array is copied into the left and right child nodes according to the best split plane position. Because sorting has a high time complexity, it affects the overall construction time. To shorten KD tree construction, the invention does not sort the array inside the parent node. Instead it exploits the parallelism of the GPU and, according to the best split plane position, only decides whether each array element of the parent node is copied to the left child node, to the right child node, or to both. This greatly reduces the time spent moving data.
The child node inner bounding volume array, which generates submodule, to be executed on GPU, to child node inner bounding volume array The unit being equipped in submodule is generated to be described below:
Scanning element: the input of the unit is the output of node partition submodule, and use includes scan operation on GPU, Parallel computation goes out index of two child node inner bounding volume array elements in present node inner bounding volume array, and index is protected There are on the corresponding position of auxiliary array of video memory preliminery application module application and export give array indexing unit.
Array indexing unit: the input of the unit is the index of scanning element output, and the parallel index on GPU will be current Array in the address of node storage is copied to according to index respectively in the address of two child nodes storage, and present node is stored The array for including in address is divided in the address of two child nodes storage, is then exported to condition two child nodes respectively and is sentenced Small pin for the case module is rejudged.
In the prior art, the GPU is used only to accelerate the SAH calculation part of KD tree construction; the copying of arrays from a parent node to its child nodes is not accelerated on the GPU. In the child-node splitting submodule of the present invention, the scan unit precomputes the position index of each element for copying the parent node's bounding-box array to the left and right child nodes, and the array indexing unit then copies the array data to the left and right child nodes according to those indices. Copying the parent node's bounding-box array to the child nodes is thus carried out in parallel on the GPU, which accelerates KD tree construction.
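The scan-then-index scheme described above can be sketched in a few lines. The following Python emulation is a minimal, sequential sketch under stated assumptions, not the patented GPU implementation: the function name `partition_by_scan`, the `(lo, hi)` box layout, and the rule that a flat box lying exactly on the plane goes to the left child are all introduced here for illustration. On a GPU, each loop iteration would correspond to one thread.

```python
from itertools import accumulate

def partition_by_scan(boxes, axis, split):
    """Divide boxes between left and right children without sorting.

    boxes: list of (lo, hi) pairs, each a 3-tuple of coordinates.
    Returns (left, right); a box cut by the plane appears in both.
    """
    # Flag arrays, one entry per box (one GPU thread per entry).
    flag_r = [1 if hi[axis] > split else 0 for lo, hi in boxes]
    # Assumption: a flat box lying exactly on the plane goes left.
    flag_l = [1 if (lo[axis] < split or hi[axis] <= split) else 0
              for lo, hi in boxes]

    # Inclusive scan: scan[i] - 1 is the destination slot of box i.
    scan_l = list(accumulate(flag_l))
    scan_r = list(accumulate(flag_r))
    left = [None] * (scan_l[-1] if boxes else 0)
    right = [None] * (scan_r[-1] if boxes else 0)

    # Scatter: every box writes itself to its precomputed slot(s).
    for i, box in enumerate(boxes):
        if flag_l[i]:
            left[scan_l[i] - 1] = box
        if flag_r[i]:
            right[scan_r[i] - 1] = box
    return left, right
```

Because each destination slot comes from a prefix sum rather than from a sort, the relative order of the boxes is preserved in both children, and no two boxes ever write to the same slot.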
Embodiment 5:
The present invention also provides a unified output method for the bounding-box arrays in KD tree leaf nodes. The unified output of leaf-node bounding boxes is realized in the KD tree accelerated-search generation module of the GPU-based KD tree ray tracing acceleration system; the system is the same as in Embodiments 1-4.
Referring to Fig. 2, the method comprises the following steps:
1) Input data and initialize: input the triangular-facet bounding-box model array and two auxiliary arrays. Each auxiliary array has the same length as the bounding-box array of one layer of the KD tree and is used to assist in dividing the bounding-box array of the current node. Initialize the root node as the current node.
2) Termination judgment: judge whether the current node meets the termination condition, that is, whether the number of bounding boxes stored at the current node's address is less than the termination threshold. If so, execute step 7); otherwise, execute step 3).
3) Calculate the best splitting-plane position: use the median-split method or the surface area heuristic (SAH) algorithm to calculate the best splitting plane and the splitting axis.
4) Assign the auxiliary arrays: the two auxiliary arrays assist the division. According to the best splitting-plane position and the splitting axis, for a bounding box on the left side of the best splitting plane, set the corresponding position in the first auxiliary array to 1; for a bounding box on the right side, set the corresponding position in the second auxiliary array to 1; for a bounding box cut by the best splitting plane, set the corresponding position in both auxiliary arrays to 1.
5) Scan the auxiliary arrays: calculate the starting storage address for copying the current node's array to the left and right child nodes, and obtain the position index of each array element within the corresponding address range of the left and right child nodes.
6) Split the current node: split the current node into left and right child nodes. According to the position indices obtained in step 5), copy the current node's bounding-box array to the corresponding positions at the storage addresses of the left and right child nodes, update the information of the left and right child nodes, and return to step 2) with each child node as the current node, re-entering the KD tree construction loop.
7) Form the output array: mark the current node as a leaf node, append the array stored at the leaf node's address to the tail of the output array, and record in the leaf node the address of that array within the output array, for use by the ray tracing module.
8) End the splitting process: after all nodes have terminated splitting, a fully constructed, high-quality KD tree and the reorganized bounding-box model array are obtained.
Referring to Fig. 4, the present invention uses the scan unit and the array indexing unit on the GPU to accelerate copying the current node's array to the left and right child nodes. When a child node can no longer be split, the unified output method for KD tree leaf-node bounding boxes outputs the node's data in a unified manner. Because a leaf node stores only the address range of its data in the output array, and the array of each leaf node is contiguous in the output array, data accesses can be coalesced during ray tracing, which accelerates ray tracing.
During tree construction, the present invention no longer stores data indices inside the node. Instead, the arrays are stored uniformly in the video memory allocated by the video memory pre-allocation module, and a node stores only the address range in video memory of the array it contains. When a node is a leaf node, the array in the leaf node is copied to the output array, and the leaf node records the address range of that array in the output array. The present invention thereby reduces the video memory occupied by the KD tree on the GPU; it can stop splitting while the number of elements in a node's bounding-box array is still greater than the termination threshold without data overflow; and it lets the ray intersection process of the subsequent ray tracing module coalesce its accesses to the data contained in leaf nodes, which accelerates ray tracing.
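The address-range bookkeeping described above can be illustrated with a small sketch. It is a simplified Python model under assumptions, not the actual video memory layout: the `Node` class, the half-open `[start, end)` range convention, and the name `emit_leaf` are hypothetical and serve only to show how a leaf ends up holding nothing but a range into one contiguous output array.

```python
class Node:
    """A node stores only an address range, never the data itself."""
    def __init__(self, start, end):
        self.start = start      # first slot of this node's data
        self.end = end          # one past the last slot (half-open range)
        self.is_leaf = False

def emit_leaf(node, layer_buffer, output):
    """Append the node's slice of layer_buffer to the output array and
    re-point the node at its new, contiguous range in that array."""
    data = layer_buffer[node.start:node.end]
    new_start = len(output)
    output.extend(data)
    node.start, node.end = new_start, len(output)
    node.is_leaf = True
```

Since every leaf owns one contiguous slice of `output`, a traversal that reaches a leaf can read its boxes with a single linear sweep, which is the property that permits coalesced access on the GPU.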
The present invention outputs a relatively high-quality KD tree in a relatively short time. The KD tree structure is stored linearly in video memory, and the root node of the KD tree and the reorganized output array serve as the two inputs of the ray tracing module, where they are used as a lookup table.
A more detailed example is given below to further describe the present invention:
The GPU-based KD tree ray tracing acceleration system and the unified output method for KD tree leaf-node bounding boxes are the same as in Embodiments 1-5. Referring to Fig. 1:
Video memory pre-allocation module: the input of this module is the number of triangular facets in the triangular-facet model. From this number the module calculates the video memory occupied by the triangular-facet bounding-box model array, and the CPU side allocates the video memory space on the GPU once, in advance. The space is at least large enough to generate the KD tree without conflicts. The output of the module is the starting address of the pre-allocated video memory.
Data preprocessing module: on the GPU, using the video memory allocated by the video memory pre-allocation module, this module constructs bounding boxes in parallel for the triangular facets of the input triangular-facet model. One thread is assigned to each triangular facet to compute its axis-aligned bounding box (AABB), hereinafter referred to as the bounding box. The module also initializes the current incident angle, and it outputs the bounding-box model array and the current incident angle.
KD tree accelerated-search generation module: this module generates the KD tree in parallel on the GPU. Once the KD tree is built, traversing it greatly accelerates ray tracing. During KD tree generation, the video memory allocated by the video memory pre-allocation module is used dynamically through address-based access, which accelerates generation. The input of the module is the bounding-box model array output by the data preprocessing module. The module has two synchronized outputs: one is the KD tree corresponding to the bounding-box model array, and the other is the reorganized bounding-box model array. Both outputs feed directly into the ray tracing module.
Virtual aperture plane generation module: the inputs of this module are the bounding-box model array and the current incident angle output by the data preprocessing module. The module calculates the boundary values of the bounding-box model and generates ray tubes in parallel on the GPU. Its output is a number of ray tubes of identical size.
Ray tracing module: the inputs of this module are the ray tubes output by the virtual aperture plane generation module and the two synchronized outputs of the accelerated-search generation module; these inputs are synchronized. The module traces the rays of the ray tubes in parallel on the GPU by traversing the KD tree. Its output is the reflection information of each ray tube after traversing the KD tree.
Gordon integration module: this module receives the reflection information output by the ray tracing module, integrates over the ray tubes in parallel on the GPU using the Gordon integral formula, and sums the integration results of all ray tubes in parallel with a reduction operation. Its output is the algebraic sum of the integral values of all ray tubes at the current angle.
Interface display module: displays the integral value output by the Gordon integration module, together with related information, on the interface to facilitate inspection and debugging.
In the video memory pre-allocation module, the triangular-facet model in this example is a submarine model, shown in Fig. 5. The model measures 62 m × 7.5 m × 11 m and consists of 147427 nodes and 294850 triangular facets, where the triangular facet size is $2\lambda_T$ ($\lambda_T = 5$ cm in this example). The size of the pre-allocated video memory space is (D·P·I) + 2F + (P·I),
where D is the estimated maximum tree depth (D is 20 in this example), P is the expansion coefficient (given as M = 10 in this example), I is the video memory occupied by the bounding-box array in the root node, and F is the video memory occupied by one auxiliary array. The module allocates a video memory space of size (P·I) for every layer of the KD tree and also allocates a space of size (P·I) for the output array. When a parent node is split, its bounding-box model array is copied, according to the splitting plane and the splitting axis, into the video memory blocks belonging to its child nodes, so no video memory access conflict occurs during the parallel copy. In addition, the expansion coefficient ensures that the bounding boxes of triangular facets cut by the splitting plane can safely be placed in both child nodes.
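To give a feel for the magnitude of the size expression above, the following sketch evaluates (D·P·I) + 2F + (P·I) for the submarine example. Several inputs are assumptions not stated in the text: 24 bytes per bounding box (six 32-bit floats), 4 bytes per auxiliary-array entry, an auxiliary array of P times the facet count, and an expansion coefficient of 10. The function name is hypothetical.

```python
def preallocated_bytes(num_triangles, depth, expansion,
                       box_bytes=24, flag_bytes=4):
    """Evaluate (D*P*I) + 2F + (P*I) in bytes.

    box_bytes and flag_bytes are assumed element sizes, not values
    taken from the source text.
    """
    root_array = num_triangles * box_bytes              # I
    aux_array = expansion * num_triangles * flag_bytes  # F
    per_layer = expansion * root_array                  # P * I
    return depth * per_layer + 2 * aux_array + per_layer

# Submarine example: 294850 facets, D = 20, assumed P = 10.
total = preallocated_bytes(294850, depth=20, expansion=10)
```

Under these assumptions the request comes to roughly 1.5 GB, dominated by the D·P·I term, which is why the estimated depth and the expansion coefficient are the parameters worth tuning first.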
In the data preprocessing module, one thread is launched for each triangular facet of the submarine model to compute the facet's coordinate extremes in three-dimensional space, $(X_{max}, Y_{max}, Z_{max})$ and $(X_{min}, Y_{min}, Z_{min})$, where $X_{max}, Y_{max}, Z_{max}$ are the maximum values of the facet's three vertices on the X, Y, and Z axes, and $X_{min}, Y_{min}, Z_{min}$ are the corresponding minimum values. These two coordinate extremes determine the size of each facet's tight bounding box and its position in space. The current incident angle is initialized to 0 degrees.
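The per-facet computation described above amounts to a component-wise minimum and maximum over three vertices. A minimal Python sketch follows; the function names and the tuple-based vertex layout are assumptions for illustration, and the list comprehension stands in for the one-thread-per-facet launch on the GPU.

```python
def triangle_aabb(tri):
    """tri: three (x, y, z) vertices.
    Returns ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo = tuple(min(v[a] for v in tri) for a in range(3))
    hi = tuple(max(v[a] for v in tri) for a in range(3))
    return lo, hi

def build_box_array(triangles):
    """Bounding boxes in the same order as the input facets."""
    return [triangle_aabb(t) for t in triangles]
```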
The KD tree accelerated-search generation module generates the KD tree using the unified output method for KD tree leaf-node bounding-box arrays. The submodules provided in this module are described below:
The condition-judgment submodule first receives the data output by the data preprocessing module and, according to the number of bounding boxes in the current node, Box_Num, decides whether the current node's data enters the node-splitting submodule or the splitting-termination submodule. When Box_Num is less than or equal to the termination threshold Stop_Num (32 in this example), the data enters the splitting-termination submodule and splitting of the current node stops. When Box_Num is greater than Stop_Num, the data enters the node-splitting submodule, then passes in turn through the child-node bounding-box array generation submodule and the child-node information update submodule, and finally returns to the condition-judgment submodule, where the child nodes are judged again. The condition-judgment submodule repeatedly receives the data fed back along this path and builds the KD tree structure layer by layer until all nodes have entered the splitting-termination submodule.
The input of the node splitting-plane calculation submodule is the output of the condition-judgment submodule. From the data it receives, this submodule calculates the splitting-plane position and the splitting axis of the current node and splits the parent node into two child nodes, each of which contains part of the parent node's information. It outputs the calculation result, together with the data it received, to the child-node bounding-box array generation submodule.
The node splitting-plane calculation submodule contains two units; the number of facets N input to the submodule determines which unit performs the node splitting.
When the number of facets in the node is greater than the threshold Big_Node (256 in this example), the large-node splitting unit splits the current node. This unit uses the median-split method to determine the splitting-plane position and the splitting axis: the splitting axis is the coordinate axis parallel to the longest edge of the node's bounding box, and the splitting-plane coordinate is the midpoint of that longest edge. Because this process requires the bounds of the node's bounding box, a reduction operation on the GPU can be used to accelerate the computation of the boundary values.
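The large-node rule above can be sketched as follows. This is a sequential Python illustration under assumptions: the function names and the `(lo, hi)` box layout are hypothetical, and the `min`/`max` calls play the role that a parallel reduction would play on the GPU.

```python
def node_bounds(boxes):
    """Bounds of all boxes in a node (a reduction on the GPU)."""
    lo = tuple(min(b[0][a] for b in boxes) for a in range(3))
    hi = tuple(max(b[1][a] for b in boxes) for a in range(3))
    return lo, hi

def median_split(boxes):
    """Return (axis, position): the axis parallel to the longest edge
    of the node bounds and the midpoint of that edge."""
    lo, hi = node_bounds(boxes)
    extent = [hi[a] - lo[a] for a in range(3)]
    axis = extent.index(max(extent))
    return axis, 0.5 * (lo[axis] + hi[axis])
```

The rule needs only the node bounds, not the distribution of boxes inside them, which is what makes it cheap enough for nodes holding many facets.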
When the number of facets in the node is less than or equal to the threshold Big_Node, the small-node splitting unit splits the current node. This unit uses the SAH algorithm to determine the splitting-plane position and the splitting axis. The SAH algorithm evaluates the SAH cost function to obtain the cost value $SAH_{cost}$ of each candidate splitting plane and selects the candidate with the smallest $SAH_{cost}$ as the best splitting plane.
The SAH cost function, in its standard form, is:
$$SAH_{cost} = c_{traversal} + C_{hit}\left(\frac{SA(V_L)}{SA(V)}\,N_L + \frac{SA(V_R)}{SA(V)}\,N_R\right)$$
where $c_{traversal}$ is the cost of the ray traversing the current node, $SA(V)$ is the surface area of the current node's bounding box, $SA(V_L)$ is the surface area of the left child node's bounding box under the current candidate splitting plane, $N_L$ is the number of facets assigned to the left child node under the current candidate splitting plane, $SA(V_R)$ is the surface area of the right child node's bounding box, $N_R$ is the number of facets assigned to the right child node, and $C_{hit}$ is the cost of a ray-triangle intersection test.
As the formula shows, the SAH value of each candidate splitting plane is computed independently, so the computation can be accelerated on the GPU: one block is assigned to each candidate splitting plane to compute its SAH value, and a reduction algorithm on the GPU finds the smallest SAH value and the corresponding candidate splitting plane. Once the splitting plane is determined, the coordinate axis perpendicular to it is the splitting axis.
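The candidate evaluation described above can be sketched as follows. The sketch assumes the common form of the SAH cost, c_traversal + C_hit·(SA(V_L)/SA(V)·N_L + SA(V_R)/SA(V)·N_R), which matches the symbols defined above. It further assumes that the child bounds are the node bounds clipped at the plane and that both cost constants are 1.0; the function names are hypothetical. Each loop iteration is independent, which is what allows one GPU block (or thread) per candidate.

```python
def surface_area(lo, hi):
    dx, dy, dz = (hi[a] - lo[a] for a in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(lo, hi, axis, pos, n_left, n_right, c_trav=1.0, c_hit=1.0):
    """SAH cost of splitting the node [lo, hi] at `pos` along `axis`."""
    left_hi = list(hi); left_hi[axis] = pos     # clipped left child bounds
    right_lo = list(lo); right_lo[axis] = pos   # clipped right child bounds
    sa = surface_area(lo, hi)
    return c_trav + c_hit * (surface_area(lo, left_hi) / sa * n_left
                             + surface_area(right_lo, hi) / sa * n_right)

def best_split(lo, hi, boxes, candidates):
    """candidates: list of (axis, position). Returns (candidate, cost)."""
    costs = []
    for axis, pos in candidates:                # independent per candidate
        n_l = sum(1 for b in boxes if b[0][axis] < pos or b[1][axis] <= pos)
        n_r = sum(1 for b in boxes if b[1][axis] > pos)
        costs.append(sah_cost(lo, hi, axis, pos, n_l, n_r))
    i = min(range(len(costs)), key=costs.__getitem__)  # reduction on GPU
    return candidates[i], costs[i]
```

A box cut by a candidate plane is counted on both sides, so planes that cut many boxes are penalized automatically through larger $N_L$ and $N_R$.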
The input of the child-node bounding-box array generation submodule is the output of the node splitting-plane calculation submodule. According to this input, it divides the current node's bounding-box array into two sub-arrays in parallel on the GPU and outputs the two sub-arrays to the two child-node information update submodules respectively. The units provided in this submodule are as follows:
Scan unit: the input of this unit is the output of the node-splitting submodule. Using an inclusive scan operation on the GPU, it computes in parallel the indices at which the elements of the current node's bounding-box array are placed in the bounding-box arrays of the two child nodes, and stores the indices at the corresponding positions of the auxiliary arrays allocated by the video memory pre-allocation module.
Array indexing unit: the input of this unit is the indices output by the scan unit. Indexing in parallel on the GPU, it copies the array stored at the current node's address into the addresses of the two child nodes according to those indices, thereby dividing the array held by the current node between the two child nodes. It then outputs the two child nodes to the condition-judgment submodule to be judged again.
Child-node information update submodule: this submodule consists of two structurally identical submodules operating side by side, each taking as input one of the sub-arrays output by the child-node bounding-box array generation submodule. Each child node first inherits the relevant information of the current node and then updates that information according to the sub-array it receives. The child node, as the new current node, and its sub-array are output to the condition-judgment submodule to be judged again, re-entering the KD tree construction loop.
Splitting-termination submodule: the input of this submodule is the data output by the condition-judgment submodule. The submodule terminates the splitting of the current node's bounding-box array and writes the sub-array contained in the current node to the output array, which is output all at once after every node has stopped splitting.
The KD tree accelerated-search generation module finally outputs a fully constructed KD tree and a reorganized bounding-box array. The bounding-box array is reorganized and output using the unified output method for KD tree leaf-node bounding-box arrays, which comprises the following steps:
1) Input data and initialize: input the triangular-facet bounding-box model array and two auxiliary arrays. Each auxiliary array has the same length as the bounding-box array of one layer of the KD tree and is used to assist in dividing the bounding-box array of the current node. Initialize the root node as the current node.
2) Termination judgment: judge whether the current node meets the termination condition, that is, whether the number of bounding boxes stored at the current node's address is less than the termination threshold. If so, execute step 7); otherwise, execute step 3).
3) Calculate the best splitting-plane position: use the median-split method or the SAH method to calculate the best splitting plane and the splitting axis.
4) Assign the auxiliary arrays: the two auxiliary arrays assist the division. First reset the auxiliary arrays to zero. Then, according to the best splitting-plane position and the splitting axis, for a bounding box on the left side of the best splitting plane, set the corresponding position in the first auxiliary array to 1; for a bounding box on the right side, set the corresponding position in the second auxiliary array to 1; for a bounding box cut by the best splitting plane, set the corresponding position in both auxiliary arrays to 1.
5) Scan the auxiliary arrays: calculate the starting storage address for copying the current node's array to the left and right child nodes, and obtain the position index of each array element within the corresponding address range of the left and right child nodes.
6) Split the current node: split the current node into left and right child nodes. According to the position indices obtained in step 5), copy the current node's bounding-box array to the corresponding positions at the storage addresses of the left and right child nodes, update the information of the left and right child nodes, and return to step 2) with each child node as the current node, re-entering the KD tree construction loop.
7) Form the output array: mark the current node as a leaf node, append the array stored at the leaf node's address to the tail of the output array, and record in the leaf node the address of that array within the output array, for use by the ray tracing module.
8) End the splitting process: after all nodes have terminated splitting, a fully constructed, high-quality KD tree and the reorganized bounding-box model array are obtained.
During tree construction, element indices are no longer stored inside the node. Instead, the data are stored uniformly in the video memory that the video memory pre-allocation module allocated for the KD tree nodes of that layer, and a node stores only the start and end addresses in video memory of the data it contains. When a node is a leaf node, its data in that layer's pre-allocated video memory are copied to the output array, and the leaf node stores the address of the data in the output array. This also reduces the video memory occupied by the KD tree on the GPU; splitting can stop early while the number of bounding boxes in the current node is still greater than the termination threshold without array overflow; and the subsequent ray tracing module can coalesce its accesses to the data contained in leaf nodes, which accelerates ray tracing.
The present invention outputs a relatively high-quality KD tree in a relatively short time. The KD tree structure is stored linearly in video memory, and the root node of the KD tree and the reorganized output array serve as the two inputs of the ray tracing module, where they are used as a lookup table.
The bounding-box generation in step 1) is completed in parallel on the GPU: one GPU thread is launched for each triangular facet to compute its bounding box, and the bounding boxes are arranged into a bounding-box array in the order of the original triangular-facet array.
In step 2), if the number of bounding boxes in the current node is less than or equal to Stop_Num, go to step 7) and stop splitting the current node; if it is greater than Stop_Num, go to step 3) and split the current node, as the parent node, into left and right child nodes.
In step 3), if the number of facets in the current node is greater than the threshold N (N is 256 in this example), the current node is treated as a large node and the median-split method determines the splitting-plane position and the splitting axis: the splitting axis is the coordinate axis parallel to the longest edge of the node's bounding box, and the splitting-plane coordinate is the midpoint of that longest edge. Because this process requires the bounds of the node's bounding box, a reduction operation on the GPU can be used to accelerate the computation of the boundary values.
If the number of facets in the current node is less than or equal to the threshold N, the current node is treated as a small node and the SAH algorithm determines the splitting-plane position and the splitting axis. The SAH algorithm evaluates the SAH cost function to obtain the cost value $SAH_{cost}$ of each candidate splitting plane and selects the candidate with the smallest $SAH_{cost}$ as the splitting plane.
The function for calculating the SAH cost, in its standard form, is:
$$SAH_{cost} = c_{traversal} + C_{hit}\left(\frac{SA(V_L)}{SA(V)}\,N_L + \frac{SA(V_R)}{SA(V)}\,N_R\right)$$
where $c_{traversal}$ is the cost of the ray traversing the current node, $SA(V)$ is the surface area of the current node's bounding box, $SA(V_L)$ is the surface area of the left child node's bounding box under the current candidate splitting plane, $N_L$ is the number of facets assigned to the left child node under the current candidate splitting plane, $SA(V_R)$ is the surface area of the right child node's bounding box, $N_R$ is the number of facets assigned to the right child node, and $C_{hit}$ is the cost of a ray-triangle intersection test.
As the formula shows, the SAH value of each candidate splitting plane is computed independently, so the computation can be accelerated on the GPU: one thread is assigned to each candidate splitting plane to compute its SAH value, and a reduction algorithm on the GPU finds the smallest SAH value and the corresponding candidate splitting plane.
Once the splitting plane is determined, the coordinate axis perpendicular to it is the splitting axis.
The two auxiliary arrays in step 4) are two arrays of equal length allocated in advance by the video memory pre-allocation module; the length of each auxiliary array is P times the number of triangular facets in the triangular-facet model. During the layer-by-layer construction of the KD tree, the nodes of each layer therefore need only access their corresponding positions in the auxiliary arrays by address, so the nodes use the arrays without conflict. According to the best splitting-plane position and the splitting axis, for a bounding box on the left side of the best splitting plane, the corresponding position in the first auxiliary array is set to 1; for a bounding box on the right side, the corresponding position in the second auxiliary array is set to 1; for a bounding box cut by the best splitting plane, the corresponding position in both auxiliary arrays is set to 1. The auxiliary arrays thus record which child node the bounding box at each position of the bounding-box array belongs to.
In step 5), an inclusive scan is performed on each auxiliary array. The inclusive scan turns the original auxiliary array into a new auxiliary array in which the element at each position is the sum of all elements of the original auxiliary array from position 0 up to and including that position. From the new auxiliary arrays, the destination subscripts for copying the current node's bounding boxes into the bounding-box arrays of the left and right child nodes are obtained, which makes it straightforward to copy the data in parallel.
In step 6), according to the index subscripts stored in the new auxiliary arrays computed in step 5), the bounding-box array of the current node is copied to the corresponding index positions of the left and right child nodes, and the current node is split into left and right child nodes. The information of the left and right child nodes is updated according to the bounding-box arrays assigned to them, and each child node in turn becomes the current node and returns to step 2), re-entering the KD tree construction loop, until every current node has entered step 7) and become a leaf node.
In step 7), splitting of the current node terminates. The current node is marked as a leaf node, the data stored at the leaf node's address are appended to the end of the output array, and the address range of the data in the output array is recorded in the leaf node. When every current node has been marked as a leaf node, go to step 8) and complete the KD tree construction.
In step 8), after all nodes have terminated splitting, a fully constructed, high-quality KD tree is obtained, together with the output array formed by concatenating the bounding-box arrays of all leaf nodes. The KD tree structure and the output array are kept in video memory and used as a lookup table by the ray tracing module. Updating each child node's hint value, the boundary values of its bounding box, and the facets it contains realizes the unified output of the leaf-node bounding-box arrays.
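The inclusive scan used in step 5) is usually computed on a GPU in a logarithmic number of lock-step passes. The sketch below emulates one well-known scheme, the Hillis-Steele scan, in Python. The source does not say which scan algorithm is used, so the choice of scheme and the function name are assumptions made for illustration.

```python
def inclusive_scan(flags):
    """Hillis-Steele inclusive scan, emulating lock-step GPU passes.

    After the scan, element i holds the sum of flags[0..i].
    """
    data = list(flags)
    offset = 1
    while offset < len(data):
        # Every "thread" i reads values from the previous pass, so a new
        # list is built instead of updating in place.
        data = [data[i] + data[i - offset] if i >= offset else data[i]
                for i in range(len(data))]
        offset *= 2
    return data
```

For a flag array such as `[1, 0, 1, 1, 0, 1]`, the scanned value minus one at each flagged position is that element's destination subscript in the child array, and the last scanned value is the child array's length.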
In the virtual aperture plane generation module, the virtual aperture plane is the projection of the benchmark onto the equiphase surface. It is composed of a large number of rectangular ray tubes of equal aperture. The aperture of a ray tube is $\lambda_s$; in this example $\lambda_s$ is $0.0625\lambda_T$, and it can be sized according to actual needs. The virtual aperture plane is generated in parallel on the GPU; the specific generation steps are as follows:
First, from the current incident angle, the distance to the model, and the input bounding-box model array, calculate the normal vector of the virtual aperture plane and its size, namely its length L and width W.
From the size L × W of the virtual aperture plane and the ray tube size $\lambda_s$, calculate the number of ray tubes, $S_{num} = \lfloor L/\lambda_s \rfloor \times \lfloor W/\lambda_s \rfloor$, where the divisions $L/\lambda_s$ and $W/\lambda_s$ keep only the integer part. $S_{num}$ threads are launched for the GPU kernel function; each thread generates one ray tube and outputs it to the ray tracing module. Each generated ray tube is stored at the video memory address given by its thread index, where it waits to be read by the ray tracing module.
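The tube count and the mapping from thread index to ray tube can be sketched as follows. The sketch assumes that the count is the product of the two truncated divisions of L and W by the tube aperture, and that tubes are laid out row by row in a local two-dimensional coordinate frame on the aperture plane. The function names and the layout are hypothetical.

```python
import math

def ray_tube_grid(length, width, aperture):
    """Number of tubes along each side and in total (integer part only)."""
    n_len = math.floor(length / aperture)
    n_wid = math.floor(width / aperture)
    return n_len, n_wid, n_len * n_wid

def ray_tube_center(index, n_len, aperture):
    """Center of the tube handled by thread `index`, in the local
    2D frame of the aperture plane (assumed row-major layout)."""
    row, col = divmod(index, n_len)
    return ((col + 0.5) * aperture, (row + 0.5) * aperture)
```

Because each tube's position follows from its index alone, every thread can generate its tube without communicating with any other thread.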
In the ray tracing module, the inputs are the ray tubes output by the virtual aperture plane generation module and the two synchronized outputs of the accelerated-search generation module; these three inputs are synchronized. On the GPU, one thread is launched for each ray of each ray tube, and the ray tracing kernel function executes in parallel. Ray tracing traverses the KD tree in the manner of a binary tree traversal, simulating the physical process in which a ray strikes the triangular-facet model and is reflected. The output of the module is the intersection coordinates of each ray of the virtual aperture plane with the triangular-facet model, and the propagation path of each ray.
During ray tracing, rays are independent of one another, so one GPU thread is assigned to each ray to execute the ray tracing module independently; the tracing result is stored at the corresponding video memory address for use in the subsequent integration. In serial code, ray tracing can be implemented recursively, but recursion is not well supported on the GPU, so a maximum tracing count Trace_Num is set: after the ray tracing function has executed Trace_Num times, the current ray automatically stops tracing, which prevents an endless loop.
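Replacing recursion with a bounded loop, as described above, can be sketched as follows. This is a Python illustration under assumptions: `intersect` is a hypothetical callback standing in for the KD tree traversal, the value 5 for the bounce limit is arbitrary, and specular reflection about the facet normal is assumed as the bounce rule.

```python
TRACE_NUM = 5  # assumed value; the source does not give one

def trace(origin, direction, intersect, max_bounces=TRACE_NUM):
    """Follow one ray for at most max_bounces reflections.

    intersect(origin, direction) returns (hit_point, unit_normal) or None.
    Returns the list of hit points along the propagation path.
    """
    path = []
    for _ in range(max_bounces):          # bounded loop replaces recursion
        hit = intersect(origin, direction)
        if hit is None:
            break                         # the ray left the scene
        point, normal = hit
        path.append(point)
        d_dot_n = sum(d * n for d, n in zip(direction, normal))
        # Specular reflection: d' = d - 2 (d . n) n
        direction = tuple(d - 2.0 * d_dot_n * n
                          for d, n in zip(direction, normal))
        origin = point
    return path
```

The loop bound guarantees termination even for a ray trapped between two facing surfaces, which is the situation the limit is meant to guard against.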
In the ray tracing module, it is necessary to judge whether a ray tube splits, as follows:
The criterion for whether a ray tube must split is whether the four vertices of its rectangle all strike the same triangular facet. When the four vertices strike the same facet, no splitting is needed. When they do not, the ray tube splits: it is divided uniformly into four equal parts, so that one ray tube becomes four sub ray tubes, and the sub ray tubes are traced again. If a ray tube still meets the splitting condition after splitting three times, the sub ray tubes that still meet the condition are discarded and take no further part in ray tracing or subsequent calculation. For ray tubes and sub ray tubes that no longer split, the reflection information determines whether each is a valid ray tube. The criterion is whether the rays issued from the four vertices of the ray tube strike the same triangular facet. If so, the tube is marked as a valid ray tube and its reflection information is returned; if not, it is marked as an invalid ray tube and no reflection information is returned.
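The splitting rule above can be sketched with an explicit stack. This is a simplified Python model under assumptions: `hit_id` is a hypothetical callback returning the identifier of the facet struck by the ray through a given corner (or None for a miss), tubes are axis-aligned squares in the local frame of the aperture plane, and a tube whose four corners all miss is treated as dropped, a case the text does not address.

```python
def resolve_tube(x0, y0, size, hit_id, max_splits=3):
    """Split a ray tube until all four corners hit one facet.

    Returns (valid, dropped), each a list of (x, y, size) tubes.
    """
    valid, dropped = [], []
    stack = [(x0, y0, size, 0)]           # explicit stack, no recursion
    while stack:
        x, y, s, splits = stack.pop()
        ids = {hit_id(x, y), hit_id(x + s, y),
               hit_id(x, y + s), hit_id(x + s, y + s)}
        if len(ids) == 1:
            # All four corners agree. Assumption: four misses are dropped.
            (valid if None not in ids else dropped).append((x, y, s))
        elif splits < max_splits:
            half = s / 2.0                # four equal sub-tubes
            for dx in (0.0, half):
                for dy in (0.0, half):
                    stack.append((x + dx, y + dy, half, splits + 1))
        else:
            dropped.append((x, y, s))     # still mixed after three splits
    return valid, dropped
```

With a limit of three splits, the smallest tube has one eighth of the original side length, so the area lost to dropped tubes shrinks with each level while the work per original tube stays bounded.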
The Gordon integration module of the invention receives the information output by the ray tracing module, integrates the ray tubes in parallel on the GPU using the Gordon integral formula, and sums the integration results of all ray tubes in parallel with a reduction operation. The module outputs the algebraic sum of the integral values of all ray tubes at the current angle. The detailed procedure is as follows:
The invention assigns one thread to each valid ray tube to perform the integration and stores the result in video memory as an intermediate value. After all valid ray tubes have been integrated, a reduction operation sums the integration results to obtain the TS value at the current incident angle. The sound field integral formula of time-domain physical acoustics applied to the shooting and bouncing rays method is:
where i, j denote the ray tube in row i, column j of the virtual aperture plane; k is the wave number, k = 2π/λ, and K = 2k; Δa'_n denotes the vector along each side of the quadrilateral; Δρ'_n denotes the vector from a point in the integration domain to the reference point, Δρ'_n = (a'_n + a'_{n+1})/2; T denotes the projected length of r'_{i,j} in the integration domain; R_M is the source position vector; and r'_{i,j} denotes the unit vector from the receiving point to the reference point:
r'_{i,j} = u'x' + v'y' + w'z'
where x', y', z' are the axes of the three-dimensional planar coordinate system {M, X', Y', Z'} obtained by applying a projective transformation to the original spatial coordinate system {O, X, Y, Z}, and Q is the transformation matrix from the three-dimensional planar coordinate system {M, X', Y', Z'} to the original coordinate system {O, X, Y, Z}:
where M denotes the midpoint of the quadrilateral in the original coordinate system, which also serves as the origin of the three-dimensional planar coordinate system; X' denotes the unit vector from M to any vertex of the quadrilateral; Z' denotes the unit normal vector n_0 of the quadrilateral's plane; and Y' = X' × Z'.
Multiplying any point in the original coordinate system by the inverse matrix Q_inv of the transformation matrix Q completes the coordinate projection of that point from the original coordinate system to the three-dimensional planar coordinate system.
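The change of coordinates can be illustrated with a small sketch. The explicit form of Q is not reproduced in the text, so the code below assumes that Q is built from the orthonormal axes X', Y', Z' and the origin M; under that assumption, applying Q_inv amounts to subtracting M and projecting onto the three axes. Taking M as the mean of the four vertices is also an assumption, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def local_frame(quad: Sequence[Vec]) -> Tuple[Vec, Vec, Vec, Vec]:
    """Return (M, X', Y', Z') for a planar quadrilateral."""
    m = (sum(p[0] for p in quad) / 4.0,
         sum(p[1] for p in quad) / 4.0,
         sum(p[2] for p in quad) / 4.0)           # assumed: M is the vertex mean
    x_axis = unit(sub(quad[0], m))                # M to one vertex
    z_axis = unit(cross(sub(quad[1], quad[0]),
                        sub(quad[2], quad[0])))   # plane normal n_0
    y_axis = cross(x_axis, z_axis)                # Y' = X' x Z', as in the text
    return m, x_axis, y_axis, z_axis


def to_local(p: Vec, frame: Tuple[Vec, Vec, Vec, Vec]) -> Vec:
    """Apply the assumed Q_inv: translate by -M, then project onto the axes."""
    m, x_axis, y_axis, z_axis = frame
    d = sub(p, m)
    return (dot(d, x_axis), dot(d, y_axis), dot(d, z_axis))
```

For a quadrilateral lying in the plane z = 2, every vertex maps to a point whose third local coordinate is zero, which is the property the integration relies on.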
The reflection parameters of all rays are substituted into the formula. According to the high-frequency approximation principle, the algebraic sum of the integration results over all triangular facets gives the TS value of the sound field intensity at the current incident angle.
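The summation step is a standard parallel reduction. The sketch below shows the pairwise pattern sequentially in Python; the function name `tree_reduce_sum` is hypothetical, and the complex element type is an assumption about the form of the per-tube integral values.

```python
from typing import List


def tree_reduce_sum(values: List[complex]) -> complex:
    """Pairwise (tree) reduction: the active range is halved on every pass."""
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0j
    while n > 1:
        half = (n + 1) // 2
        for i in range(n // 2):      # on a GPU, each i would be one thread
            buf[i] += buf[i + half]
        n = half
    return buf[0]
```

Each pass performs independent additions, which is why the operation maps well onto one thread per element; the number of passes grows with the logarithm of the number of ray tubes.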
The interface display module of the invention displays on the interface the integral value output by the Gordon integration module at the current incident angle, the benchmark model, the number of triangular facets in the triangular facet model, the size of the virtual aperture plane, and the number of rays on the virtual aperture plane, for convenient inspection and debugging.
With the coordinate origin of the benchmark model as the vertex, the line joining the sound source position and the coordinate origin is rotated counterclockwise by θ degrees (θ = 0.5 in this example) to change the current incident angle. The rotation may be clockwise or counterclockwise, and the angle θ of each rotation can be set as required. This is repeated until the integral calculation for the benchmark model over 0 to 180 degrees is complete.
The technical effect of the invention is further explained below by simulation:
Embodiment 7:
The GPU-based KD tree ray tracing acceleration system and the unified output method for bounding box arrays in KD tree leaf nodes are the same as in Embodiments 1-6.
The GPU-based KD tree ray tracing acceleration system is used to simulate the propagation of ray tubes over an underwater target, and the tracing results are integrated and summed. Occluded facets can also take part in the echo strength calculation, so the simulation forecast is more accurate than that of traditional methods.
Fig. 6 shows the result of simulating the model of Fig. 5 with the invention. The abscissa is the incident angle and the ordinate is the sound field intensity at the current incident angle (in decibels). As Fig. 6 shows, as the incident angle changes from 0 to 180 degrees the sound field intensity first increases and then decreases. Near 0 and 180 degrees the cross-sectional areas of the submarine's bow and stern are small, so the generated virtual aperture plane is also small and the returned sound intensity is low. As the incident angle increases, the sound source gradually faces the side of the submarine's hull; the virtual aperture plane is then larger, more ray tubes are generated, and the sound field intensity is greater. Because of reflection from the submarine's surface, the sound field intensity peaks not at 90 degrees but at about 110 degrees.
Traditional plate element and improved plate element calculation methods discard occluded facets, which reduces accuracy. The invention instead uses ray tracing to obtain the sound waves reflected by occluded facets. For concave targets with many occluded surfaces, the occluded facets can also take part in the calculation, so the system of the invention achieves higher accuracy.
Fig. 7 shows data generated by simulating the model of Fig. 5 with the invention on ten GPU compute cards (GPU model: Nvidia P100). The abscissa is the incident angle; the left ordinate is the time taken by the ray tracing module at the current incident angle (in milliseconds), and the right ordinate is the number of ray tubes generated by the virtual aperture plane generation module at the current angle. The dotted line is the computation time of the ray tracing module at different incident angles, and the solid line is the number of ray tubes generated by the virtual aperture plane generation module at different incident angles. Fig. 7 shows that the time taken by the ray tracing module is, in overall trend, linear in the number of ray tubes generated. This indicates that the time depends only on the number of ray tubes and not on the incident angle, which demonstrates that the invention builds the KD tree quickly and with good quality. Traditional plate element methods must first extract the unoccluded facets and then perform ray tracing; the invention omits the extraction step and accelerates ray tracing, so the speed of simulating and forecasting the target's sound field intensity is greatly improved.
In brief, the GPU-based KD tree ray tracing acceleration system and the unified output method for bounding box arrays in KD tree leaf nodes solve the problem of rapidly forecasting the sound field intensity of complex underwater targets. In signal-flow order the system comprises: a video memory pre-allocation module, a data preprocessing module, a KD tree accelerated search generation module, a virtual aperture plane generation module, a ray tracing module, an integration module and an interface display module. The invention adds the video memory pre-allocation module and accelerates node subdivision in the KD tree accelerated search generation module. The invention also provides a unified output method for bounding box arrays in KD tree leaf nodes, comprising the steps of: inputting and initializing data; judging whether to terminate subdivision; calculating the best splitting plane position; assigning the auxiliary arrays; scanning the auxiliary arrays; subdividing the current node; forming the output array; and ending the subdivision process. The invention can calculate the sound field intensity for the triangular facet model of any given model. It uses the GPU-based KD tree ray tracing acceleration system to simulate the propagation of sound rays without having to test for occluded facets. It is fast, accurate and adaptable to different targets, and can be used effectively for rapid forecast simulation of underwater target strength.
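The three central steps of the method (assigning the auxiliary arrays, scanning them, and subdividing the current node) can be sketched as a flag, scan and scatter sequence. This is a sequential Python illustration, not the patented GPU code: each bounding box is reduced to its extent along the splitting axis, the function name `split_node` is hypothetical, and sending a flat box that lies exactly on the plane to the left child is an assumption made for the sketch.

```python
from itertools import accumulate
from typing import List, Tuple

Box = Tuple[float, float]  # (lo, hi) extent of a bounding box along the split axis


def split_node(boxes: List[Box], plane: float) -> Tuple[List[Box], List[Box]]:
    """Partition a node's boxes into left and right child arrays."""
    # Auxiliary array assignment: a box cut by the plane is flagged in both.
    left_flag = [1 if (lo < plane or hi <= plane) else 0 for lo, hi in boxes]
    right_flag = [1 if hi > plane else 0 for lo, hi in boxes]

    # Scan: an inclusive prefix sum gives each flagged element its slot + 1.
    left_scan = list(accumulate(left_flag))
    right_scan = list(accumulate(right_flag))

    left: List[Box] = [(0.0, 0.0)] * (left_scan[-1] if boxes else 0)
    right: List[Box] = [(0.0, 0.0)] * (right_scan[-1] if boxes else 0)

    # Scatter: on a GPU, each element i would be copied by its own thread.
    for i, box in enumerate(boxes):
        if left_flag[i]:
            left[left_scan[i] - 1] = box
        if right_flag[i]:
            right[right_scan[i] - 1] = box
    return left, right
```

Because every element's destination is known from the scan alone, the copies do not conflict with one another, and the relative order of the boxes is preserved in both children.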

Claims (5)

1. A ray tracing acceleration system for a GPU-based KD tree, comprising, connected in sequence in signal-flow order, a data preprocessing module, a KD tree generation module, a virtual aperture plane generation module, a ray tracing module, an integration module and an interface display module, characterized in that a video memory pre-allocation module is provided before the data preprocessing module, and the KD tree generation module is a KD tree accelerated search generation module; the modules are interconnected to implement GPU-based ray tracing with KD tree accelerated search, and are described as follows:
Video memory pre-allocation module: the input of the module is the number of triangular facets in the triangular facet model; inside the module, the video memory occupied by the triangular facet bounding box model is calculated from the number of triangular facets, and the video memory space on the GPU is requested in advance, in a single request, from the CPU side; the video memory space is at least sufficient to generate the KD tree without conflict; the output of the module is the start address of the pre-allocated video memory;
Data preprocessing module: on the GPU, using the video memory requested by the video memory pre-allocation module, a bounding box is constructed in parallel for each triangular facet of the triangular facet model input to the module, and the current incident angle is initialized; the module outputs the bounding box model array and the current incident angle;
KD tree accelerated search generation module: the module generates in parallel on the GPU the KD tree that accelerates the ray tracing process, dynamically using the video memory requested by the video memory pre-allocation module by address access to accelerate generation of the KD tree; the bounding box model output by the data preprocessing module is the input of the module; the module has two synchronized outputs, one being the KD tree generated for the bounding box model array and the other being the reorganized bounding box model array; the two synchronized outputs serve directly as inputs to the ray tracing module;
Virtual aperture plane generation module: the inputs of the module are the bounding box model array output by the data preprocessing module and the current incident angle; the module calculates the boundary values of the bounding box model and generates ray tubes in parallel on the GPU; the output of the module is a number of ray tubes of identical size;
Ray tracing module: the inputs of the module are the ray tubes output by the virtual aperture plane generation module and the two synchronized outputs of the KD tree accelerated search generation module; the three inputs are synchronized; the rays of the ray tubes are traced in parallel on the GPU, ray tracing being carried out by traversing the KD tree; the output of the module is the reflection information of the ray tubes after traversing the KD tree model;
Gordon integration module: the Gordon integration module receives the reflection information output by the ray tracing module, integrates the ray tubes in parallel on the GPU using the Gordon integral formula, and sums the integration results of all ray tubes in parallel using a reduction operation; the output of the module is the algebraic sum of the integral values of all ray tubes at the current angle;
Interface display module: the integral value output by the Gordon integration module and related information are displayed on the interface for convenient inspection and debugging.
2. The ray tracing acceleration system for a GPU-based KD tree according to claim 1, characterized in that the size of the video memory space requested in advance by the video memory pre-allocation module is (D·P·I) + 2F + (P·I),
wherein D is the estimated maximum tree depth, P is the expansion coefficient, I is the size of the video memory occupied by the bounding box array in the root node, and F is the size of the video memory occupied by one auxiliary array; that is, every level of the KD tree is allocated a video memory space of size (P·I), and a video memory space of size (P·I) is additionally requested for the output array.
3. The ray tracing acceleration system for a GPU-based KD tree according to claim 1, characterized in that the KD tree accelerated search generation module generates the KD tree using the unified output method for bounding box arrays in KD tree leaf nodes; the submodules provided in the KD tree accelerated search generation module are described as follows:
Condition judgment submodule: the submodule first receives the data output by the data preprocessing module and decides, according to the number of elements, whether the current node data enters the node subdivision submodule or the subdivision termination submodule; if it enters the subdivision termination submodule, subdivision of the current node stops; if it enters the node subdivision submodule, it then passes in turn through the child node array generation submodule and the child node information update submodule, and finally returns to the condition judgment submodule, where the child nodes are judged again; this loop is repeated, building the KD tree structure level by level, until all nodes have entered the subdivision termination submodule;
Node splitting plane calculation submodule: the input of the submodule is the output of the condition judgment submodule; from the received data the submodule calculates the splitting plane position and splitting axis of the current node, splits one parent node into two child nodes, and outputs the calculation result together with the data it received to the child node array generation submodule;
Child node array generation submodule: the input of the submodule is the output of the node splitting plane calculation submodule; according to the input, the current node array is divided on the GPU into two sub-arrays, and the two generated sub-arrays are output in parallel, one to each of the two child node information update submodules;
Child node information update submodule: this consists of two parallel submodules of identical structure, each of whose input is one sub-array output by the child node array generation submodule; each child node of the current node updates its information according to the sub-array received, and the child node together with its sub-array is output as the current node to the condition judgment submodule to be judged again, re-entering the KD tree building loop;
Subdivision termination submodule: the input of the submodule is the data output by the condition judgment submodule; the submodule terminates subdivision of the current node data and copies the sub-array contained in the current node into the output array, which is output in a single pass after all nodes have stopped subdividing.
4. The ray tracing acceleration system for a GPU-based KD tree according to claim 3, characterized in that the child node array generation submodule executes on the GPU; the units provided in the child node array generation submodule are described as follows:
Scanning unit: the input of the unit is the output of the node subdivision submodule; using an inclusive scan operation on the GPU, it calculates in parallel the index of each element of the two child node arrays within the current node array;
Array indexing unit: the input of the unit is the indices output by the scanning unit; according to the indices, the data at the address where the current node is stored are copied to the addresses where the two child nodes are stored, so that the array contained at the current node's storage address is divided between the storage addresses of the two child nodes; the two child nodes are then output separately to the condition judgment submodule to be judged again.
5. A unified output method for bounding box arrays in KD tree leaf nodes, characterized in that a KD tree accelerated search generation module generates the KD tree using the unified output method for bounding box arrays in KD tree leaf nodes, comprising the following steps:
1) Input and initialize data: input the triangular facet bounding box model and two auxiliary arrays; each auxiliary array has the same length as the bounding box array of each level of the KD tree and is used to assist in dividing the data; initialize the root node as the current node;
2) Judge whether to terminate subdivision: judge whether the current node meets the termination condition, namely whether the number of bounding boxes contained at the current node's address is less than the termination threshold; if so, go to step 7); if not, go to step 3);
3) Calculate the best splitting plane position: use the middle-split method or the SAH method to calculate the best splitting plane and splitting axis;
4) Assign the auxiliary arrays: the two auxiliary arrays assist the division; according to the best splitting plane position and splitting axis, for a bounding box on the left of the best splitting plane the corresponding position in the first auxiliary array is set to 1; for a bounding box on the right of the best splitting plane the corresponding position in the second auxiliary array is set to 1; for a bounding box cut by the best splitting plane the corresponding positions in both auxiliary arrays are set to 1;
5) Scan the auxiliary arrays: calculate the starting storage addresses to which the bounding box array in the current node is copied in the left and right child nodes, and obtain the position index of each datum at the corresponding addresses of the left and right child nodes;
6) Subdivide the current node into left and right child nodes: according to the position indices obtained in step 5), copy the bounding box array in the current node to the corresponding positions at the storage addresses of the left and right child nodes, update the information of the left and right child nodes, and return to step 2) with the left and right child nodes as current nodes, re-entering the KD tree building loop;
7) Form the output array: mark the current node as a leaf node, copy the bounding box array at the leaf node's storage address into the output array, and record in the leaf node the address of the bounding box array within the output array, for use by the ray tracing module;
8) End the subdivision process: after all nodes have terminated subdivision, a fully built, high-quality KD tree is obtained, together with the reorganized bounding box model array.
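As a worked illustration of the pre-allocation size stated in claim 2, the sketch below evaluates D levels of size P·I, two auxiliary arrays of size F, and one output array of size P·I. The function name, the rounding of P·I up to a whole number of bytes, and the example figures are assumptions made for illustration only.

```python
import math


def prealloc_bytes(max_depth: int, expansion: float,
                   root_bytes: int, aux_bytes: int) -> int:
    """Evaluate D*(P*I) + 2*F + (P*I) in bytes."""
    per_level = math.ceil(expansion * root_bytes)   # P*I, rounded up (assumption)
    return max_depth * per_level + 2 * aux_bytes + per_level


# Hypothetical figures: D = 20, P = 2.0, I = 1000 bytes, F = 500 bytes.
example_total = prealloc_bytes(20, 2.0, 1000, 500)
```

With these hypothetical figures the total is 20 × 2000 + 2 × 500 + 2000 = 43000 bytes, requested once before tree construction begins.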
CN201910025229.3A 2019-01-11 2019-01-11 Ray tracing acceleration system of KD tree on GPU and KD tree output method Active CN109543358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910025229.3A CN109543358B (en) 2019-01-11 2019-01-11 Ray tracing acceleration system of KD tree on GPU and KD tree output method

Publications (2)

Publication Number Publication Date
CN109543358A true CN109543358A (en) 2019-03-29
CN109543358B CN109543358B (en) 2022-12-06

Family

ID=65834872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910025229.3A Active CN109543358B (en) 2019-01-11 2019-01-11 Ray tracing acceleration system of KD tree on GPU and KD tree output method

Country Status (1)

Country Link
CN (1) CN109543358B (en)

Also Published As

Publication number Publication date
CN109543358B (en) 2022-12-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant