CN109543358B

CN109543358B - Ray tracing acceleration system of KD tree on GPU and KD tree output method

Info

Publication number: CN109543358B
Application number: CN201910025229.3A
Authority: CN
Inventors: 吴宪云; 王康; 李云松; 赵罡; 苏丽雪; 孙力; 司鹏辉; 郑为; 申珅; 雷杰; 王柯俨; 吕维; 孙乃葳
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-01-11
Filing date: 2019-01-11
Publication date: 2022-12-06
Anticipated expiration: 2039-01-11
Also published as: CN109543358A

Abstract

The invention discloses a ray tracing acceleration system of a KD (k-dimensional) tree on a GPU (graphics processing unit) and a KD tree output method, which address the problem of rapidly predicting the sound field intensity of complex underwater targets. In the signal connection direction, the system comprises the following modules in sequence: a video memory pre-application module, a data preprocessing module, a KD tree accelerated search generation module, a virtual aperture surface generation module, a ray tracing module, an integration module and an interface display module. The video memory pre-application module is newly added, and the node subdivision process is accelerated within the KD tree accelerated search generation module. The KD tree output method comprises the following steps: inputting data; judging whether to terminate subdivision; calculating the splitting surface; assigning and scanning the auxiliary arrays; dividing the current node; and outputting the arrays to complete tree building. The invention can calculate the sound field intensity of any triangular surface element model. Because the method simulates the propagation of sound rays, it does not need to judge occluded surface elements, and it offers high calculation speed, high precision and strong adaptability to different targets. The method is used for rapid prediction simulation of underwater target strength.

Description

Ray tracing acceleration system of KD tree on GPU and KD tree output method

Technical Field

The invention belongs to the technical field of acoustics and relates to simulation techniques for the rapid calculation of underwater sound field intensity, in particular to a ray tracing acceleration system of a KD (k-dimensional) tree on a GPU (graphics processing unit) and a KD tree output method, which are used for rapid prediction of underwater targets.

Background

Accurately calculating the sound field characteristics of underwater targets in various attitudes is of great significance for target detection by underwater weapons such as torpedoes and for national defense construction. With the development of torpedo and sonar technologies, the sound field intensity characteristics of a target must be calculated with ever higher accuracy. At present, the main methods for analyzing the echo characteristics of underwater targets are the plate element method based on the Kirchhoff approximation and the improved plate element method that introduces the Gordon integral. In both methods, complex pairwise occlusion judgment must be carried out wherever surface elements of the target occlude one another, so the calculation amount is large and the efficiency is low.

Fan Jun, Shang Weilin, et al., in "Plate element method for sonar target echo characteristic prediction", published in 2012 in the non-patent literature on fluid and structural acoustics, proposed introducing the plate element method, originally used for calculating radar scattering cross sections, into sonar target strength calculation. When the Kirchhoff approximation is applied to the scattered sound field of a target in water, a set of planar plate elements approximates the complex curved surface of the target, and the sum of the scattered sound fields of all plate elements approximates the total scattered sound field. Because the integral over a single plate element is converted into an algebraic sum, the plate element method is many times faster than the ordinary surface element integration method. For sonar conditions, the method was extended to the prediction of near-field and non-rigid-surface echo characteristics. However, for certain plate element coordinates the denominator of the integral becomes zero, so the target strength calculated for that element may take a singular value, which makes the calculation unstable.

Sun Naiwei, Li Jianchen, et al., in "Submarine target strength prediction simulation based on an improved plate element method", published in Torpedo Technology in 2016, proposed a method that avoids singular values when calculating target strength. To address the unstable results caused by a possibly zero integral denominator when the plate element method calculates submarine target strength, the Gordon integral algorithm is used for the prediction simulation, and the surface element occlusion judgment is simplified for complex targets. The results of this method are therefore free of singular values and more stable. However, because occluded surface elements must still be judged, the result is not accurate enough for targets with heavy occlusion.

In the prior art, the forecasting calculation precision of the underwater target is not high enough, and the calculation time cannot meet the real-time requirement.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides the ray tracing acceleration system of the KD tree on the GPU with high calculation precision and high calculation speed and the KD tree output method.

The invention first relates to a GPU-based KD tree ray tracing acceleration system which comprises, connected in the signal connection direction, a data preprocessing module, an accelerated search generation module, a virtual aperture surface generation module, a ray tracing module, an integration module and an interface display module, and which is characterized in that a video memory pre-application module is arranged at the start of the system, the accelerated search generation module is a KD tree accelerated search generation module, and all modules are interconnected to form a GPU-based KD tree accelerated search that realizes ray tracing. The modules are as follows:

the video memory pre-application module: the input of the module is the number of the triangular surface elements of the triangular surface element model, the size of the video memory occupied by the triangular surface element bounding box model array is calculated in the module according to the number of the triangular surface elements, the CPU end applies for the video memory space on the GPU in advance at one time, the video memory space is at least enough to generate a KD tree without conflict, and the output of the module is the initial address of the pre-applied video memory;

a data preprocessing module: on a GPU, using a video memory applied by a video memory pre-application module, parallelly constructing bounding boxes for each triangular surface element of a triangular surface element model input into the module, initializing the current incident angle, and outputting a bounding box model array;

a KD tree accelerated search generation module: this module builds the KD tree in parallel on the GPU, the KD tree being the structure that accelerates the ray tracing process; it uses the video memory obtained by the video memory pre-application module dynamically, by address access, which accelerates KD tree generation; its input is the bounding box model array output by the data preprocessing module; it has two synchronous outputs, one being the KD tree corresponding to the bounding box model array and the other being the reorganized bounding box model array, and both are used directly as inputs of the ray tracing module;

a virtual aperture surface generation module: the input of the module is a bounding box model array and a current incident angle output by the data preprocessing module, the boundary value of the bounding box model is calculated in the module, and the sound ray tube bundles are generated on the GPU in parallel, and the output of the module is a plurality of sound ray tube bundles with the same size;

a ray tracing module: the inputs of the module are the sound ray tube bundles output by the virtual aperture surface generation module and the two synchronous outputs of the accelerated search generation module, and these three inputs are synchronous; ray tracing of the sound ray tube bundles is accelerated in parallel on the GPU and is completed by traversing the KD tree; the output of the module is the reflection information of the sound ray tube bundles after traversing the KD tree;

a Gordon integration module: the Gordon integration module receives the reflection information output by the ray tracing module, integrates the sound ray tube bundles in parallel on the GPU using the Gordon integration formula, sums the integration results of all sound ray tube bundles in parallel using a reduction operation, and outputs the algebraic sum of the integral values of all sound ray tube bundles at the current angle;

an interface display module: the integral value and the related information output by the Gordon integral module are displayed on an interface, so that the information can be conveniently checked and debugged.

The invention also discloses a method for uniformly outputting the bounding box arrays in the KD leaf node, which is characterized by comprising the following steps of:

1) Inputting data and initializing: inputting a triangular surface element bounding box model array and two auxiliary arrays, wherein the length of the auxiliary arrays is equal to that of the bounding box array of each layer of the KD tree, and the auxiliary arrays are used for assisting in dividing the bounding box arrays in the current node and initializing a root node as the current node;

2) Subdivision termination judgment: judging whether the current node meets the subdivision termination condition, that is, whether the number of bounding boxes contained at the address of the current node is less than the subdivision termination threshold; if so, executing step 7), and if not, executing step 3);

3) Calculating the optimal splitting surface position: calculating the optimal splitting surface position and the splitting axis by using a midsplit method or an SAH method;

4) Auxiliary array assignment: using the two auxiliary arrays to assist the division according to the optimal splitting surface position and the splitting axis: for a bounding box on the left side of the optimal splitting surface, marking the corresponding position as 1 in the first auxiliary array; for a bounding box on the right side of the optimal splitting surface, marking the corresponding position as 1 in the second auxiliary array; for a bounding box cut by the optimal splitting surface, marking the corresponding position as 1 in both auxiliary arrays;

5) Scanning the auxiliary array: calculating initial storage addresses when an array in the current node is copied to the left child node and the right child node, and obtaining position indexes of each element in the array in the corresponding addresses of the left child node and the right child node;

6) Dividing the current nodes: dividing the current node into a left sub-node and a right sub-node, copying the bounding box array in the current node into corresponding positions of storage addresses of the left node and the right node according to the position index, updating the relevant information of the left sub-node and the right sub-node, respectively using the left sub-node and the right sub-node as the current node to enter the step 2), and entering the KD tree construction cycle judgment again;

7) Forming an output array: marking the current node as a leaf node, adding an array in the address stored by the leaf node to the tail of the output array, recording the address of the array in the leaf node in the output array in the leaf node, and reserving the leaf node for a ray tracing module;

8) Terminating the subdivision process: when all nodes have terminated subdivision, obtaining the constructed high-quality KD tree and the reorganized bounding box model array.
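The control flow of steps 1) to 8) can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the patented GPU implementation: the function name `build_kd_tree`, the dictionary node layout, the `max_depth` safeguard and the choice of "midsplit" as halving the longest axis of the node region are all hypothetical, and steps 4) to 6) are collapsed into plain list filtering instead of the auxiliary arrays and scan.

```python
def build_kd_tree(boxes, leaf_threshold=2, max_depth=16):
    """Serial sketch of steps 1)-8). A box is (lo, hi), each a 3-tuple.

    Returns (nodes, output). Leaf nodes keep only an offset and a count
    into `output`, the reorganized bounding box array. `boxes` must be
    non-empty.
    """
    lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
    hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
    # Step 1: the root node becomes the first current node.
    nodes = [{"lo": lo, "hi": hi, "depth": 0, "boxes": list(boxes)}]
    output = []
    pending = [0]  # indices of nodes waiting for the termination test
    while pending:
        node = nodes[pending.pop()]
        items = node.pop("boxes")
        # Step 2: stop when the node holds fewer boxes than the threshold.
        # The depth cap is an added safeguard (assumption, not in the text).
        if len(items) < leaf_threshold or node["depth"] >= max_depth:
            # Step 7: append the leaf's array to the output array and
            # record only where it landed.
            node.update(leaf=True, offset=len(output), count=len(items))
            output.extend(items)
            continue
        lo, hi = node["lo"], node["hi"]
        # Step 3: midsplit, assumed here to halve the longest axis.
        axis = max(range(3), key=lambda k: hi[k] - lo[k])
        split = 0.5 * (lo[axis] + hi[axis])
        # Steps 4-6 collapsed: a box cut by the plane goes to both sides.
        left = [b for b in items if b[0][axis] <= split]
        right = [b for b in items if b[1][axis] > split]
        left_hi = hi[:axis] + (split,) + hi[axis + 1:]
        right_lo = lo[:axis] + (split,) + lo[axis + 1:]
        children = (("left", lo, left_hi, left), ("right", right_lo, hi, right))
        for key, c_lo, c_hi, c_items in children:
            nodes.append({"lo": c_lo, "hi": c_hi,
                          "depth": node["depth"] + 1, "boxes": c_items})
            node[key] = len(nodes) - 1
            pending.append(len(nodes) - 1)
    # Step 8: every node has terminated; return tree and output array.
    return nodes, output
```

A ray tracer built on this sketch would read a leaf's boxes as `output[offset:offset + count]`, which is the contiguous layout the method relies on.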

The method solves the problems that in the prior art, for a concave target with more shelters on the surface, the sheltered surface element needs to calculate a complex sheltering relation, so that the calculated amount is greatly increased, the calculation efficiency is low, and a technical means is provided for quickly and accurately calculating the sound field intensity of the underwater complex target.

Compared with the prior art, the invention has the technical advantages that:

high calculation precision: the traditional plate element and improved plate element methods discard occluded surface elements, which reduces precision; for concave targets with heavy surface occlusion, the system of the invention uses ray tracing to capture the sound waves reflected by occluded surface elements and therefore achieves higher calculation precision;

fast KD tree construction: in the prior art only the SAH calculation part of KD tree construction is accelerated on the GPU, so GPU acceleration is incomplete; the invention also uses the GPU to accelerate the division of a parent node into child nodes, which reduces KD tree construction time on the GPU; in addition, video memory sufficient to construct the KD tree without conflict is applied for once before construction, and the time-consuming dynamic application during construction is replaced by address access, which further speeds up construction;

small video memory footprint of the KD tree structure: a KD tree node does not store a bounding box array internally; it stores only the start address of its bounding box array in video memory, and the array is accessed through that address when needed, so the KD tree constructed by the invention occupies less video memory.

Drawings

FIG. 1 is a schematic diagram of a ray tracing acceleration system based on a GPU KD tree according to the present invention;

FIG. 2 is a flowchart of a method for uniformly outputting arrays of bounding boxes in KD leaf nodes;

FIG. 3 is a schematic diagram of the internal structure of the accelerated search generation module of the KD tree;

FIG. 4 is a schematic diagram of the bounding box array in the current node being partitioned into left and right child nodes and terminating partitioning;

FIG. 5 is a schematic diagram of a triangular bin model used in example 7;

FIG. 6 is a graph of sound field intensity simulated using the present invention for the model shown in FIG. 5;

FIG. 7 is a graph of the number of bundles of acoustic rays used in the calculation and generated by the ray tracing module simulated on the model of FIG. 5 using the present system.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings.

Example 1:

Ray tracing is widely used in fields such as image rendering and electromagnetic field intensity calculation, and it can also be applied to the rapid and accurate prediction of the sound field intensity of complex underwater targets. Traditional underwater target sound field prediction simulation systems use the plate element method and the improved plate element method. For targets with heavy surface occlusion, these traditional techniques must perform occlusion judgment on a large number of surface elements and discard the occluded ones, which causes a certain loss of calculation precision. With the continuing development of GPU technology, it has become possible to run faster algorithms on the GPU. Against this background, the invention provides a ray tracing acceleration system of a KD (k-dimensional) tree based on a GPU (graphics processing unit). Referring to FIG. 1, the system comprises, connected in the signal connection direction, a data preprocessing module, an accelerated search generation module, a virtual aperture surface generation module, a ray tracing module, an integration module and an interface display module. A video memory pre-application module is arranged before the data preprocessing module, the accelerated search generation module is a KD tree accelerated search generation module, and all modules of the system are interconnected to form a GPU-based KD tree accelerated search that realizes ray tracing. The modules are as follows:

the video memory pre-application module: the input of the module is the number of the triangular surface elements of the triangular surface element model, the size of the video memory occupied by the triangular surface element bounding box model array is calculated in the module according to the number of the triangular surface elements, the CPU end applies for the video memory space on the GPU in advance at one time, the video memory space is at least enough to generate a KD tree without conflict, and the output of the module is the initial address of the pre-applied video memory.

A data preprocessing module: and on the GPU, using the video memory applied by the video memory pre-application module, parallelly constructing bounding boxes for each triangular surface element of the triangular surface element model input into the module, initializing the current incident angle, and outputting a bounding box model array.

KD tree accelerated search generation module: this module builds the KD tree in parallel on the GPU, the KD tree being the structure that accelerates the ray tracing process. It uses the video memory obtained by the video memory pre-application module dynamically, by address access, which accelerates KD tree generation. Its input is the bounding box model array output by the data preprocessing module. It has two synchronous outputs: one is the KD tree corresponding to the bounding box model array, and the other is the reorganized bounding box model array. Both are used directly as inputs of the ray tracing module.

A virtual aperture surface generation module: the input of the module is a bounding box model array and a current incident angle output by the data preprocessing module, the boundary value of the bounding box model is calculated in the module, the sound ray tube bundles are generated on the GPU in parallel, and the output of the module is a plurality of sound ray tube bundles with the same size.

Ray tracing module: the inputs of this module are the sound ray tube bundles output by the virtual aperture surface generation module and the two synchronous outputs of the accelerated search generation module, and these three inputs are synchronous. The module accelerates ray tracing of the sound ray tube bundles in parallel on the GPU. Ray tracing is completed by traversing the KD tree generated by the KD tree accelerated search generation module with the sound ray tube bundles generated by the virtual aperture surface generation module. The output of the module is the reflection information of the sound ray tube bundles after traversing the KD tree model.

Gordon integration module: the Gordon integration module receives the reflection information output by the ray tracing module, integrates the sound ray tube bundles in parallel on the GPU using the Gordon integration formula, and sums the integration results of all sound ray tube bundles in parallel using a reduction operation. Its output is the algebraic sum of the integral values of all sound ray tube bundles at the current angle.

An interface display module: the integration value and the related information output by the Gordon integration module are displayed on an interface, so that the information is convenient to view and debug.
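As an illustration of what the data preprocessing module computes, the following minimal sketch builds one axis-aligned bounding box per triangular surface element, plus the overall boundary values that the virtual aperture surface generation module needs. It is a serial stand-in: the function names and the `(lo, hi)` tuple layout are assumptions made for this example, and on a GPU each box would typically be computed by its own thread because no box depends on any other.

```python
def triangle_aabb(triangle):
    """Bounding box (lo, hi) of one triangle given as three (x, y, z) vertices."""
    xs, ys, zs = zip(*triangle)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def build_aabb_array(triangles):
    """Bounding box model array: one box per triangular surface element."""
    # Each iteration is independent, so this loop maps to one thread per element.
    return [triangle_aabb(t) for t in triangles]


def model_bounds(aabbs):
    """Boundary values of the whole bounding box model (non-empty input)."""
    lo = tuple(min(box[0][k] for box in aabbs) for k in range(3))
    hi = tuple(max(box[1][k] for box in aabbs) for k in range(3))
    return lo, hi
```

For example, the triangle with vertices (0, 0, 0), (1, 0, 0) and (0, 2, 3) is enclosed by the box from (0, 0, 0) to (1, 2, 3).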

The invention changes the incident angle, and repeatedly calls the virtual aperture surface generation module, the ray tracing module, the integration module and the interface display module to complete the integration calculation and display of the triangular surface element model at 0-180 degrees.

Because underwater target sound field intensity prediction places high demands on both speed and precision, the invention uses ray tracing to improve on the prediction precision of the traditional methods, uses a KD tree structure to accelerate the ray tracing search, and uses the GPU to accelerate both KD tree construction and ray tracing, which further increases ray tracing speed and thus improves the speed and precision of the prediction. During KD tree construction, the arrays in the leaf nodes are organized into a single output array, which facilitates coalesced access to the data during ray tracing and also reduces the video memory occupied by the KD tree.

Example 2:

The GPU-based KD tree ray tracing acceleration system is the same as in embodiment 1. The size of the video memory space applied for in advance by the video memory pre-application module is (D · P · I) + 2F + (P · I)

where D is the estimated maximum tree depth, P is the expansion coefficient, I is the size of the video memory occupied by the bounding box array in the root node, and F is the size of the video memory occupied by one auxiliary array. The module allocates video memory of size (P · I) to each layer of the KD tree and applies for a further (P · I) for the output array. When a parent node is divided, its bounding box model array is copied, according to the splitting surface and the splitting axis, into the video memory blocks belonging to its child nodes, so no video memory access conflict can occur during the parallel copy. In addition, the expansion coefficient ensures that triangular surface element bounding boxes cut by the splitting surface can safely be placed into both child nodes.
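The size formula above is simple arithmetic and can be checked with a short sketch. The function names are hypothetical, and `level_base_offset` assumes that the per-layer blocks are laid out one after another from the start of the pre-applied space, which the text does not state explicitly.

```python
def preallocated_size(max_depth, expansion, root_bytes, aux_bytes):
    """Total pre-applied video memory: (D * P * I) + 2F + (P * I)."""
    per_level = expansion * root_bytes  # P * I, reserved for each tree layer
    tree_levels = max_depth * per_level  # D layers
    auxiliary = 2 * aux_bytes            # two auxiliary arrays
    output_array = per_level             # one more P * I for the output array
    return tree_levels + auxiliary + output_array


def level_base_offset(level, expansion, root_bytes):
    """Assumed layout: layer k starts k * (P * I) bytes into the space."""
    return level * expansion * root_bytes
```

With illustrative values D = 20, P = 2, I = 2,400,000 bytes and F = 800,000 bytes, the total is 102,400,000 bytes. Because every layer's base address is known in advance under this layout, a node can be placed by offset arithmetic alone, which is the sense in which address access replaces dynamic application.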

In the prior art, video memory is applied for dynamically during KD tree construction, which is time-consuming. The video memory pre-application module applies once, before construction, for enough video memory to build the KD tree without conflict, and replaces the time-consuming dynamic application during tree construction with address access, which increases the construction speed of the KD tree.

Example 3:

The GPU-based KD tree ray tracing acceleration system is the same as in embodiments 1-2.

In the prior art, after the KD tree is constructed, the indexes of the data are stored in the leaf nodes of the KD tree. Because the SAH algorithm may stop subdivision early even when the number of bounding boxes in a node is still greater than the subdivision threshold, the memory required for the data indexes inside a node is uncertain. In the invention, a leaf node stores only the address range of its array within the output array, and the arrays of the leaf nodes are contiguous in the output array, so data accesses can be coalesced during ray tracing, which accelerates the ray tracing process.

The KD tree accelerated search generation module generates the KD tree in parallel on the GPU, using the method for uniformly outputting the bounding box arrays in KD leaf nodes. Referring to FIG. 3, the submodules of the KD tree accelerated search generation module are as follows:

Condition judgment submodule: this submodule first receives the array output by the data preprocessing module and decides, according to the number of elements, whether the current node's data enters the node subdivision submodule or the subdivision termination submodule. If it enters the subdivision termination submodule, subdivision of the current node stops. If it enters the node subdivision submodule, it then passes in sequence through the sub-node inner bounding box array generation submodule and the child node information updating submodule, and finally returns to the condition judgment submodule, where the child nodes are judged again. The submodule thus repeatedly receives the data fed back after node subdivision and repeats the judgment, constructing the KD tree layer by layer until all nodes have entered the subdivision termination submodule.

Node subdivision surface calculation submodule: the input of this submodule is the output of the condition judgment submodule. From the received data it calculates the splitting surface position and the splitting axis of the current node, by which the parent node is divided into two child nodes that each contain part of the parent node's information, and it outputs the calculation result together with the received data to the sub-node inner bounding box array generation submodule.

Large node subdivision unit: calculates the splitting surface position and the splitting axis of the current node from the input of the node subdivision submodule, using the midsplit method.

Small node subdivision unit: calculates the splitting surface position and the splitting axis of the current node from the input of the node subdivision submodule, using the SAH method.

Sub-node inner bounding box array generation submodule: the input of this submodule is the output of the node subdivision surface calculation submodule. According to this input, the bounding box array in the current node is divided on the GPU between the two child nodes of the current node, and the two resulting child nodes are output in parallel to the two child node information updating submodules respectively.

Child node information updating submodule: there are two of these submodules, parallel and identical in structure, each taking as input one sub-array output by the sub-node inner bounding box array generation submodule. The child node first inherits the clue value of the current node, then its related information is updated according to the received sub-array, and the child node, now treated as the current node, is output together with its sub-array to the condition judgment submodule for re-judgment, re-entering the KD tree construction loop.

A subdivision termination submodule: the input of the submodule is data output by the conditional judgment submodule, the submodule terminates subdivision of the bounding box array in the current node, outputs the subarray contained in the current node to an output array, and waits for one-time output after all nodes stop subdivision.

The input of the KD tree accelerated search generation module first enters its internal condition judgment submodule, which decides whether to pass that input to the node subdivision submodule or to the subdivision termination submodule. If the output of the condition judgment submodule enters the subdivision termination submodule, subdivision of the current node stops. If it enters the node subdivision submodule, the splitting calculation is performed there first, and its output becomes the input of the sub-node inner bounding box array generation submodule, which accelerates sub-array generation on the GPU and outputs two sub-arrays. Each sub-array is fed independently back to the condition judgment submodule, and node subdivision proceeds in parallel until every node has entered the subdivision termination submodule.
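The text names the SAH (surface area heuristic) method for the small node subdivision unit without giving its cost function. The sketch below uses the commonly cited formulation, cost = C_trav + C_isect · (SA_left · N_left + SA_right · N_right) / SA_node, evaluated at candidate planes taken from the bounding box faces. The function names, the unit cost constants and the choice of candidate planes are assumptions for illustration and are not taken from the patent.

```python
def surface_area(lo, hi):
    """Surface area of the axis-aligned box spanned by corners lo and hi."""
    dx, dy, dz = (hi[k] - lo[k] for k in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(lo, hi, axis, split, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """Commonly used SAH cost of splitting node (lo, hi) at `split` on `axis`."""
    left_hi = hi[:axis] + (split,) + hi[axis + 1:]
    right_lo = lo[:axis] + (split,) + lo[axis + 1:]
    weighted = (surface_area(lo, left_hi) * n_left
                + surface_area(right_lo, hi) * n_right)
    return c_trav + c_isect * weighted / surface_area(lo, hi)


def best_sah_split(boxes, lo, hi):
    """Try every box face strictly inside the node; return (cost, axis, split)."""
    best = None
    for axis in range(3):
        faces = {b[0][axis] for b in boxes} | {b[1][axis] for b in boxes}
        for split in sorted(faces):
            if not lo[axis] < split < hi[axis]:
                continue
            # A box cut by the plane is counted on both sides.
            n_left = sum(1 for b in boxes if b[0][axis] < split)
            n_right = sum(1 for b in boxes if b[1][axis] > split)
            cost = sah_cost(lo, hi, axis, split, n_left, n_right)
            if best is None or cost < best[0]:
                best = (cost, axis, split)
    return best
```

Because this search evaluates every candidate plane against every box, it is affordable only for nodes with few elements, which is consistent with reserving SAH for small nodes and using a cheaper split for large ones.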

Example 4:

The GPU-based KD tree ray tracing acceleration system is the same as in embodiments 1-3.

In the prior art, when a parent node of the KD tree is split into child nodes, the array inside the parent node must be sorted along the splitting axis and then copied into the left and right child nodes according to the optimal splitting surface position. Because sorting has high time complexity, it affects the overall construction time. To shorten KD tree construction, the invention does not sort the array inside the parent node; instead it uses the parallelism of the GPU to decide, for each element and according to the optimal splitting surface position, only whether the element is copied to the left child node, to the right child node, or to both. This greatly shortens data movement time.

The sub-node inner bounding box array generation sub-module is executed on a GPU, and the units in the sub-node inner bounding box array generation sub-module are as follows:

a scanning unit: the input of the unit is the output of the node subdivision submodule, the index of the bounding box array elements in the two sub nodes in the current node bounding box array is calculated in parallel by using scanning operation on a GPU, and the index is stored in the position corresponding to the auxiliary array applied by the video memory pre-application module and is output to the array index unit.

Array index unit: the input of this unit is the index output by the scanning unit. Indexing is performed in parallel on the GPU: according to the index, the array at the address stored by the current node is copied into the addresses stored by the two child nodes, which divides the current node's array between them; the two child nodes are then output to the condition judgment submodule for re-judgment.

In the prior art, only the SAH calculation part of KD tree construction is accelerated on the GPU; the copying of the array from a parent node to its child nodes is not. In the sub-node inner bounding box array generation submodule of the invention, the scanning unit calculates in advance the position indexes that the parent node's bounding boxes will occupy in the left and right child nodes, and the array index unit then copies the data into the left and right child nodes according to those indexes. The copy from parent node to child nodes is thus carried out in parallel on the GPU, which accelerates KD tree construction.
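The scanning unit and the array index unit together amount to a flag, scan and scatter pattern. The sketch below is a serial stand-in for it: `exclusive_scan` and `partition_by_scan` are hypothetical names, the flag rule (left if the box starts at or before the plane, right if it ends beyond it) is one possible reading of "left", "right" and "cut", and the final loop represents one GPU thread per element.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: result[i] is the sum of flags[0..i-1]."""
    return [0] + list(accumulate(flags))[:-1]


def partition_by_scan(boxes, axis, split):
    """Divide a parent's boxes between two children without sorting."""
    # Auxiliary array assignment: two 0/1 flag arrays, one per child.
    left_flags = [1 if b[0][axis] <= split else 0 for b in boxes]
    right_flags = [1 if b[1][axis] > split else 0 for b in boxes]
    # Scanning unit: each flagged element learns its slot in the child array.
    left_index = exclusive_scan(left_flags)
    right_index = exclusive_scan(right_flags)
    left = [None] * sum(left_flags)
    right = [None] * sum(right_flags)
    # Array index unit: every element writes to its own slot, so the
    # writes never collide and could all happen at the same time.
    for i, box in enumerate(boxes):
        if left_flags[i]:
            left[left_index[i]] = box
        if right_flags[i]:
            right[right_index[i]] = box
    return left, right
```

The scan gives every element a unique destination before any data moves, which is why no sort and no synchronization between writers is needed. A box cut by the splitting surface is flagged in both arrays and therefore appears in both children.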

Example 5:

The invention also provides a method for uniformly outputting the bounding box arrays in KD leaf nodes. The method is implemented in the KD tree accelerated search generation module of the GPU-based KD tree ray tracing acceleration system, and the system is the same as in embodiments 1-4.

Referring to fig. 2, the method comprises the following steps:

1) Inputting data and initializing: input the triangular surface element bounding box model array and two auxiliary arrays, where the length of each auxiliary array equals the length of the bounding box array on each layer of the KD tree and the auxiliary arrays assist in dividing the bounding box array in the current node; initialize the root node as the current node.

2) Subdivision termination judgment: judge whether the current node meets the subdivision termination condition, that is, whether the number of bounding boxes contained at the address of the current node is less than the subdivision termination threshold; if so, execute step 7); if not, execute step 3).

3) Calculating the optimal splitting plane position: calculate the optimal splitting plane and the splitting axis by using a median method or the surface area heuristic (SAH) algorithm.

4) Auxiliary array assignment: two auxiliary arrays assist the division. According to the optimal splitting plane position and the splitting axis, a bounding box located on the left side of the optimal splitting plane has its corresponding position marked as 1 in the first auxiliary array; a bounding box on the right side has its corresponding position marked as 1 in the second auxiliary array; a bounding box cut by the optimal splitting plane has its corresponding position marked as 1 in both auxiliary arrays.

5) Scanning the auxiliary arrays: calculate the starting storage addresses used when the array in the current node is copied to the left and right child nodes, and obtain the position index of each array element within the corresponding addresses of the left and right child nodes.

6) Dividing the current node: divide the current node into a left child node and a right child node, copy the bounding box array in the current node to the corresponding positions of the storage addresses of the left and right child nodes according to the position indexes obtained in step 5), update the relevant information of the left and right child nodes, take the left and right child nodes each as the current node, and return to step 2) to enter the KD tree construction loop again.

7) Forming the output array: mark the current node as a leaf node, append the array at the address stored by the leaf node to the tail of the output array, and record in the leaf node the address of that array within the output array, for use by the ray tracing module.
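Steps 1) to 7) can be sketched serially as follows. This is an illustrative Python sketch under stated assumptions, not the GPU implementation: the name `build_kd`, the dictionary node layout, the use of the median split for every node, the `max_depth` guard and the condensed filters standing in for steps 4) to 6) are all hypothetical. The point it shows is that each leaf stores only a `(start, end)` range into one contiguous output array.

```python
def build_kd(boxes, stop_num=2, max_depth=8):
    """Return (nodes, output); each leaf stores a [start, end) range into output."""
    nodes, output = [], []
    work = [(boxes, 0, -1, None)]            # (box array, depth, parent, side)
    while work:
        arr, depth, parent, side = work.pop()
        idx = len(nodes)
        nodes.append({"leaf": False})
        if parent >= 0:
            nodes[parent][side] = idx        # link the child by node index
        # Step 2: termination test (max_depth is an assumed safety guard).
        if len(arr) <= stop_num or depth >= max_depth:
            start = len(output)
            output.extend(arr)               # step 7: append to the output tail
            nodes[idx] = {"leaf": True, "start": start, "end": len(output)}
            continue
        # Step 3: median split on the longest axis of the node bounds.
        lo = [min(b[0][a] for b in arr) for a in range(3)]
        hi = [max(b[1][a] for b in arr) for a in range(3)]
        axis = max(range(3), key=lambda a: hi[a] - lo[a])
        pos = 0.5 * (lo[axis] + hi[axis])
        # Steps 4-6 condensed: flag, scan and scatter written as order-keeping filters.
        left = [b for b in arr if b[0][axis] < pos or b[1][axis] <= pos]
        right = [b for b in arr if b[1][axis] > pos]
        nodes[idx].update(axis=axis, pos=pos)
        work.append((right, depth + 1, idx, "right"))
        work.append((left, depth + 1, idx, "left"))   # left is processed first
    return nodes, output

boxes = [((2.0 * i, 0.0, 0.0), (2.0 * i + 1.0, 1.0, 1.0)) for i in range(4)]
nodes, output = build_kd(boxes, stop_num=1)
leaves = [n for n in nodes if n["leaf"]]
```

A ray that reaches a leaf can then read `output[leaf["start"]:leaf["end"]]` as one contiguous slice.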

Referring to fig. 4, in the present invention the scanning unit and the array index unit are used on the GPU to accelerate the copying of the array in the current node into the left and right child nodes; when a child node cannot be split again, the KD leaf node inner bounding box unified output method outputs the data in the nodes in a unified way. Because the address range of the data in the output array is stored only in the leaf nodes, and the arrays of the leaf nodes are contiguous in the output array, coalesced access to the data can be achieved during ray tracing, which accelerates the ray tracing process.

In the process of building the tree, the invention no longer stores the indexes of the data in each node. Instead, the arrays are stored uniformly in the video memory applied for by the video memory pre-application module, and each node stores only the address range, in that video memory, of the array it contains. When a node is a leaf node, the array in the leaf node is copied into the output array, and the address range of that array in the output array is recorded in the leaf node. This reduces the video memory occupied by the KD tree on the GPU; it allows the subdivision process to be stopped even when the number of elements in a node's bounding box array is larger than the subdivision termination threshold, without data overflow; and it lets the ray intersection process of the subsequent ray tracing module use coalesced access to the data contained in a leaf node, thereby accelerating ray tracing.

The invention outputs a higher-quality KD tree in a shorter time. The KD tree structure is stored linearly in video memory, and the root node of the KD tree and the reorganized output array serve as the two inputs of the ray tracing module, which uses them as a lookup table.

A more detailed example is given below to further illustrate the invention:

The GPU-based KD tree ray tracing acceleration system and the method for uniformly outputting bounding boxes in KD leaf nodes are the same as in embodiments 1 to 5; see fig. 1:

A data preprocessing module: on the GPU, using the video memory applied for by the video memory pre-application module, bounding boxes are constructed in parallel for the triangular surface elements of the model input to the module. One thread is allocated to each triangular surface element and computes its axis-aligned bounding box (AABB), hereinafter referred to as a bounding box. The module initializes the current incident angle and outputs the bounding box model array and the current incident angle.

A KD tree accelerated search generation module: this module runs in parallel on the GPU. Once the KD tree is constructed, traversing it greatly accelerates the ray tracing process. During generation, the video memory applied for by the video memory pre-application module is used dynamically through address access, which accelerates KD tree generation. The input of the module is the bounding box model array output by the data preprocessing module. The module has two synchronous outputs: the KD tree corresponding to the bounding box model array, and the reorganized bounding box model array. Both are used directly as inputs of the ray tracing module.

A ray tracing module: the inputs of the module are the sound ray tube bundles output by the virtual aperture surface generation module and the two synchronous outputs of the KD tree accelerated search generation module; the three inputs are synchronous. Ray tracing of the sound ray tube bundles is accelerated in parallel on the GPU and is completed by traversing the KD tree. The output of the module is the reflection information of the sound ray tube bundles after traversing the KD tree.

Gordon integration module: this module receives the reflection information output by the ray tracing module, integrates the sound ray tube bundles in parallel on the GPU using the Gordon integration formula, and sums the integration results of all sound ray tube bundles in parallel using a reduction operation. Its output is the algebraic sum of the integral values of all sound ray tube bundles at the current angle.

In the video memory pre-application module, the triangular surface element model in this example is a model of a submarine, as shown in fig. 5. The size of the submarine model is 62 m × 7.5 m × 11 m, and the model is composed of 147427 nodes and 294850 triangular surface elements. The triangular surface element size is $2\lambda_T$ (in this example, $\lambda_T = 5$ cm). The size of the pre-applied video memory space is $(D \cdot P \cdot I) + 2F + (P \cdot I)$,

where D is the estimated maximum tree depth (D is taken as 20 in this example), P is the expansion coefficient (taken as 10 in this example), I is the size of the video memory occupied by the bounding box array in the root node, and F is the size of the video memory occupied by one auxiliary array. The module allocates a video memory space of size $(P \cdot I)$ for each layer of the KD tree and applies for a video memory space of size $(P \cdot I)$ for the output array. When a parent node is divided, its bounding box model array is copied, according to the splitting plane and splitting axis, to the video memory blocks belonging to the child nodes, so that no video memory access conflicts occur during parallel copying. In addition, the expansion coefficient ensures that triangular surface element bounding boxes cut by the splitting plane can be safely placed into both child nodes.
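As a rough worked example of the size expression $(D \cdot P \cdot I) + 2F + (P \cdot I)$, the Python sketch below plugs in the submarine model figures. The byte sizes are assumptions that the text does not state: one bounding box is taken as six 32-bit floats and one auxiliary array element as one 32-bit integer; the function name `preallocated_bytes` is hypothetical.

```python
BOX_BYTES = 6 * 4     # assumed: one AABB stored as 6 float32 values
FLAG_BYTES = 4        # assumed: one auxiliary array element is a 32-bit integer

def preallocated_bytes(num_faces, depth=20, expansion=10):
    """Evaluate (D*P*I) + 2F + (P*I) in bytes under the assumptions above."""
    root_array = num_faces * BOX_BYTES               # I
    aux_array = expansion * num_faces * FLAG_BYTES   # F: P times the face count
    per_layer = expansion * root_array               # P*I, one block per tree layer
    return depth * per_layer + 2 * aux_array + per_layer

total = preallocated_bytes(294850)
print(total, "bytes, about", round(total / 2**30, 2), "GiB")
```

Under these assumed element sizes the per-layer blocks dominate the total, so D and P are the parameters to tune when video memory is tight.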

In the data preprocessing module, one thread is opened for each triangular surface element of the submarine model. The thread computes the coordinate extremes $(X_{max}, Y_{max}, Z_{max})$ and $(X_{min}, Y_{min}, Z_{min})$ of the triangular surface element in three-dimensional space, where $X_{max}, Y_{max}, Z_{max}$ are the maximum values of the three vertices of the triangular surface element on the X, Y and Z axes, and $X_{min}, Y_{min}, Z_{min}$ are the corresponding minimum values. From these two coordinate extremes, the size and position of the compact bounding box of each triangular surface element in space can be determined. The current incident angle is initialized to 0 degrees.
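The per-element bounding box computation can be sketched as follows. This is a serial Python illustration; the names `triangle_aabb` and `build_aabbs` and the tuple layout are assumptions, and on the GPU each call would correspond to one thread.

```python
def triangle_aabb(tri):
    """Return ((X_min, Y_min, Z_min), (X_max, Y_max, Z_max)) of one triangle."""
    xs, ys, zs = zip(*tri)                    # gather the three vertices per axis
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def build_aabbs(triangles):
    """One box per triangle, kept in the order of the input triangle array."""
    return [triangle_aabb(t) for t in triangles]

triangles = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)),
             ((1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (2.0, -1.0, 4.0))]
aabbs = build_aabbs(triangles)
```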

The KD tree accelerated search generation module is a KD tree generated by utilizing a KD leaf node inner bounding box array unified output method, and sub-modules arranged in the KD tree accelerated search generation module are as follows:

The condition judging submodule first receives the data output by the data preprocessing module and, according to the number Box_Num of bounding boxes in the current node, decides whether the current node data enters the node subdivision submodule or the subdivision stopping submodule. When Box_Num is less than or equal to the subdivision stopping threshold Stop_Num (Stop_Num is 32), the node enters the subdivision stopping submodule and subdivision of the current node stops. When Box_Num is greater than Stop_Num, the node enters the node subdivision submodule, then passes in sequence through the sub-node inner bounding box array generation submodule and the sub-node information updating submodule, and finally returns to the condition judging submodule so that the child nodes are judged again. The condition judging submodule cyclically receives the data fed back by the node subdivision submodule, judges repeatedly, and constructs the KD tree structure layer by layer until all nodes have entered the subdivision stopping submodule.
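The layer-by-layer judgment loop can be sketched as below. This Python sketch tracks only the bounding box count of each node; the function name `build_layers` and the idealized `split` rule, which halves a node and ignores boxes duplicated by the splitting plane, are assumptions for illustration.

```python
def build_layers(root_count, stop_num=32,
                 split=lambda n: (n - n // 2, n // 2)):
    """Route nodes layer by layer until every node has stopped subdividing."""
    layer, leaves, layers_done = [root_count], [], 0
    while layer:
        next_layer = []
        for box_num in layer:                 # nodes of one layer are independent
            if box_num <= stop_num:
                leaves.append(box_num)        # subdivision stopping submodule
            else:
                next_layer.extend(split(box_num))   # node subdivision submodule
        layer = next_layer
        layers_done += 1
    return leaves, layers_done

leaves, layers_done = build_layers(294850)
```

With this idealized halving the submarine model's 294850 boxes stop after 15 layers, which is within the estimated maximum depth of 20 used in this example.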

The input of the node subdivision surface calculation submodule is the output of the condition judgment submodule, the submodule calculates the subdivision surface position and the subdivision axis of the current node according to the received data, divides a father node into two child nodes, the child nodes contain partial information of the father node, and outputs the calculation result and the data received by the submodule to the bounding box array generation submodule in the child nodes.

The node subdivision surface calculation submodule is provided with two units, and the unit used for node subdivision is determined according to the surface element number N input into the submodule.

When the number of surface elements in the node is larger than the threshold Big_Node (taken as 256), the large node splitting unit splits the current node. This unit determines the splitting plane position and the splitting axis by the median split method: the splitting axis is the coordinate axis parallel to the longest edge of the bounding box of the node, and the position coordinate of the splitting plane is the midpoint of that longest edge. Because this process requires the bounds of the bounding box of the node, the bounds are computed on the GPU with a reduction operation to accelerate the process.
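A minimal Python sketch of the large node split follows. The names `merge_bounds` and `median_split` are hypothetical. The bounds are obtained with `functools.reduce` over an associative merge operator, which is the property a parallel GPU reduction relies on; the serial fold here is only a stand-in for it.

```python
from functools import reduce

def merge_bounds(a, b):
    """Associative merge of two (mins, maxs) boxes."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return tuple(map(min, a_lo, b_lo)), tuple(map(max, a_hi, b_hi))

def median_split(boxes):
    """Return (axis, position): midpoint of the longest edge of the node bounds."""
    lo, hi = reduce(merge_bounds, boxes)
    extent = [h - l for l, h in zip(lo, hi)]
    axis = extent.index(max(extent))          # first longest axis wins on ties
    return axis, 0.5 * (lo[axis] + hi[axis])

boxes = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
         ((4.0, 0.0, 0.0), (6.0, 2.0, 1.0))]
axis, pos = median_split(boxes)
```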

When the number of surface elements in the node is less than or equal to the threshold Big_Node, the small node splitting unit splits the current node, and the unit uses the SAH algorithm to determine the splitting plane position and the splitting axis. In the SAH algorithm, the SAH cost function is used to calculate the cost value $SAH_{cost}$ of each candidate splitting plane, and the candidate splitting plane with the smallest $SAH_{cost}$ is taken as the optimal splitting plane.

The SAH cost function is:

$$SAH_{cost} = C_{traversal} + C_{hit}\,\frac{SA(V_L)\,N_L + SA(V_R)\,N_R}{SA(V)}$$

where $C_{traversal}$ represents the cost of the ray traversing to the current node, $SA(V)$ represents the surface area of the bounding box of the current node, $SA(V_L)$ represents the surface area of the bounding box of the left child node under the current candidate splitting plane, $N_L$ represents the number of surface elements assigned to the left child node under the current candidate splitting plane, $SA(V_R)$ represents the surface area of the bounding box of the right child node, $N_R$ represents the number of surface elements assigned to the right child node, and $C_{hit}$ represents the cost of intersecting the ray with a triangular surface element.

As can be seen from the formula, the SAH value calculation of each candidate splitting surface is performed independently, so the SAH value calculation can be accelerated by using the GPU, a block is allocated to each candidate splitting surface to calculate the SAH value of the candidate splitting surface, and the minimum SAH value and the corresponding candidate splitting surface are found by a reduction algorithm at the GPU end. And after the splitting surface is determined, the coordinate axis vertical to the splitting surface is the splitting axis.
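The independent evaluation of each candidate plane can be sketched as follows. This Python sketch assumes the standard surface area heuristic form $C_{traversal} + C_{hit}\,(SA(V_L) N_L + SA(V_R) N_R)/SA(V)$; the function names, the default cost constants of 1.0 and the choice of box faces strictly inside the node as candidate planes are assumptions that the text does not specify.

```python
def surface_area(lo, hi):
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(node_lo, node_hi, axis, pos, boxes, c_trav=1.0, c_hit=1.0):
    """SAH cost of one candidate plane; depends on no other candidate."""
    n_l = sum(1 for lo, hi in boxes if lo[axis] < pos)
    n_r = sum(1 for lo, hi in boxes if hi[axis] > pos)
    left_hi = list(node_hi); left_hi[axis] = pos      # left child volume V_L
    right_lo = list(node_lo); right_lo[axis] = pos    # right child volume V_R
    weighted = (surface_area(node_lo, left_hi) * n_l
                + surface_area(right_lo, node_hi) * n_r)
    return c_trav + c_hit * weighted / surface_area(node_lo, node_hi)

def best_split(node_lo, node_hi, boxes):
    """Evaluate every candidate, then take the minimum (a reduction on the GPU)."""
    candidates = {(a, b[k][a]) for b in boxes for k in (0, 1) for a in range(3)
                  if node_lo[a] < b[k][a] < node_hi[a]}
    costs = [(sah_cost(node_lo, node_hi, a, p, boxes), a, p)
             for a, p in sorted(candidates)]
    return min(costs, default=None)

node_lo, node_hi = (0.0, 0.0, 0.0), (4.0, 1.0, 1.0)
boxes = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
         ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0)),
         ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0))]
cost, axis, pos = best_split(node_lo, node_hi, boxes)
```

In this toy node the plane at x = 3 wins because it isolates the two boxes on the right in the smaller child volume.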

The input of the bounding box array generation submodule in the sub-node is the output of the node subdivision surface calculation submodule, the bounding box array in the current node is divided into two sub-arrays in the GPU according to the input, and the two generated sub-arrays are output in parallel to the two sub-node information updating submodules respectively; the unit division arranged in the sub-module for generating the bounding box array in the sub-node is as follows:

a scanning unit: the input of the unit is the output of the node subdivision submodule, the index of the bounding box array elements in the two sub nodes in the current node bounding box array is calculated in parallel by using scanning operation on a GPU, and the index is stored in the position corresponding to the auxiliary array applied by the video memory pre-application module.

And a child node information updating submodule: the sub-module is two parallel sub-modules with the same structure, the respective input is a sub-array output by a sub-node inner bounding box array generation sub-module, the sub-node of the current node inherits the relevant information of the current node firstly, then the relevant information of the sub-node is updated according to the received sub-array, and the sub-node is output to the condition judgment sub-module as the current node and the sub-array for re-judgment; and entering the KD tree construction loop process again.

A subdivision termination submodule: the input of the submodule is data output by the conditional judgment submodule, the submodule terminates subdivision of the bounding box array in the current node, outputs the subarray contained in the current node to an output array, and waits for all nodes to stop subdivision and then outputs the subarray at one time.

And the KD tree accelerated search generation module finally outputs a constructed KD tree and the reorganized bounding box array. The reorganized bounding box array is reorganized and output by using a KD leaf node inner bounding box array uniform output method, and the method comprises the following steps:

1) Inputting data and initializing: the method comprises the steps of inputting a triangular surface element bounding box model array and two auxiliary arrays, wherein the length of each auxiliary array is equal to that of a bounding box array on each layer of the KD tree, the auxiliary arrays are used for assisting in dividing the bounding box arrays in a current node, and a root node is initialized to serve as the current node.

3) Calculating the optimal splitting surface position: calculating the optimal splitting surface and the optimal splitting axis by using a midsplit method or an SAH method.

4) Auxiliary array assignment: two auxiliary arrays assist the division. The auxiliary arrays are first cleared. Then, according to the optimal splitting plane position and the splitting axis, a bounding box located on the left side of the optimal splitting plane has its corresponding position marked as 1 in the first auxiliary array; a bounding box on the right side has its corresponding position marked as 1 in the second auxiliary array; a bounding box cut by the optimal splitting plane has its corresponding position marked as 1 in both auxiliary arrays.

8) Stopping the subdivision process: when all nodes have terminated subdivision, a constructed high-quality KD tree and a reorganized bounding box model array are obtained.

In the process of building the tree, the indexes of the array elements are no longer stored in the nodes. Instead, the data are stored uniformly in the video memory that the video memory pre-application module has pre-applied for the KD tree nodes of that layer, and each node stores only the start and end addresses, in that video memory, of the data it contains. When a node is a leaf node, its data are copied from the video memory pre-applied for that layer into the output array, and the address of the data in the output array is stored in the leaf node. This reduces the video memory occupied by the KD tree on the GPU; the subdivision process can be stopped early even when the number of bounding boxes in the current node is larger than the subdivision termination threshold, without array overflow; and the subsequent ray tracing module can use coalesced access to the data contained in a leaf node, which accelerates the ray tracing process.

The process of generating the bounding boxes in the step 1) is completed in parallel on the GPU, a thread is opened up on the GPU for each triangular surface element to calculate the respective bounding box, and the bounding box array is arranged and organized according to the sequence of the original triangular surface element array.

In step 2), if the number of bounding boxes in the current node is less than or equal to Stop_Num, step 7) is entered and subdivision of the current node stops; if the number of bounding boxes in the current node is larger than Stop_Num, step 3) is entered and the current node, as a parent node, is divided into left and right child nodes.

If the number of surface elements in the current node is greater than the threshold N (in this example, N is 256), the current node is considered a large node, and the splitting plane position and splitting axis are determined by the median split method: the splitting axis is the coordinate axis parallel to the longest edge of the bounding box of the node, and the position coordinate of the splitting plane is the midpoint of that longest edge.

If the number of surface elements in the current node is less than or equal to the threshold N, the current node is considered a small node, and the SAH algorithm determines the splitting plane position and the splitting axis. In the SAH algorithm, the SAH cost function is used to calculate the cost value $SAH_{cost}$ of each candidate splitting plane, and the candidate splitting plane with the smallest $SAH_{cost}$ is taken as the splitting plane.

The function to calculate the SAH cost is:

$$SAH_{cost} = C_{traversal} + C_{hit}\,\frac{SA(V_L)\,N_L + SA(V_R)\,N_R}{SA(V)}$$

where $C_{traversal}$ represents the cost of the ray traversing to the current node, $SA(V)$ represents the surface area of the bounding box of the current node, $SA(V_L)$ represents the surface area of the bounding box of the left child node under the current candidate splitting plane, $N_L$ represents the number of surface elements assigned to the left child node under the current candidate splitting plane, $SA(V_R)$ represents the surface area of the bounding box of the right child node, $N_R$ represents the number of surface elements assigned to the right child node, and $C_{hit}$ represents the cost of intersecting the ray with a triangular surface element.

As can be seen from the formula, the SAH value calculation of each candidate subdivision surface is performed independently, so the SAH value calculation can be accelerated by using the GPU, a thread is allocated to each candidate subdivision surface to calculate the SAH value of the candidate subdivision surface, and the minimum SAH value and the corresponding candidate subdivision surface are found by a reduction algorithm at the GPU end.

And after the splitting surface is determined, the coordinate axis vertical to the splitting surface is the splitting axis.

In step 4), the two auxiliary arrays are two arrays of equal length pre-applied in the video memory pre-application module; the length of each auxiliary array is P times the number of triangular surface elements in the triangular surface element model. Therefore, during the layer-by-layer construction of the KD tree, the nodes of each layer can use the auxiliary arrays without conflict simply by accessing the corresponding positions through addresses. According to the optimal splitting plane position and the splitting axis, a bounding box located on the left side of the optimal splitting plane has its corresponding position marked as 1 in the first auxiliary array; a bounding box on the right side has its corresponding position marked as 1 in the second auxiliary array; a bounding box cut by the optimal splitting plane has its corresponding position marked as 1 in both auxiliary arrays. The auxiliary arrays thus record the child node to which the bounding box at each position of the bounding box array belongs.

In step 5), a scan operation is performed on each auxiliary array. The scan turns the original auxiliary array into a new auxiliary array in which the element at each position is the sum of all elements of the original auxiliary array from subscript 0 up to that position. From the new auxiliary array, the subscript that each bounding box of the current node takes when it is copied into the bounding box arrays of the left and right child nodes can be obtained, which makes it convenient to copy the data in parallel.

In step 6), according to the subscripts stored in the new auxiliary arrays calculated in step 5), the bounding box array in the current node is copied to the corresponding index positions of the left and right child nodes, and the current node is thereby divided into left and right child nodes. The relevant information of the left and right child nodes is updated according to the bounding box arrays assigned to them, and the left and right child nodes each enter step 2) as the current node, repeating the KD tree construction loop until all current nodes have entered step 7) and become leaf nodes.

In step 7), subdivision of the current node stops and the current node is marked as a leaf node. The data at the address stored in the leaf node are appended to the tail of the output array, and the address range of the data in the output array is recorded in the leaf node. When all current nodes have been marked as leaf nodes, step 8) is entered and construction of the KD tree is finished.

In step 8), after all nodes have terminated subdivision, a constructed high-quality KD tree and an output array formed by concatenating the bounding box arrays of all leaf nodes are obtained; the KD tree structure and the output array are stored in video memory and used as a lookup table by the ray tracing module. The clue value of each child node, the bounds of the bounding box in the child node and the surface elements contained in the child node are updated, thereby realizing uniform output of the bounding box arrays in the leaf nodes.
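The scan of step 5) can be sketched with a data-parallel formulation. The Python sketch below uses the Hillis-Steele inclusive scan, which is a technique chosen here for illustration; the text does not say which scan algorithm the GPU kernel uses. The names `inclusive_scan` and `target_indices` are hypothetical. Each pass builds a new buffer in which every element is computed independently, as one thread per element would do.

```python
def inclusive_scan(flags):
    """Hillis-Steele inclusive prefix sum: log2(n) passes of independent updates."""
    data = list(flags)
    step = 1
    while step < len(data):
        # Every element reads only the previous buffer, so a pass is parallel.
        data = [data[i] + (data[i - step] if i >= step else 0)
                for i in range(len(data))]
        step *= 2
    return data

def target_indices(flags):
    """Subscript of each flagged element in the child array, or None if unflagged."""
    scan = inclusive_scan(flags)
    return [scan[i] - 1 if flags[i] else None for i in range(len(flags))]

flags = [1, 0, 1, 1, 0, 1]        # first auxiliary array of a small node
scan = inclusive_scan(flags)
targets = target_indices(flags)
```

The last element of the scanned array is also the number of boxes the child node receives, which gives the end of its address range.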

In the virtual aperture surface generation module, the virtual aperture surface is the projection of the benchmark model onto the equiphase surface. It consists of a large number of rectangular sound ray tube bundles of equal aperture $\lambda_s$; in this example $\lambda_s$ is taken as $0.0625\lambda_T$, and the size can be set according to actual needs. The virtual aperture surface is generated in parallel on the GPU, with the following specific steps:

First, calculate the normal vector, the length L and the width W of the virtual aperture surface from the current incident angle, the distance from the model and the input bounding box model array.

Then, according to the size $L \times W$ of the virtual aperture surface and the size $\lambda_s$ of a sound ray tube bundle, calculate the number of sound ray tube bundles:

$$S_{num} = \left\lfloor \frac{L}{\lambda_s} \right\rfloor \times \left\lfloor \frac{W}{\lambda_s} \right\rfloor$$

where only the integer part of the divisions $L/\lambda_s$ and $W/\lambda_s$ is kept. $S_{num}$ threads are opened for the GPU kernel, and each thread generates one sound ray tube bundle. The generated sound ray tube bundles are output to the ray tracing module: each is stored in video memory at the address corresponding to the index of its thread, where it waits to be read by the ray tracing module.
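The bundle count and the mapping from thread index to bundle position can be sketched as follows. In this Python sketch the names `bundle_grid` and `bundle_center`, the row-major index layout and the aperture dimensions in the example are assumptions for illustration; only $\lambda_s = 0.0625\lambda_T$ with $\lambda_T = 5$ cm comes from the text.

```python
import math

def bundle_grid(length, width, lam_s):
    """Number of bundles along L and W (integer part only) and their product."""
    n_l = math.floor(length / lam_s)
    n_w = math.floor(width / lam_s)
    return n_l, n_w, n_l * n_w

def bundle_center(index, n_l, lam_s):
    """Centre of the bundle handled by thread `index` (assumed row-major layout)."""
    row, col = divmod(index, n_l)
    return (col + 0.5) * lam_s, (row + 0.5) * lam_s

lam_s = 0.0625 * 0.05                            # 0.0625 * lambda_T, lambda_T = 5 cm
n_l, n_w, s_num = bundle_grid(1.001, 0.5005, lam_s)   # hypothetical 1.001 m x 0.5005 m
```

Because the quotients are floored in floating point, an aperture that is an exact multiple of $\lambda_s$ may round either way; an implementation that needs exact counts might work in integer units instead.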

In the ray tracing module, the inputs are the sound ray tube bundles output by the virtual aperture surface generation module and the two synchronous outputs of the KD tree accelerated search generation module; the three inputs are synchronous. On the GPU, one thread is opened for each ray of each sound ray tube bundle, and the ray tracing kernel function is executed in parallel. Ray tracing traverses the KD tree in the manner of a binary tree traversal and simulates the physical process in which rays strike the triangular surface element model and are reflected. The output of the module is, for each ray in the virtual aperture surface, the coordinates of its intersection points with the triangular surface element model and the propagation path of the ray.

During ray tracing the rays are mutually independent, so one thread is allocated to each ray on the GPU, the ray tracing is executed independently for each ray, and the tracing result is stored at the corresponding video memory address to await the subsequent integration. In serial code, ray tracing can be implemented recursively, but recursion is poorly suited to the GPU, so a maximum number of tracing steps Trace_Num is set: when the ray tracing function has executed Trace_Num times, the current ray automatically stops tracing, which prevents the ray from falling into an infinite loop.
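The bounded, non-recursive tracing loop can be sketched as below. In this Python sketch the `trace` function, the `intersect` callback interface and the toy scene of two facing mirrors are assumptions for illustration; the value 8 for Trace_Num is also hypothetical, since the text does not give one.

```python
TRACE_NUM = 8   # hypothetical cap; the text does not state a value

def trace(origin, direction, intersect, trace_num=TRACE_NUM):
    """Follow specular reflections iteratively, at most trace_num times."""
    path = []
    for _ in range(trace_num):                # loop replaces recursion
        hit = intersect(origin, direction)
        if hit is None:                       # the ray leaves the scene
            break
        point, normal = hit
        path.append(point)
        d_dot_n = sum(d * n for d, n in zip(direction, normal))
        direction = tuple(d - 2.0 * d_dot_n * n
                          for d, n in zip(direction, normal))
        origin = point
    return path

def two_mirrors(origin, direction):
    """Toy scene: perfect mirrors at x = 0 and x = 1 facing each other."""
    dx = direction[0]
    if dx > 0:
        x, normal = 1.0, (-1.0, 0.0, 0.0)
    elif dx < 0:
        x, normal = 0.0, (1.0, 0.0, 0.0)
    else:
        return None
    t = (x - origin[0]) / dx
    return tuple(o + t * d for o, d in zip(origin, direction)), normal

bounces = trace((0.5, 0.0, 0.0), (1.0, 0.0, 0.0), two_mirrors)
```

Between the two mirrors the ray would bounce forever, and the cap is what ends the loop.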

In the ray tracing module, whether ray splitting occurs or not needs to be judged for the sound ray tube bundle, and the following processes are specifically adopted:

The principle for judging whether a sound ray tube bundle meets the ray splitting condition is whether the four vertices of its rectangle strike the same triangular surface element. When the four vertices strike the same triangular surface element, no ray splitting is needed. When they do not, the sound ray tube bundle is split into four equal parts, that is, one sound ray tube bundle is divided evenly into four sub-bundles, and the sub-bundles are traced again. If a sub-bundle still meets the splitting condition after the bundle has been split three times, that sub-bundle is discarded and takes no further part in ray tracing or subsequent calculation. For the bundles and sub-bundles that are no longer split, effectiveness is judged from the reflection information. The principle for judging whether a sound ray tube bundle is effective is whether the rays emitted from its four vertices strike the same triangular surface element. If so, the bundle is marked as effective and its reflection information is returned; if not, the bundle is marked as invalid and no reflection information is returned.
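The splitting rule can be sketched with an explicit work stack instead of recursion. In this Python sketch the rectangle layout `(x0, y0, x1, y1)`, the function names and the `facet_id` callback, which stands in for tracing one corner ray and returning the surface element it strikes, are assumptions for illustration.

```python
def corners(rect):
    x0, y0, x1, y1 = rect
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]

def hits_one_facet(rect, facet_id):
    """True when all four corner rays strike the same surface element."""
    return len({facet_id(x, y) for x, y in corners(rect)}) == 1

def resolve_bundle(rect, facet_id, max_splits=3):
    """Return the effective (sub-)bundles; bundles still mixed after 3 splits are dropped."""
    valid, stack = [], [(rect, 0)]
    while stack:
        r, depth = stack.pop()
        if hits_one_facet(r, facet_id):
            valid.append(r)
        elif depth < max_splits:
            x0, y0, x1, y1 = r
            xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
            for quarter in ((x0, y0, xm, ym), (xm, y0, x1, ym),
                            (x0, ym, xm, y1), (xm, ym, x1, y1)):
                stack.append((quarter, depth + 1))
        # otherwise the sub-bundle is discarded
    return valid

def edge_scene(x, y):
    return 0 if x <= 0.5 else 1               # two surface elements meeting at x = 0.5

kept = resolve_bundle((0.0, 0.0, 1.0, 1.0), edge_scene)
kept_area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in kept)
```

In this toy scene one eighth of the aperture area, the strip straddling the edge, is discarded after three splits.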

The Gordon integration module receives the relevant information output by the tracking module, uses a Gordon integration formula to perform parallel integration on the sound ray tube bundles on the GPU, uses reduction operation to perform parallel summation on the integration results of all the sound ray tube bundles, and the output of the Gordon integration module is the algebraic sum of the integral values of all the sound ray tube bundles under the current angle, and the specific process is as follows:

The invention allocates one thread to each effective sound ray tube bundle to perform one integration, stores the integration result in video memory as an intermediate value, and, after all effective sound ray tube bundles have been integrated, sums the integration results with a reduction operation to obtain the TS value at the current incident angle. The time-domain physical acoustics sound field integral formula applied to the bouncing ray method is as follows:

where $i, j$ denote the $i$-th row and $j$-th column of the sound ray tube bundle on the virtual aperture plane; $k$ is the wave number, $k = 2\pi/\lambda$, and $K = 2k$; $\Delta a'_n$ is a vector; $\Delta\rho'_n$ denotes the vector from any point in the integration region to the reference point, $\Delta\rho'_n = (a'_n + a'_{n+1})/2$; $T$ denotes the length of the projection of $r'_{i,j}$ on the integration region;

$R_M$ is the source point position vector, and $r'_{i,j}$ denotes the unit vector from the receiving point to the reference point:

$$r'_{i,j} = u'x' + v'y' + w'z'$$

Wherein, X ', Y ', Z ' is a three-dimensional plane coordinate system { M, X ', Y ', Z ' } obtained by projection transformation of the original space coordinate system { O, X, Y, Z }, and a conversion matrix Q from the three-dimensional plane coordinate system { M, X ', Y ', Z ' } to the original coordinate system { O, X, Y, Z }

where M is the center point of the quadrilateral in the original coordinate system and also serves as the coordinate origin of the three-dimensional plane coordinate system; X' is the unit vector from point M to any vertex of the quadrilateral; Z' is the unit normal vector $n_0$ of the quadrilateral plane; and Y' = X' × Z'.

Multiplying any point in the original coordinate system by the inverse matrix $Q_{inv}$ of the conversion matrix Q completes the coordinate projection transformation of the point from the original coordinate system to the three-dimensional plane coordinate system.
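The change of coordinates can be illustrated with a short Python sketch. The entries of Q are not reproduced in this text, so the sketch assumes that Q is the rigid transform whose axes are X', Y', Z' and whose origin is M; because those axes are orthonormal, applying the inverse then reduces to subtracting M and taking dot products with the three axes. The function names, the use of the vertex average as the center M, and the way the normal is computed from two edges of a planar quadrilateral are assumptions made for illustration.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def local_frame(quad):
    """Build {M, X', Y', Z'} for a planar quadrilateral given as four 3D vertices."""
    m = tuple(sum(v[i] for v in quad) / 4 for i in range(3))  # center point M (assumed: vertex average)
    x_axis = unit(sub(quad[0], m))                            # X': from M to one vertex
    z_axis = unit(cross(sub(quad[1], quad[0]), sub(quad[3], quad[0])))  # Z': plane normal n_0
    y_axis = cross(x_axis, z_axis)                            # Y' = X' x Z'
    return m, x_axis, y_axis, z_axis


def to_local(p, frame):
    """Coordinates of original-space point p in the plane coordinate system."""
    m, x_axis, y_axis, z_axis = frame
    d = sub(p, m)
    return (dot(d, x_axis), dot(d, y_axis), dot(d, z_axis))
```

With this construction the center M maps to the origin and every point of the quadrilateral has a zero third coordinate, which is what makes the surface integral a two-dimensional one.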

The reflection parameters of all rays are substituted into the formula and, according to the high-frequency approximation principle, the algebraic sum of the integral results of all triangular surface elements is calculated to obtain the sound field intensity TS value at the current incident angle.
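The summation step can be pictured as a pairwise (tree) reduction over the per-bundle integral values. The following Python sketch only simulates the access pattern sequentially; on a GPU each iteration of the inner loop would be carried out by a separate thread. The function name `tree_reduce_sum` and the in-place buffer layout are assumptions for illustration, and the per-bundle integral values are taken as given because the integral formula itself is not reproduced in this text.

```python
def tree_reduce_sum(values):
    """Sum per-bundle integral results using a pairwise reduction pattern."""
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0.0
    while n > 1:
        half = (n + 1) // 2
        # Each i below corresponds to one GPU thread adding a partner element.
        for i in range(n // 2):
            buf[i] += buf[i + half]
        n = half
    return buf[0]
```

The values may be complex numbers, since each bundle contributes an amplitude and a phase; Python's built-in complex type works with this sketch unchanged.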

In the interface display module, the integral value output by the Gordon integration module at the current incident angle, the benchmark model, the number of triangular surface elements in the triangular surface element model, the size of the virtual aperture surface and the number of rays of the virtual aperture surface are displayed on the interface, so that information can be conveniently checked and debugged.

Taking the coordinate origin of the benchmark model as the vertex and the line connecting the sound source position and the coordinate origin as the axis, a counterclockwise rotation of θ degrees (θ = 0.5) about the coordinate origin is applied to change the current incident angle. The rotation may be clockwise or counterclockwise, and the angle θ of each rotation can be set as needed. In this way the integral calculation of the benchmark model over 0 to 180 degrees is finally completed.
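As a small worked example of the sweep, the incident angles visited for a given step θ can be enumerated as below. The function name and its parameters are hypothetical; only the 0 to 180 degree range and the step of 0.5 degrees come from the text.

```python
def incident_angles(theta_deg=0.5, start_deg=0.0, end_deg=180.0):
    """List the incident angles visited when rotating by theta_deg each time."""
    steps = int(round((end_deg - start_deg) / theta_deg))
    # Multiply instead of accumulating to avoid floating-point drift.
    return [start_deg + i * theta_deg for i in range(steps + 1)]
```

With θ = 0.5 this gives 361 angles, so the ray tracing and integration stages would run 361 times for one full sweep.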

The technical effects of the present invention are explained again by simulation as follows:

Example 7:

The ray tracing acceleration system of the KD tree based on the GPU and the uniform output method of the bounding box arrays in the KD leaf nodes are the same as those in Examples 1 to 6.

The ray tracing acceleration system of the KD tree based on the GPU is used to simulate the propagation of sound ray tube bundles on an underwater target, and the tracing results are integrated and summed. Because occluded surface elements can also participate in the echo intensity calculation, a more accurate simulation forecast is obtained than with the traditional method.

FIG. 6 shows the result of simulating FIG. 5 using the present invention. In FIG. 6, the abscissa is the incident angle and the ordinate is the sound field intensity value (in decibels) at the current incident angle. As can be seen from FIG. 6, as the incident angle goes from 0 to 180 degrees the sound field intensity value first increases and then decreases. At 0 degrees and 180 degrees, the cross-sectional areas of the head and the tail of the submarine are small, so the generated virtual aperture surface is small and the returned sound wave intensity value is low. As the incident angle increases, the sound source gradually comes to face the side of the submarine body; the virtual aperture surface is then larger, more sound ray tube bundles are generated, and the sound field intensity value is larger. Owing to reflections at the submarine surface, the sound field intensity reaches its peak not at 90 degrees but at about 110 degrees.

In the traditional plate element method and the improved plate element method, accuracy is reduced because occluded surface elements are discarded. The method of the invention uses ray tracing to obtain the sound waves reflected by occluded surface elements, so that for concave objects with many occluded surfaces the occluded surface elements also participate in the calculation. The system of the invention can therefore obtain higher calculation accuracy.

FIG. 7 shows data generated by simulating FIG. 5 with the present invention on ten GPU computing cards (GPU model: Nvidia P100). In FIG. 7, the abscissa is the incident angle, the left ordinate is the time consumed (in milliseconds) by the ray tracing module at the current incident angle, and the right ordinate is the number of sound ray tube bundles generated by the virtual aperture surface generation module at the current angle; the dotted line shows the number of sound ray tube bundles generated by the virtual aperture surface generation module at each incident angle. It can be seen from FIG. 7 that, in overall trend, the time consumed by the ray tracing module is linearly related to the number of sound ray tube bundles generated by the virtual aperture surface generation module. This shows that the time consumed by the ray tracing module depends only on the number of sound ray tube bundles and not on the incident angle, which indicates that the invention constructs the KD tree faster and with better quality. Compared with the traditional plate element process, which must first extract the non-occluded surface elements and then perform ray tracing, the method omits the extraction of non-occluded surface elements and accelerates the ray tracing process, so that the speed of simulation forecasting of the target sound field intensity is greatly improved.

In short, the ray tracing acceleration system of the KD tree based on the GPU and the method for uniformly outputting the bounding box arrays in the KD leaf nodes solve the problem of quickly forecasting the sound field intensity of an underwater complex target. In the signal connection direction, the system comprises in sequence: a video memory pre-application module, a data preprocessing module, a KD tree accelerated search generation module, a virtual aperture surface generation module, a ray tracing module, an integration module and an interface display module. The invention adds the video memory pre-application module and accelerates the node subdivision process in the KD tree accelerated search generation module. It also provides a method for uniformly outputting the bounding box arrays in the KD leaf nodes, which comprises the following steps: inputting data and initializing, subdivision termination judgment, calculating the position of the optimal subdivision surface, assigning values to the auxiliary arrays, scanning the auxiliary arrays, subdividing the current node, forming the output array, and terminating subdivision. The invention can calculate the sound field intensity of any given triangular surface element model. The method uses the ray tracing acceleration system of the KD tree based on the GPU to simulate the propagation of sound rays and does not need to judge occluded surface elements; it has high calculation speed, high precision and strong adaptability to different targets, and can be used effectively for rapid forecasting simulation of underwater target strength.
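The tree-building steps listed above can be sketched sequentially in Python as follows. This is a minimal illustration under stated assumptions, not the GPU implementation of the invention: the midsplit on the longest axis, the 0/1 flag contents of the two auxiliary arrays, the constants `LEAF_THRESHOLD` and `MAX_DEPTH`, and all function names are choices made here. Each bounding box is a pair `(min_corner, max_corner)` of 3D tuples, and a box that straddles the subdivision surface is copied to both children.

```python
from itertools import accumulate

LEAF_THRESHOLD = 2  # hypothetical subdivision termination threshold
MAX_DEPTH = 8       # hypothetical bound standing in for the estimated maximum depth


def exclusive_scan(flags):
    """Exclusive prefix sum: position of each flagged element in its child array."""
    return ([0] + list(accumulate(flags))[:-1]) if flags else []


def build_kd_leaves(boxes):
    """Return (output_array, leaves); each leaf is (offset, count) into output_array."""
    output, leaves = [], []
    stack = [(list(boxes), 0)]              # root node initialized as current node
    while stack:
        node, depth = stack.pop()
        # Subdivision termination judgment.
        if len(node) < LEAF_THRESHOLD or depth >= MAX_DEPTH:
            leaves.append((len(output), len(node)))  # leaf records its address in the output
            output.extend(node)
            continue
        # Subdivision surface by midsplit on the longest axis.
        lo = [min(b[0][a] for b in node) for a in range(3)]
        hi = [max(b[1][a] for b in node) for a in range(3)]
        axis = max(range(3), key=lambda a: hi[a] - lo[a])
        split = (lo[axis] + hi[axis]) / 2
        # Assign the two auxiliary arrays (assumed here to be 0/1 membership flags).
        left_flag = [1 if b[0][axis] < split else 0 for b in node]
        right_flag = [1 if b[1][axis] >= split else 0 for b in node]
        # Scan the auxiliary arrays to obtain destination indices.
        left_idx = exclusive_scan(left_flag)
        right_idx = exclusive_scan(right_flag)
        # Divide the current node: copy each box to its indexed position.
        left = [None] * sum(left_flag)
        right = [None] * sum(right_flag)
        for i, b in enumerate(node):        # on a GPU, one thread per element
            if left_flag[i]:
                left[left_idx[i]] = b
            if right_flag[i]:
                right[right_idx[i]] = b
        stack.append((right, depth + 1))
        stack.append((left, depth + 1))
    return output, leaves
```

Because the scan yields a unique destination index for every element, the copy step has no write conflicts, which is the property that lets the per-element loop run as independent threads.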

Claims

1. A ray tracing acceleration system of a KD tree based on a GPU, comprising, connected in sequence in the signal connection direction, a data preprocessing module, a KD tree generation module, a virtual aperture surface generation module, a ray tracing module, an integration module and an interface display module, characterized in that a video memory pre-application module is provided before the data preprocessing module, the KD tree generation module is a KD tree accelerated search generation module, and the modules are interconnected to jointly realize GPU-based ray tracing accelerated by KD tree search, the modules being as follows:

a video memory pre-application module: the input of the module is the number of triangular surface elements of the triangular surface element model; the module calculates, from the number of triangular surface elements, the size of the video memory occupied by the triangular surface element bounding box model, and the CPU end applies once, in advance, for video memory space on the GPU that is at least sufficient to generate a KD tree without conflict; the output of the module is the start address of the pre-applied video memory;

a data preprocessing module: on a GPU, using a video memory applied by a video memory pre-application module, parallelly constructing bounding boxes for each triangular surface element of a triangular surface element model input into the module, initializing the current incident angle, and outputting a bounding box model array and the current incident angle;

a KD tree accelerated search generation module: the module generates the KD tree in parallel on the GPU so that the ray tracing process is accelerated by the KD tree, and dynamically uses, by address access, the video memory applied for by the video memory pre-application module, thereby accelerating the generation of the KD tree; the input of the module is the bounding box model output by the data preprocessing module, and the module has two synchronous outputs, one being the KD tree generated for the bounding box model array and the other being the reorganized bounding box model array; both synchronous outputs are used directly as inputs of the ray tracing module;

a virtual aperture surface generation module: the input of the module is a bounding box model array and a current incident angle output by the data preprocessing module, the boundary value of the bounding box model is calculated in the module, a sound ray tube bundle is generated on the GPU in parallel, and the output of the module is a plurality of sound ray tube bundles with the same size;

a ray tracing module: the inputs of the module are the sound ray tube bundles output by the virtual aperture surface generation module and the two synchronous outputs of the KD tree accelerated search generation module, these three inputs being synchronous; ray tracing of the sound ray tube bundles is accelerated in parallel on the GPU, the ray tracing process is completed by traversing the KD tree, and the output of the module is the reflection information of the sound ray tube bundles after traversing the KD tree model;

a Gordon integration module: the Gordon integration module receives the reflection information output by the ray tracing module, integrates the sound ray tube bundles in parallel on the GPU using the Gordon integration formula, and sums the integration results of all sound ray tube bundles in parallel using a reduction operation; the output of the Gordon integration module is the algebraic sum of the integral values of all sound ray tube bundles at the current angle;

2. The system according to claim 1, wherein the size of the pre-applied video memory space is $(D \cdot P \cdot I) + 2F + (P \cdot I)$,

where D is the estimated maximum tree depth, P is an expansion coefficient, I is the size of the video memory occupied by the bounding box array in the root node, and F is the size of the video memory occupied by one auxiliary array; the module allocates video memory space of size $(P \cdot I)$ for each layer of the KD tree and applies for video memory space of size $(P \cdot I)$ for the output array.
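As a worked illustration of this size, reading the expression as D·P·I + 2F + P·I (one block of P·I per tree layer, two auxiliary arrays, and one block of P·I for the output array), the total can be computed as below. The function name and the numerical values in the usage note are hypothetical and are not taken from the patent.

```python
def pre_applied_bytes(depth, expansion, root_bytes, aux_bytes):
    """Total size D*P*I + 2*F + P*I of the video memory requested in advance."""
    per_layer = expansion * root_bytes  # P*I: space reserved for one tree layer
    return depth * per_layer + 2 * aux_bytes + per_layer  # layers + two auxiliary arrays + output
```

For example, with D = 20, P = 2, I = 24000 bytes and F = 8000 bytes, the request would be 20 × 48000 + 16000 + 48000 = 1024000 bytes, made once before tree construction starts.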

3. The ray tracing acceleration system of a KD-tree based on a GPU according to claim 1, wherein the KD-tree accelerated search generation module generates the KD-tree using a KD-tree in-leaf node bounding box array uniform output method, and sub-modules provided in the KD-tree accelerated search generation module are as follows:

a condition discrimination submodule: the submodule first receives the data output by the data preprocessing module and decides, according to the number of elements, whether the current node data enters the node subdivision surface calculation submodule or the subdivision termination submodule; if it enters the subdivision termination submodule, subdivision of the current node stops; if it enters the node subdivision surface calculation submodule, it then passes in sequence through the child node array generation submodule and the child node information updating submodule, and finally returns to the condition discrimination submodule so that the child nodes are judged again; this judgment is repeated cyclically, constructing the KD tree structure layer by layer until all nodes have entered the subdivision termination submodule;

a node subdivision surface calculation submodule: the input of the submodule is the output of the condition discrimination submodule; the submodule calculates the subdivision surface position and the subdivision axis of the current node from the received data, so that the parent node is divided into two child nodes, and outputs the calculation result together with the data it received to the child node array generation submodule;

a child node array generation submodule: the input of the submodule is the output of the node subdivision surface calculation submodule; according to this input, the current node array is divided into two sub-arrays on the GPU, and the two generated sub-arrays are output in parallel to the two child node information updating submodules respectively;

a child node information updating submodule: the submodule consists of two parallel submodules with the same structure, each taking as input one sub-array output by the child node array generation submodule; each child node of the current node updates its own information according to the received sub-array, and the child node, as the new current node, is output together with its sub-array to the condition discrimination submodule for re-judgment, entering the KD tree construction cycle again;

a subdivision termination submodule: the input of the submodule is the data output by the condition discrimination submodule; the submodule terminates subdivision of the current node data, writes the sub-array contained in the current node to an output array, and outputs the output array once after all nodes have stopped subdividing.

4. The ray tracing acceleration system of a KD tree based on a GPU according to claim 3, wherein the child node array generation submodule is executed on the GPU, and the units provided within the child node array generation submodule are as follows:

a scanning unit: the input of the unit is the output of the node subdivision surface calculation submodule; the indices of the elements of the two child node arrays within the current node array are calculated in parallel on the GPU using a scan operation;

an array index unit: the input of the unit is the indices output by the scanning unit; according to the indices, the data at the address stored by the current node is copied to the addresses stored by the two child nodes respectively, so that the array contained at the address of the current node is divided between the addresses of the two child nodes, and the two child nodes are then output to the condition discrimination submodule for re-judgment.

5. A method for uniform output of bounding box arrays in KD tree leaf nodes, characterized in that a KD tree accelerated search generation module generates a KD tree using the method, the method comprising the following steps:

1) Inputting data and initializing: inputting a triangular surface element bounding box model and two auxiliary arrays, wherein the length of each auxiliary array is equal to that of the bounding box array of each layer of the KD tree and the auxiliary arrays hold auxiliary data for the subdivision, and initializing the root node as the current node;

2) Subdivision termination judgment: judging whether the current node meets the subdivision termination condition, namely whether the number of bounding boxes contained at the address of the current node is smaller than the subdivision termination threshold; if so, executing step 7); if not, executing step 3);

3) Calculating the optimal splitting surface position: calculating the optimal splitting surface and the optimal splitting axis by using a midsplit method or an SAH method;

5) Scanning the auxiliary array: calculating initial storage addresses when the bounding box array in the current node is copied to the left child node and the right child node, and obtaining position indexes of each data in the corresponding addresses of the left child node and the right child node;

6) Dividing the current node: the current node is divided into a left child node and a right child node: copying the bounding box array in the current node to the corresponding positions at the storage addresses of the left child node and the right child node according to the position indices obtained in step 5), updating the relevant information of the left child node and the right child node, taking the left child node and the right child node each as the current node, and returning to step 2) to enter the KD tree construction cycle again;

7) Forming an output array: marking the current node as a leaf node, copying a bounding box array in an address stored by the leaf node into an output array, and recording the address of the bounding box array in the output array in the leaf node for the ray tracing module to use;