CN115437782A - Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing - Google Patents


Info

Publication number
CN115437782A
CN115437782A
Authority
CN
China
Prior art keywords
time weight
sound
sound source
sub
core
Prior art date
Legal status
Pending
Application number
CN202210781061.0A
Other languages
Chinese (zh)
Inventor
张成峰
魏志强
贾东宁
薛家伟
许佳立
韩恒敏
桂琳
张澜
Current Assignee
Qingdao National Laboratory for Marine Science and Technology Development Center
Original Assignee
Qingdao National Laboratory for Marine Science and Technology Development Center
Priority date
Filing date
Publication date
Application filed by Qingdao National Laboratory for Marine Science and Technology Development Center filed Critical Qingdao National Laboratory for Marine Science and Technology Development Center
Priority to CN202210781061.0A priority Critical patent/CN115437782A/en
Publication of CN115437782A publication Critical patent/CN115437782A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The application discloses a Bellhop3D parallel implementation method for an underwater three-dimensional sound field model based on domestic many-core supercomputing. The method comprises the following steps: in a global communication domain, each master core acquires hydrological data; in the global communication domain, each master core calls its attached slave cores to calculate ocean sound speed information; the master cores in the global communication domain are partitioned to form a plurality of sub-communication domains; a sound source task division algorithm assigns a sound source calculation task to each sub-communication domain; and each sub-communication domain calculates three-dimensional sound ray propagation trajectories according to its assigned sound source calculation task. By dividing sub-communication domains, the method distributes calculation tasks with a multi-mode communication algorithm based on communication domain division, realizing load-balanced calculation and distributed reading and storage of data, which shortens I/O time and improves parallel efficiency.

Description

Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing
Technical Field
The application relates to the technical field of underwater sound propagation, in particular to a Bellhop3D parallel implementation method of an underwater three-dimensional sound field model based on domestic many-core supercomputing.
Background
Underwater sound propagation modeling is one of the basic topics of underwater environment research, and is important to the design and use of modern sonar and to the deployment of surface and underwater operations. Conventional sound field propagation models mainly include the ray model, the normal mode model, the fast field model and the parabolic equation model. Among them, the Bellhop model proposed by Porter et al. effectively solves the problem of inaccurate sound field energy in the shadow and convergence zones of the conventional ray method, greatly improves calculation accuracy and real-time performance, and offers high calculation speed, clear physical meaning and suitability for parallelization.
In recent years, the deepening study of ocean-acoustic coupled modeling, broadband underwater sound propagation, and refined simulation of three-dimensional sound fields has placed higher demands on the efficiency of underwater sound field calculation; ordinary computers can no longer meet practical needs, so research on parallel implementations of underwater sound propagation models on supercomputing platforms has important practical significance and application prospects. The United States introduced high-performance computers into computational ocean acoustics as early as the 1980s, and institutions such as the Institute of Acoustics of the Chinese Academy of Sciences and the Naval Submarine Academy have successively carried out research on parallel algorithms for underwater sound propagation models.
In-depth research on broadband underwater sound transmission, geoacoustic inversion, matched-field inversion and localization, rapid underwater sound field prediction, and underwater combat environment simulation places high requirements on the spatial resolution, speed and accuracy of underwater sound propagation modeling. High spatial resolution requires more memory to store data, multiplies the sound field computation load, and markedly increases calculation time. Methods to increase calculation speed must therefore be found while developing new three-dimensional underwater acoustic models or improving existing ones. Provided the accuracy required by practical applications is met, improving the numerical calculation speed of the sound field is a key concern in underwater acoustics research.
Accordingly, a solution is desired to solve or at least mitigate the above-mentioned deficiencies of the prior art.
Disclosure of Invention
The invention aims to provide a Bellhop3D parallel implementation method of an underwater three-dimensional sound field model based on domestic many-core supercomputing, so as to solve at least one of the above technical problems.
One aspect of the invention provides a Bellhop3D parallel implementation method for an underwater three-dimensional sound field model based on domestic many-core supercomputing, and the Bellhop3D parallel implementation method for the underwater three-dimensional sound field model based on domestic many-core supercomputing comprises the following steps:
in a global communication domain, each main core acquires hydrological data;
in the global communication domain, each main core calls a slave core connected with the main core to calculate the ocean sound velocity information;
partitioning each main core in the global communication domain to form a plurality of sub-communication domains;
dividing a sound source calculation task for each sub-communication domain by adopting a sound source task division algorithm;
and each sub-communication domain calculates the three-dimensional sound ray propagation track according to the distributed sound source calculation task.
Optionally, in the global communication domain, the calling, by each master core, of the slave cores connected thereto to calculate the ocean sound speed information includes:
each master core calls each slave core connected with the master core to calculate the ocean sound velocity by using the SIMD vectorization technology.
Optionally, the partitioning each primary core in the global communication domain, so as to form a plurality of sub-communication domains, includes:
acquiring the number of primary cores which can be used for calculation;
dividing each main core into a plurality of sub-communication domains according to the number of the main cores capable of being used for calculation, distributing calculation tasks to each sub-communication domain, and distributing new process numbers to processes in each sub-communication domain.
Optionally, the dividing the sound source calculation task for each sub-communication domain by using the sound source task division algorithm includes:
acquiring sound source data to be calculated, wherein each sound source data comprises a sound ray propagation distance, a sound source frequency, a sound source depth and the like;
respectively calculating the time weight of each sound source according to the sound ray propagation distance, the sound source frequency and the sound source depth, wherein the time weight of one sound source can be calculated by one sound source data;
calculating an average time weight of the sound sources according to the time weight of each sound source and the number of sub-communication domains which can be allocated;
and dividing a sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source.
Optionally, after dividing the sound source calculation task for each sub-communication domain, the dividing the sound source calculation task for each sub-communication domain by using the sound source task division algorithm further includes:
each primary core in each sub-communication domain is assigned a sound ray computation task.
Optionally, the allocating a sound ray computation task to each primary core in each sub-communication domain includes:
respectively calculating the time weight of each sound ray in each sound source;
acquiring the average time weight of the sound rays in the sub-communication domain according to the time weight of each sound ray and the number of the main cores in the sub-communication domain which can be distributed;
and dividing a sound ray calculation task for each main core in each sub-communication domain according to the time weight of each sound ray in each sound source and the mean time weight of the sound rays in the sub-communication domain.
Optionally, the dividing the sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source includes:
step 101: arranging the time weights of all sound sources in descending order to form a sound source time weight sequence;
step 102: judging the relation between the first value of the sound source time weight sequence and the average time weight, if the first value of the sound source time weight sequence is larger than or equal to the average time weight, performing step 103, and if the first value of the sound source time weight sequence is smaller than the average time weight, performing step 104;
step 103: allocating the sound source data corresponding to the sound source time weight which is more than or equal to the average time weight to a sub-communication domain, moving the allocated sound source time weight out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 104: if the first value of the sound source time weight sequence is smaller than the average time weight, allocating sound source data corresponding to at least two sound source time weights to a sub-communication domain, shifting the allocated sound source time weights out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 105: and re-acquiring the average time weight of the new sound source according to the updated sound source time weight sequence, and repeating the step 102 according to the updated sound source time weight sequence and the average time weight of the new sound source until the sound source data distribution is completed.
Optionally, the step 104 includes:
step 1041: searching the sound source time weight sequence for the sound source time weight closest to the average time weight, assigning that sound source to a sub-communication domain and removing its time weight from the sound source time weight sequence, wherein the average time weight in step 1041 is called a first time weight;
step 1042: after the allocated sound source time weight is removed from the sound source time weight sequence, re-acquiring the average time weight of the new sound sources according to the updated sound source time weight sequence, and subtracting the sound source time weight closest to the average time weight from the first time weight, so as to acquire the remaining time weight;
step 1043: after the remaining time weight is obtained, the sound source time weight closest to the average time weight is continuously searched in the sound source time weight sequence, and the step 1041 is repeated according to the updated sound source time weight sequence and the new average time weight of the sound source until the remaining time weight is 0.
Optionally, dividing the sound ray calculation task for each primary core according to the time weight of each sound ray in each sound source and the average time weight of the sound rays in the sub-communication domain includes:
step 201: arranging the time weights of all sound rays in descending order to form a sound ray time weight sequence;
step 202: judging the relation between the first value of the sound ray time weight sequence and the average time weight, if the first value of the sound ray time weight sequence is more than or equal to the average time weight, performing step 203, and if the first value of the sound ray time weight sequence is less than the average time weight, performing step 204;
step 203: separately allocating the sound ray data corresponding to the sound ray time weight which is greater than or equal to the average time weight to a sub-communication domain, shifting the sound ray time weight which is already allocated out of the sound ray time weight sequence, updating the sound ray time weight sequence, and performing step 205;
step 204: if the first value of the sound ray time weight sequence is smaller than the average time weight, assigning the sound ray data corresponding to at least two sound ray time weights to a sub-communication domain, shifting the assigned sound ray time weights out of the sound ray time weight sequence, updating the sound ray time weight sequence, and performing step 205;
step 205: and (3) re-acquiring the average time weight of the new sound ray according to the updated sound ray time weight sequence, and repeating the step (202) according to the updated sound ray time weight sequence and the average time weight of the new sound ray until the sound ray data distribution is completed.
Optionally, the calculating, by each sub-communication domain according to the sound source calculation task allocated thereto, a three-dimensional sound ray propagation trajectory includes:
for each master core in each sub-communication domain, the following steps are performed:
loading a calculation task, so that ocean sound velocity, geological acoustic environment data and elevation data of a corresponding area required by calculation are loaded into each slave core corresponding to the master core;
and reading the ocean sound velocity, the geological acoustic environment data and the elevation data of the corresponding area stored by each slave core through a memory sharing algorithm in the calculation process of each slave core.
Advantageous effects
According to the Bellhop3D parallel implementation method of the underwater three-dimensional sound field model based on domestic many-core supercomputing, by dividing sub-communication domains, calculation tasks are distributed with a multi-mode communication algorithm based on communication domain division, load-balanced calculation and distributed reading and storage of data are realized, I/O time is shortened, and parallel efficiency is improved.
Drawings
Fig. 1 is a schematic flow diagram of a Bellhop3D parallel implementation method of an underwater three-dimensional sound field model based on domestic many-core supercomputing according to an embodiment of the present application.
Fig. 2 is a schematic view of a sound velocity distribution calculated by the method of the present application.
Fig. 3 is a schematic diagram illustrating the principle of sub-communication domain division by the method of the present application.
Fig. 4 is a schematic diagram illustrating a partitioning and corresponding manner of environmental data in the slave core array memory sharing method according to the present application;
fig. 5 is a schematic diagram illustrating a usage principle of the slave core array memory sharing method according to the present application.
FIG. 6 is a comparison of runtimes at different parallel scales between the method of the present application and the prior art.
Fig. 7 is a comparison of parallel efficiency at different parallel scales between the method of the present application and the prior art.
FIG. 8 is a schematic illustration of a global seafloor geological type distribution.
FIG. 9 is a graphical illustration of global marine elevation data.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout the drawings. The described embodiments are a subset of the embodiments in the present application and not all embodiments in the present application. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present application and should not be construed as limiting the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application. Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow diagram of a Bellhop3D parallel implementation method of an underwater three-dimensional sound field model based on domestic many-core supercomputing according to an embodiment of the present application.
The Bellhop3D parallel implementation method of the domestic many-core supercomputing-based underwater three-dimensional sound field model shown in the figure 1 comprises the following steps:
step 1: in a global communication domain, each main core acquires hydrological data;
step 2: in the global communication domain, each main core calls a slave core connected with the main core to calculate the ocean sound velocity information;
and step 3: partitioning each main core in the global communication domain to form a plurality of sub-communication domains;
and 4, step 4: dividing a sound source calculation task for each sub-communication domain by adopting a sound source task division algorithm;
and 5: and each sub-communication domain calculates the three-dimensional sound ray propagation track according to the distributed sound source calculation task.
According to the Bellhop3D parallel implementation method of the underwater three-dimensional sound field model based on domestic many-core supercomputing, by dividing sub-communication domains, calculation tasks are distributed with a multi-mode communication algorithm based on communication domain division, load-balanced calculation and distributed reading and storage of data are realized, I/O time is shortened, and parallel efficiency is improved.
In this embodiment, in the global communication domain, the invoking, by each master core, the slave core connected thereto to calculate the marine sound speed information includes:
each master core calls each slave core connected with the master core to calculate the ocean sound velocity by using the SIMD vectorization technology.
In this embodiment, the hydrological data includes temperature, salinity, depth, geoacoustic environmental data, and elevation data.
In the calculation of the Bellhop3D model, the most time-consuming part is the calculation of the global ocean sound speed. The environmental data used comprise temperature, salinity and depth, and the calculation formula (formula 1) is as follows:
c = 1449.14 + 4.572T − 4.453×10⁻²T² − 2.605×10⁻⁴T³ + 7.985×10⁻⁶T⁴ + 1.398(S−35) + 1.692×10⁻³(S−35)² + 1.603×10⁻¹D + 1.027×10⁻⁵D² + 3.522×10⁻⁹D³ − 3.36×10⁻¹²D⁴ + (−1.861×10⁻⁴T + 7.481×10⁻⁶T² + 4.528×10⁻⁸T³)D + (S−35)(−1.124×10⁻²T + 7.771×10⁻⁷T² + 7.702×10⁻⁵D) + (S−35)(−1.294×10⁻⁷D² + 3.158×10⁻⁸DT + 1.589×10⁻⁹DT²) + (−2.529×10⁻⁷T + 1.856×10⁻⁹T²)D² − 1.965×10⁻¹⁰TD³
wherein T is the temperature in degrees Celsius (°C), S is the salinity in parts per thousand (‰), and D is the water depth in meters (m).
The global data grid has 3600 (longitude) × 1800 (latitude) × 62 (depth) spatial points, so calculation at this scale consumes a large amount of time; SIMD vectorization shortens this time and improves the speed of the program.
SIMD vectorization executes multiple scalar operations simultaneously with a single instruction. The Sunway many-core supercomputer provides several extended data types for vectorization: data can be loaded into vector registers with the SIMD_LOAD function, and a series of floating-point functions such as SIMD_VADDD, SIMD_VSUBD and SIMD_VMULD can then be called to compute the ocean sound speed of formula (1). The calculation result is shown in FIG. 2.
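As a plain illustration of the arithmetic being vectorized (the patent's implementation uses Sunway SIMD intrinsics, which a Python sketch cannot reproduce), formula (1) can be written as a scalar function. The leading constant 1449.14 is an assumption here, matching the standard Wilson-type value:

```python
def wilson_sound_speed(T, S, D):
    """Ocean sound speed c in m/s from temperature T (deg C), salinity S
    (parts per thousand) and depth D (m), per formula (1). The leading
    constant 1449.14 is assumed (standard Wilson-type value)."""
    s = S - 35.0
    return (1449.14 + 4.572*T - 4.453e-2*T**2 - 2.605e-4*T**3 + 7.985e-6*T**4
            + 1.398*s + 1.692e-3*s**2
            + 1.603e-1*D + 1.027e-5*D**2 + 3.522e-9*D**3 - 3.36e-12*D**4
            + (-1.861e-4*T + 7.481e-6*T**2 + 4.528e-8*T**3)*D
            + s*(-1.124e-2*T + 7.771e-7*T**2 + 7.702e-5*D)
            + s*(-1.294e-7*D**2 + 3.158e-8*D*T + 1.589e-9*D*T**2)
            + (-2.529e-7*T + 1.856e-9*T**2)*D**2
            - 1.965e-10*T*D**3)
```

On the Sunway platform the same expression would be evaluated element-wise on vectors loaded with SIMD_LOAD and combined with SIMD_VADDD/SIMD_VMULD; the scalar version above only pins down the arithmetic.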
Referring to fig. 3, in the present embodiment, partitioning each master core in the global communication domain, so as to form a plurality of sub-communication domains, includes:
acquiring the number of primary cores which can be used for calculation;
and dividing the main cores into a plurality of sub-communication domains according to the number of main cores available for calculation, allocating calculation tasks to each sub-communication domain, and assigning a new process number to each process in each sub-communication domain. Specifically, M sub-communication domains are divided according to the number N of computing resources (the number of processes in each sub-communication domain is kept close, usually about 500), calculation tasks are allocated to each sub-communication domain, and a new process number is assigned to each process in each sub-communication domain.
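A minimal sketch of this division, with pure Python standing in for an MPI communicator split; the helper name and the ~500-process target size are illustrative assumptions:

```python
def split_into_subdomains(n_procs, target_size=500):
    """Divide n_procs global process ranks into M sub-communication
    domains of roughly target_size processes each; every process gets
    a new process number (its index) inside its own domain."""
    m = max(1, round(n_procs / target_size))   # M sub-communication domains
    base, extra = divmod(n_procs, m)           # near-equal domain sizes
    domains, next_rank = [], 0
    for d in range(m):
        size = base + (1 if d < extra else 0)
        # each entry: (global process number, new process number in domain d)
        domains.append([(next_rank + i, i) for i in range(size)])
        next_rank += size
    return domains
```

In an MPI implementation this corresponds to MPI_Comm_split(MPI_COMM_WORLD, color, key), where color is the sub-domain index; MPI_Comm_rank on the resulting sub-communicator then yields the new process number.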
In this embodiment, the dividing the sound source calculation task for each sub-communication domain by using the sound source task division algorithm includes:
acquiring sound source data to be calculated, wherein each sound source data comprises a sound ray propagation distance, a sound source frequency, a sound source depth and the like;
respectively calculating the time weight of each sound source according to the sound ray propagation distance, the sound source frequency and the sound source depth, wherein the time weight of one sound source can be calculated by one sound source data;
calculating an average time weight of the sound sources according to the time weight of each sound source and the number of sub-communication domains which can be allocated;
and dividing a sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source.
In the present embodiment, the time weight of each sound source is obtained by the following method:
specifically, the time weight Ki of each sound source is calculated from the sound ray propagation distance, the sound source frequency, and the sound source depth, and the formula is as follows (formula 2):
K_i = l1×distance + l2×frequency + l3×depth;
in the formula, distance is the calculated radius of the sound field, frequency is the frequency of the sound source, depth is the depth of the sea bottom at the position of the sound source, and l1, l2, l3 are the corresponding proportionality coefficients; experiments show that load balancing works best when l1, l2 and l3 take the values 0.3, 0.5 and 0.2 respectively.
In this embodiment, the average time weight of the sound source is obtained by the following method:
the total time weight of all sound sources is given by the following formula (formula 3):
K_sum = K_1 + K_2 + ... + K_S;
the average time weight of the sound sources is given by the following formula (formula 4):
K_ave = K_sum / M;
wherein S is the number of sound sources and M is the number of sub-communication domains.
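Formulas (2) through (4) amount to a few lines of code; a sketch (function names are illustrative, not from the patent):

```python
def time_weight(distance, frequency, depth, l1=0.3, l2=0.5, l3=0.2):
    """K_i = l1*distance + l2*frequency + l3*depth (formula 2), with the
    coefficient values 0.3/0.5/0.2 the patent reports as balancing best."""
    return l1 * distance + l2 * frequency + l3 * depth

def average_time_weight(weights, m):
    """K_ave = K_sum / M (formulas 3 and 4): total the per-source time
    weights and divide by the number M of sub-communication domains."""
    return sum(weights) / m
```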
In this embodiment, dividing the sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source includes:
step 101: arranging the time weights of all sound sources in descending order to form a sound source time weight sequence; it will be appreciated that the time weight K_i of each sound source and the total time weight K_sum need to be obtained first.
Step 102: judging the relation between the first value of the sound source time weight sequence and the average time weight, if the first value of the sound source time weight sequence is larger than or equal to the average time weight, performing step 103, and if the first value of the sound source time weight sequence is smaller than the average time weight, performing step 104;
step 103: allocating the sound source data corresponding to the sound source time weight which is more than or equal to the average time weight to a sub-communication domain, moving the allocated sound source time weight out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 104: if the first value of the sound source time weight sequence is smaller than the average time weight, allocating sound source data corresponding to at least two sound source time weights to a sub-communication domain, shifting the allocated sound source time weights out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 105: and re-acquiring the average time weight of the new sound source according to the updated sound source time weight sequence, and repeating the step 102 according to the updated sound source time weight sequence and the average time weight of the new sound source until the sound source data distribution is completed.
In this embodiment, the step 104 includes:
step 1041: searching the sound source time weight sequence for the sound source time weight closest to the average time weight, assigning that sound source to a sub-communication domain and removing its time weight from the sound source time weight sequence, wherein the average time weight in step 1041 is called a first time weight;
step 1042: after the allocated sound source time weight is removed from the sound source time weight sequence, the average time weight of the new sound sources is re-acquired according to the updated sound source time weight sequence, and the sound source time weight closest to the average time weight is subtracted from the first time weight, so as to acquire the remaining time weight;
step 1043: after the remaining time weight is obtained, the sound source time weight closest to the average time weight is continuously searched in the sound source time weight sequence, and the step 1041 is repeated according to the updated sound source time weight sequence and the new average time weight of the sound source until the remaining time weight is 0.
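Steps 101 through 105, with step 104 expanded into steps 1041-1043, can be sketched as follows. This is a simplified reading under stated assumptions: after the first closest-to-average pick, later picks target the remaining weight; "remaining time weight is 0" is treated as "remaining <= 0" so the loop always terminates; and the last sub-communication domain absorbs any leftover sources.

```python
def allocate_sources(weights, m):
    """Greedy division of sound-source time weights over m
    sub-communication domains, following steps 101-105 (with step 104's
    closest-to-average fill of steps 1041-1043). A sketch, not the
    patent's exact procedure."""
    seq = sorted(weights, reverse=True)          # step 101: descending order
    domains = []
    while seq:
        avg = sum(seq) / (m - len(domains))      # current average K_ave
        if seq[0] >= avg:                        # steps 102-103: head alone
            domains.append([seq.pop(0)])
        else:                                    # step 104 / 1041-1043
            bucket, remaining = [], avg
            while seq and remaining > 0:
                # pick the weight closest to the weight still needed
                w = min(seq, key=lambda x: abs(x - remaining))
                seq.remove(w)
                bucket.append(w)
                remaining -= w
            domains.append(bucket)
        if len(domains) == m:                    # last domain takes the rest
            domains[-1].extend(seq)
            seq = []
    return domains
```

Running this on the worked example below (weights 0.9, 0.8, 0.7, 0.5, 0.4, 0.1 over 4 sub-communication domains) assigns 0.9 alone first, consistent with step 103.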
For example, assume that there are 6 pieces of sound source information, which are the first sound source information, the second sound source information, the third sound source information, the fourth sound source information, the fifth sound source information, and the sixth sound source information, respectively; there are 4 sub-communication domains to be allocated.
In this embodiment, the global communication domain comprises all available master cores and the slave cores that each master core can use. For example, if the Sunway computer has 1000 cores and 300 of them are applied for, the communication domain formed by those 300 cores is the global communication domain.
Each piece of sound source information corresponds to a time weight. For example, the first time weight corresponding to the first sound source information is 0.9; the second time weight corresponding to the second sound source information is 0.8; the third time weight corresponding to the third sound source information is 0.7; the fourth time weight corresponding to the fourth sound source information is 0.5; the fifth time weight corresponding to the fifth sound source information is 0.4; and the sixth time weight corresponding to the sixth sound source information is 0.1.
In this embodiment, there are 4 sub-communication domains, which are a, B, C, and D;
step 101: arranging the time weights of the sound sources in descending order to form a sound source time weight sequence, namely 0.9, 0.8, 0.7, 0.5, 0.4 and 0.1;
step 102: determining a relationship between a first value of the time-weighted sequence of sound sources and an average time weight, the average time weight being calculated using the above formula in the present case as: 0.9+0.8+0.7+0.5+0.4+0.1=3.4;
3.4/4 = 0.85, i.e., in the present case, the average time weight is 0.85.
Judging the relation between the first value of the sound source time weight sequence and the average time weight, if the first value of the sound source time weight sequence is more than or equal to the average time weight, performing step 103, and if the first value of the sound source time weight sequence is less than the average time weight, performing step 104;
from the above, it can be seen that the first value is 0.9, which is greater than 0.85, then step 103 is performed;
step 103: allocating the sound source data corresponding to the sound source time weight which is more than or equal to the average time weight to a sub-communication domain, moving the allocated sound source time weight out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
specifically, the first sound source information corresponding to 0.9 is allocated to one sub-communication domain, for example, to a, and after allocation, the sound source time weight sequence is updated, that is, the new sound source time weight sequence is 0.8, 0.7, 0.5, 0.4, 0.1;
step 105: according to the updated sound source time weight sequence, the average time weight of a new sound source is obtained again, and step 102 is repeated according to the updated sound source time weight sequence and the new average time weight until the sound source data distribution is completed; for example, the updated sound source time weight sequence is 0.8, 0.7, 0.5, 0.4, 0.1;
the average time weight of the new sound source is obtained according to 0.8, 0.7, 0.5, 0.4, 0.1, namely:
0.8+0.7+0.5+0.4+0.1=2.5;
2.5 / 3 ≈ 0.83, i.e., in the present case the average time weight is approximately 0.83.
Repeating step 102, namely judging the relation between the first value of the sound source time weight sequence and the average time weight, if the first value of the sound source time weight sequence is greater than or equal to the average time weight, performing step 103, and if the first value of the sound source time weight sequence is less than the average time weight, performing step 104;
as can be seen from the above, the first value of the sound source time weight sequence (0.8) is now smaller than the average time weight (0.83), so step 104 is performed:
if the first value of the sound source time weight sequence is smaller than the average time weight, allocating the sound source data corresponding to at least two sound source time weights to one sub-communication domain (in this embodiment, to B) and shifting the sound source time weights that have been allocated out of the sound source time weight sequence and updating the sound source time weight sequence, and performing step 105;
specifically, step 1041: searching the sound source time weight sequence for the sound source time weight closest to the average time weight (in this embodiment, 0.8 is closest to the average time weight 0.83), allocating that sound source to the sub-communication domain, and moving its weight out of the sound source time weight sequence; the average time weight used in step 1041 is referred to as the first time weight;
step 1042: after the allocated weight 0.8 is moved out, the updated sound source time weight sequence is 0.7, 0.5, 0.4, 0.1, and the remaining time weight is 0.83 - 0.8 = 0.03;
step 1043: after the remaining time weight is obtained, continuing to search the sound source time weight sequence for the weight closest to the remaining time weight, repeating step 1041 with the updated sequence until the remaining time weight is less than or equal to 0. In this embodiment the sequence is 0.7, 0.5, 0.4, 0.1 and the weight closest to 0.03 is 0.1, so the sound source task corresponding to 0.1 is allocated to the same sub-communication domain; after this allocation the remaining time weight is negative, i.e., the allocation of sub-communication domain B is complete.
The remaining sub-communication domains C and D are allocated according to the same allocation method described above.
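The allocation procedure of steps 101 to 105 (with sub-steps 1041 to 1043) can be sketched as a greedy pass over the descending weight sequence. The following is a minimal illustrative sketch, not the patent's actual implementation; the function and variable names are hypothetical. It reproduces the worked example above, yielding A = [0.9], B = [0.8, 0.1], C = [0.7, 0.4], D = [0.5].

```python
# Illustrative sketch of the greedy sound-source partition of steps 101-105.
# Names are hypothetical; the patent's real code is not reproduced here.

def partition_sources(weights, n_domains):
    """Greedily assign sound-source time weights to sub-communication domains."""
    remaining = sorted(weights, reverse=True)              # step 101: descending order
    domains = []
    while remaining and len(domains) < n_domains:
        # average time weight over the domains still to be filled
        avg = sum(remaining) / (n_domains - len(domains))
        if remaining[0] >= avg:                            # step 103: one heavy source
            domains.append([remaining.pop(0)])
        else:                                              # step 104: pack several sources
            bucket, budget = [], avg
            while remaining and budget > 0:
                # pick the weight closest to the remaining budget (steps 1041/1043)
                w = min(remaining, key=lambda x: abs(x - budget))
                remaining.remove(w)
                bucket.append(w)
                budget -= w                                # step 1042: remaining weight
            domains.append(bucket)
    if remaining:                                          # leftovers go to the last domain
        domains[-1].extend(remaining)
    return domains
```

Running it on the example weights 0.9, 0.8, 0.7, 0.5, 0.4, 0.1 with 4 sub-communication domains reproduces the allocation described in the text.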
In this embodiment, after dividing the sound source calculation task for each sub-communication domain, the dividing the sound source calculation task for each sub-communication domain by using the sound source task division algorithm further includes:
and allocating a sound ray calculation task to each main core in each sub communication domain.
In this embodiment, the allocating a sound ray calculation task to each primary core in each sub-communication domain includes:
respectively calculating the time weight k_i of each sound ray in each sound source;
obtaining the average time weight k_ave of the sound rays in the sub-communication domain according to the time weight of each sound ray and the number of allocatable main cores in the sub-communication domain;
And dividing a sound ray calculation task for each main core in each sub-communication domain according to the time weight of each sound ray in each sound source and the mean time weight of the sound rays in the sub-communication domain.
In this embodiment, dividing the sound ray calculation task for each primary core according to the time weight of each sound ray in each sound source and the mean time weight of the sound rays in the sub-communication domain includes:
step 201: arranging the time weights of all sound rays in a reverse order to form a sound ray time weight sequence;
step 202: judging the relation between the first value of the sound ray time weight sequence and the average time weight, if the first value of the sound ray time weight sequence is larger than or equal to the average time weight, performing step 203, and if the first value of the sound ray time weight sequence is smaller than the average time weight, performing step 204;
step 203: separately allocating the sound ray data corresponding to the sound ray time weight which is greater than or equal to the average time weight to a main core, shifting the allocated sound ray time weight out of the sound ray time weight sequence, updating the sound ray time weight sequence, and performing step 205;
step 204: if the first value of the sound ray time weight sequence is smaller than the average time weight, allocating the sound ray data corresponding to at least two sound ray time weights to one main core, shifting the allocated sound ray time weights out of the sound ray time weight sequence, updating the sound ray time weight sequence, and performing step 205;
step 205: and (3) re-acquiring the average time weight of the new sound ray according to the updated sound ray time weight sequence, and repeating the step (202) according to the updated sound ray time weight sequence and the average time weight of the new sound ray until the sound ray data distribution is completed.
In this embodiment, the time weight of each sound ray in the sound source is obtained using the following formula:

$$k_i = K_i / (B \cdot R)$$

and the average time weight over the core groups of the sub-communication domain is obtained using the following formula:

$$k_{ave} = k_{sum} / n$$

where B is the number of azimuths of the sound source, R is the number of sound rays to be calculated for the sound source at each azimuth, k_sum is the total sound ray time weight in the sub-communication domain, and n is the number of core groups in the sub-communication domain.
It is understood that the assignment of the sound ray data is similar to that of the sound source data, and is not described in detail herein.
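Under the formulas above, the weight computation itself is direct; the sketch below is illustrative only (the raw weights, azimuth count, rays per azimuth, and core-group count are made-up example values, and the function name is hypothetical):

```python
# Illustrative helper for k_i = K_i / (B * R) and k_ave = k_sum / n.
# All input values below are hypothetical examples, not measured data.

def ray_time_weights(raw_weights, n_azimuths, rays_per_azimuth, n_core_groups):
    # per-ray time weight k_i
    k = [K / (n_azimuths * rays_per_azimuth) for K in raw_weights]
    # average time weight over the core groups of the sub-communication domain
    k_ave = sum(k) / n_core_groups
    return k, k_ave
```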
In this embodiment, the calculating the three-dimensional sound ray propagation trajectory by each sub-communication domain according to the sound source calculation task allocated to each sub-communication domain includes:
for each master core in each sub-communication domain, the following steps are performed:
loading a calculation task, so that the marine sound velocity, the geological acoustic environment data and the elevation data of the corresponding region required by calculation are loaded into each slave core corresponding to the master core;
and reading the ocean sound velocity, the geological acoustic environment data and the elevation data of the corresponding area stored by each slave core through a memory sharing algorithm in the calculation process of each slave core.
In this embodiment, all processes in the sub-communication domain load the ocean sound speed, the geoacoustic environment data (sediment-layer density, sediment-layer sound speed, sediment-layer attenuation coefficient, etc.), and the elevation data of the corresponding region according to their calculation tasks. During this process, a slave core reads the corresponding data using the memory-sharing algorithm, calculates the three-dimensional sound ray propagation trajectory, and writes the calculation result back to main memory using the DMA-based double-buffer communication mechanism;
by analyzing the Bellhop3D model, the part which takes the longest time is the three-dimensional sound ray propagation track. The sound ray tracking uses the sound source position as a starting point, calculates the position of the next propagation track point of the sound ray according to the set azimuth angle, pitch angle and marine environment data of the current point, and judges whether the sound ray is propagated to the sea surface and the seabed. The model repeats the steps until the propagation distance of the sound ray exceeds a preset value or the propagation track point of the sound ray exceeds a preset value, and the process takes a lot of time.
In the prior art, the Bellhop3D model contains three nested loops, over the sound source positions, the ray launch azimuths, and the number of rays per azimuth; the serial algorithm iterates over every ray of every azimuth of every sound source in turn until all rays of all azimuths of all sound sources are calculated. Because consecutive points along a single three-dimensional ray propagation path depend on each other, an individual ray cannot be split; therefore, parallel task division at the sound source level is realized with MPI sub-communication domains, task division at the azimuth and ray levels is realized with the processes inside each sub-communication domain, and finally the parallel calculation of the rays themselves is realized with Athread on the slave cores.
By adopting this MPI + Athread two-level (process-level plus thread-level) parallel acceleration method, the trajectory calculations can be performed in parallel, improving the computation speed and shortening the computation time.
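The three-level decomposition described above (sound sources to sub-communication domains, azimuths and rays to processes, individual rays to slave cores) rests on the fact that distinct rays are independent of one another. The following is a minimal sketch, with hypothetical names, of flattening the serial triple loop into independent ray tasks; the worker assignment here is plain round-robin for illustration, whereas the patent balances by time weight:

```python
# Flatten the serial triple loop (sources x azimuths x rays) into a task
# list. Since each ray is independent, the tasks can be dealt out to any
# number of workers in any order without changing the result.

def flatten_ray_tasks(n_sources, n_azimuths, n_rays):
    return [(s, b, r)
            for s in range(n_sources)
            for b in range(n_azimuths)
            for r in range(n_rays)]

def deal_round_robin(tasks, n_workers):
    # worker w receives tasks w, w + n_workers, w + 2*n_workers, ...
    return [tasks[w::n_workers] for w in range(n_workers)]
```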
In the present embodiment, in the three-dimensional sound field calculation a sound ray does not propagate only within its preset azimuthal plane: horizontal refraction into adjacent azimuths may occur, and the refraction angle may even exceed 90°. The amount of data involved is therefore very large.
Referring to fig. 3 to 5, the 64 KB LDM space of each slave core of the Sunway TaihuLight cannot hold all of the marine environmental data needed during the ray-trajectory calculation, so the present application proposes a memory-sharing algorithm by which each slave core reads the ocean sound speed, geoacoustic environment data, and elevation data of the corresponding region stored on the other slave cores. The division and mapping of the environmental data in this sharing scheme is shown in fig. 4: marine environmental data within a radius of about 300 kilometers (32 grid points), centered on the sound source position, is selected and divided into 64 data blocks in an 8 × 8 layout, in one-to-one correspondence with the 64 slave cores. As shown in fig. 5, when a requesting slave core CPE_{i,j} needs data held by the destination slave core CPE_{m,n}, the transit slave core CPE_{i,n} forwards the data request command to CPE_{m,n}, which then returns the data to the requesting core CPE_{i,j} via the same transit core; one data request thus comprises two row register communications and two column register communications. The register-communication latency between the Sunway TaihuLight slave cores is only about 10 clock cycles, so transmission is very fast, and extra overhead and deadlock problems are avoided.
Specifically, the data acquisition process is as shown in fig. 4: (1) the requesting slave core CPE_{i,j}, located at row i and column j, needs a certain block of data during its calculation; from the longitude-latitude coordinates of the data it obtains the data serial number (0-63) and hence the row number m and column number n of the slave core CPE_{m,n} that stores the data; (2) the slave core located in the same row as the requesting core and the same column as the destination core is named the transit slave core CPE_{i,n}; (3) the requesting core CPE_{i,j} issues the data request command to the transit core CPE_{i,n} via row register communication, and the transit core CPE_{i,n} forwards the request to the destination core CPE_{m,n} via column register communication; (4) the destination core CPE_{m,n} sends the data packet to the transit core CPE_{i,n} via column register communication, and the transit core CPE_{i,n} delivers the data to the requesting core CPE_{i,j} via row register communication.
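The block-to-core mapping and the two-hop route of steps (1) to (4) can be sketched as follows. This is an illustrative model of the 8 × 8 mesh with hypothetical function names, not actual register-communication code; the grid size and the block numbering 0-63 follow the text:

```python
# Sketch of the two-hop routing on the 8x8 slave-core mesh: a request from
# CPE(i,j) to CPE(m,n) travels row-wise to the transit core CPE(i,n), then
# column-wise to the destination; the reply retraces the path in reverse.

GRID = 8  # 8 x 8 slave-core array, 64 data blocks

def owner_of_block(block_id):
    """Map a data-block serial number (0-63) to its owning core (row, col)."""
    return divmod(block_id, GRID)

def request_path(i, j, block_id):
    """Cores visited by one data request: requester, transit, destination."""
    m, n = owner_of_block(block_id)
    transit = (i, n)                    # same row as requester, same column as owner
    return [(i, j), transit, (m, n)]    # the reply travels these hops in reverse
```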
In this embodiment, the slave-core-array memory-sharing scheme based on register communication ensures that the calculations of the propagation paths of different sound rays do not interfere with each other. The parallel solution for the ray-tracing calculation uses MPI between the master cores to realize sound-source- and azimuth-level parallelism, and Athread on the slave cores to realize ray-level parallelism. The program divides a calculation task for each core group according to the computing resources; the master core of a core group is responsible for acquiring the sound source position information and the relevant environment data to be calculated, the slave cores are responsible for calculating the ray propagation trajectories, and each complete ray propagation path is handled by a single slave core.
In this embodiment, after each sub-communication domain calculates a three-dimensional sound ray propagation trajectory according to the sound source calculation task allocated to each sub-communication domain, the method for realizing Bellhop3D parallelism based on domestic many-core supercomputing underwater three-dimensional sound field model further includes:
the main process of each sub-communication domain collects the calculation results of the other processes, applies Gaussian, hat-shaped, and other beam corrections to the sound field, and calculates the propagation loss;
and the main process in the sub communication domain stores the calculation result in a distributed manner.
The invention will be briefly described below with reference to the accompanying drawings and examples.
In order to verify the performance of the method, 88 sound sources were arranged, with 20 frequencies at each source position; the sound source at each frequency was divided into 3600 azimuths, and 320 sound rays were calculated at each azimuth. Parallel efficiency tests were carried out on 1.287 million to 20.592 million cores. The computation scale, speedup ratio, parallel efficiency, and other measured quantities are shown in table 1; the time is shown in fig. 5 and the parallel efficiency in fig. 6. As can be seen from fig. 6, the speedup ratio of the method of the present invention increases with the parallel scale, while the parallel efficiency decreases with the parallel scale. When the parallel scale exceeds 10.296 million cores, the parallel efficiency begins to drop faster. At the maximum parallel scale of 20.592 million cores the parallel efficiency is 69.4%, which shows that the method of the invention still scales well and retains good parallel performance under large-scale parallelism.
TABLE 1 actual measured parallel efficiency
(Table 1 appears as an image in the original document.)
In this embodiment, the technical scheme of the application is based on a domestic many-core supercomputing platform. A two-level (process-level and thread-level) parallel scheme of the Bellhop3D three-dimensional underwater acoustic propagation model is implemented using the master and slave cores; globally measured hydrological data (temperature, salinity, depth, etc.), the geoacoustic data shown in fig. 7, and the marine elevation data shown in fig. 8 are coupled; the global ocean sound speed is computed with the SIMD vectorization technology; a slave-core-array memory-sharing scheme based on register communication realizes shared storage of the sound field environment data; a sound ray decoupling algorithm realizes parallel computation of the ray propagation trajectories; and a DMA double-buffer communication technology realizes data transmission between the slave cores and main memory. In addition, a multi-mode communication algorithm based on communication-domain division realizes load-balanced calculation and distributed reading and storage of data, shortening I/O time and improving parallel efficiency.
Bellhop3D underwater sound propagation model
In the present embodiment, the theoretical basis of current mathematical models of underwater sound propagation is the wave equation, generally written as a time-dependent hyperbolic second-order linear partial differential equation:
$$\nabla^2 \Phi = \frac{1}{c^2}\,\frac{\partial^2 \Phi}{\partial t^2} \qquad (4)$$

where $\nabla^2$ is the Laplacian operator, $\Phi$ is the potential function, $c$ is the speed of sound, and $t$ is time.
Assume a harmonic solution of the potential function Φ of the form
Φ=φe -iωt …………………………………………(5)
Where φ is a time-independent potential function, ω =2 π f is the source angular frequency, and f is the frequency.
According to equation (5), equation (4) can be simplified to a time-independent Helmholtz equation;
$$\nabla^2 \varphi + \frac{\omega^2}{c^2}\,\varphi = 0 \qquad (6)$$
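As a consistency check, substituting the harmonic ansatz (5) into the wave equation (4) recovers the Helmholtz equation (6); a short derivation:

```latex
% Substitute \Phi = \varphi e^{-i\omega t} (eq. 5) into the wave equation (4):
\nabla^2\!\left(\varphi e^{-i\omega t}\right)
  = \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2}\!\left(\varphi e^{-i\omega t}\right)
  = \frac{1}{c^2}\,(-i\omega)^2\,\varphi e^{-i\omega t}
  = -\frac{\omega^2}{c^2}\,\varphi e^{-i\omega t}.
% Cancelling the common factor e^{-i\omega t} gives the Helmholtz equation (6):
\nabla^2 \varphi + \frac{\omega^2}{c^2}\,\varphi = 0,
\qquad k \equiv \frac{\omega}{c}.
```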
according to the difference of phi solutions, the underwater sound propagation problem is mainly researched by using two methods, namely a fluctuation theory and a ray theory: the fluctuation theory researches the change of the amplitude and the phase of the acoustic signal in the sound field, such as a normal wave model, a fast field model, a parabolic equation model and the like; ray theory approximates sound waves to sound beams at high frequencies, and studies the change of sound intensity along with the sound beams in a sound field, such as a ray theory model.
To extend the Gaussian beam method to three dimensions, only minor modifications of the (N×2D) weak three-dimensional algorithm are needed: the fan of rays launched from a three-dimensional source has both an azimuth angle and a pitch angle. The ray equations are:

$$\frac{dx}{ds} = c\,\xi(s), \qquad \frac{d\xi}{ds} = -\frac{1}{c^2}\frac{\partial c}{\partial x}$$

$$\frac{dy}{ds} = c\,\eta(s), \qquad \frac{d\eta}{ds} = -\frac{1}{c^2}\frac{\partial c}{\partial y}$$

$$\frac{dz}{ds} = c\,\zeta(s), \qquad \frac{d\zeta}{ds} = -\frac{1}{c^2}\frac{\partial c}{\partial z}$$

where c(x, y, z) is the ocean sound speed and (x(s), y(s), z(s)) is the propagation trajectory of the sound ray, which originates from the source position $(x_s, y_s, z_s)$ with pitch angle α and azimuth angle β. The corresponding initial conditions are:

$$x(0) = x_s, \qquad \xi(0) = \frac{\cos\alpha\,\cos\beta}{c(0)}$$

$$y(0) = y_s, \qquad \eta(0) = \frac{\cos\alpha\,\sin\beta}{c(0)}$$

$$z(0) = z_s, \qquad \zeta(0) = \frac{\sin\alpha}{c(0)}$$
The Gaussian beam is constructed around a central sound ray, using ray-centered coordinates (s, m, n): s is the arc length along the ray, and (m, n) are the normal distances from the field point to the central ray. This coordinate system is valid in a regular neighborhood of the central ray, within which every receiving point has well-defined ray-centered coordinates. With $\hat{e}_m$ and $\hat{e}_n$ the two unit normal vectors of the central ray, the normal distances are defined by projection onto these normals:

$$m = \left(\mathbf{x} - \mathbf{x}_{\mathrm{ray}}(s)\right) \cdot \hat{e}_m, \qquad n = \left(\mathbf{x} - \mathbf{x}_{\mathrm{ray}}(s)\right) \cdot \hat{e}_n$$

where $\mathbf{x}_{\mathrm{ray}}(s)$ is the point on the central ray at arc length s.
Forming the beam in the neighborhood of the central ray requires integrating a set of auxiliary equations:

$$\frac{dP}{ds} = -\frac{1}{c^2}\,V\,Q, \qquad \frac{dQ}{ds} = c\,P$$

where V is the sound-speed curvature matrix with respect to the two normal vectors:

$$V = \begin{pmatrix} c_{mm} & c_{mn} \\ c_{nm} & c_{nn} \end{pmatrix}$$
The derivatives $c_{mm}$, etc., denote second derivatives of the sound speed in the normal-vector directions; they can be rewritten in terms of the spatial derivatives of the sound speed, as projections of its Hessian $\nabla\nabla c$ onto the two unit normals $\hat{e}_m$ and $\hat{e}_n$ of the central ray:

$$c_{mm} = \hat{e}_m \cdot (\nabla\nabla c)\,\hat{e}_m, \qquad c_{mn} = c_{nm} = \hat{e}_m \cdot (\nabla\nabla c)\,\hat{e}_n, \qquad c_{nn} = \hat{e}_n \cdot (\nabla\nabla c)\,\hat{e}_n$$
it can be seen from the P-Q differential equation how the sound ray is disturbed (by moving the sound source position or changing the sound ray angle) due to the change in the initial condition of the sound ray. To obtain a gaussian beam, the initial conditions are applied:
$$P(0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad Q(0) = \begin{pmatrix} q_m(0) & 0 \\ 0 & q_n(0) \end{pmatrix}$$

where the quantities $q_m(0)$ and $q_n(0)$, which control the initial beamwidth in the two normal directions of the sound ray, are in general complex; their real and imaginary parts allow independent control of the beamwidth and the beam curvature. Integrating these equations along the sound ray yields a Gaussian beam of the form shown below:
$$u(s,m,n) = \sqrt{\frac{c(s)}{c(0)\,\det Q(s)}}\;\exp\!\left\{-i\omega\left[\tau(s) + \tfrac{1}{2}\,(m\;\;n)\,P(s)\,Q^{-1}(s)\begin{pmatrix} m \\ n \end{pmatrix}\right]\right\}$$

$$\tau(s) = \int_0^s \frac{ds'}{c(s')}$$

where $Q^{-1}$ denotes the inverse of Q, $\det Q$ (also written |Q|) its determinant, and τ(s) the travel time along the central ray.
In the prior art, the serial algorithm of the Bellhop3D model is time-consuming mainly in three stages: calculating the global ocean sound speed, tracing the three-dimensional sound ray trajectories, and storing the calculation results. For these stages, the present method improves the computation speed with a two-level parallel scheme based on the master-core process level and the slave-core thread level, and shortens the time with distributed storage; the main contents are as follows:
(1) The SIMD vectorization technology is used to realize parallel accelerated calculation of the global ocean sound speed;
(2) A slave-core-array memory-sharing scheme based on register communication realizes shared storage of the sound field environment data, a sound ray decoupling algorithm realizes parallel accelerated calculation of the sound ray propagation trajectories, and a DMA double-buffer communication technology accelerates data transmission between the slave cores and main memory;
(3) A multi-mode communication algorithm based on communication-domain division distributes the calculation tasks, realizing load-balanced calculation and distributed reading and storage of data, shortening I/O time and improving parallel efficiency.
In this embodiment, the key to improving the acceleration performance of the domestic many-core processor is reducing or hiding the communication overhead of the slave cores. A slave core can access main memory directly and discretely with gld/gst instructions at a latency of hundreds of clock cycles, or in batches via DMA at a latency of tens of clock cycles. The double-buffer mechanism works as follows: if a slave core must perform several rounds of read/write operations, it allocates in its local store a space twice the size of the communication data, holding two equally sized buffers that back each other up. Apart from the communication for the first read, while the slave core computes the current round on one buffer, the other buffer can receive the data for the next round. When the double-buffer technique is used, the algorithm must maintain a double-buffer identifier and use the DMA reply word to determine completion. With the DMA-based double-buffer communication technology, communication overhead that is smaller than the computation overhead can be hidden almost entirely, improving the acceleration effect of the slave cores.
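The ping-pong pattern described above can be emulated in plain code. The sketch below is purely illustrative, with ordinary list reads standing in for asynchronous DMA transfers and hypothetical names; in real Sunway code the next-round transfer would proceed concurrently with the current round's computation:

```python
# Conceptual sketch of the DMA double-buffer ("ping-pong") loop: the buffer
# for round r+1 is filled while round r is being computed. Here the "DMA"
# is just a list read, so the overlap is conceptual, not actual.

def process_rounds(chunks, compute):
    if not chunks:
        return []
    results = []
    buffers = [None, None]              # two equally sized buffers
    cur = 0
    buffers[cur] = chunks[0]            # the first read cannot be overlapped
    for r in range(len(chunks)):
        nxt = 1 - cur                   # flip the double-buffer identifier
        if r + 1 < len(chunks):
            buffers[nxt] = chunks[r + 1]  # "communication" for the next round,
                                          # overlapped with this round's compute
        results.append(compute(buffers[cur]))
        cur = nxt
    return results
```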
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, it is intended that all such modifications and alterations be included within the scope of this invention as defined in the appended claims.

Claims (10)

1. A Bellhop3D parallel implementation method of an underwater three-dimensional sound field model based on domestic many-core supercomputing is characterized by comprising the following steps of:
in a global communication domain, each main core acquires hydrological data;
in a global communication domain, each main core calls a slave core connected with the main core to calculate ocean sound velocity information;
partitioning each main core in the global communication domain to form a plurality of sub-communication domains;
dividing a sound source calculation task for each sub-communication domain by adopting a sound source task division algorithm;
and each sub-communication domain calculates the three-dimensional sound ray propagation track according to the distributed sound source calculation task.
2. The method for realizing the Bellhop3D parallel underwater three-dimensional sound field model based on the domestic many-core supercomputing as claimed in claim 1, wherein in the global communication domain, each master core respectively calls a slave core connected with each master core to calculate the marine sound velocity information comprises:
each master core calls each slave core connected with the master core to calculate the ocean sound velocity by using the SIMD vectorization technology.
3. The method for realizing the Bellhop3D parallel underwater three-dimensional sound field model based on the domestic many-core supercomputing as claimed in claim 2, wherein the partitioning each main core in the global communication domain so as to form a plurality of sub-communication domains comprises:
acquiring the number of main cores capable of being used for calculation;
and dividing each main core into a plurality of sub-communication domains according to the number of the main cores capable of being used for calculation, distributing calculation tasks to each sub-communication domain, and distributing new process numbers to processes in each sub-communication domain.
4. The domestic many-core-supercomputing-based underwater three-dimensional sound field model Bellhop3D parallel implementation method as claimed in claim 3, wherein said dividing the sound source calculation tasks for each sub-communication domain by using the sound source task division algorithm comprises:
acquiring sound source data to be calculated, wherein each sound source data comprises a sound ray propagation distance, a sound source frequency and a sound source depth;
respectively calculating the time weight of each sound source according to the sound ray propagation distance, the sound source frequency and the sound source depth, wherein the time weight of one sound source can be calculated by one sound source data;
calculating the average time weight of the sound sources according to the time weight of each sound source and the number of the distributed sub communication domains;
and dividing a sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source.
5. The method for realizing the Bellhop3D parallel implementation of the domestic many-core supercomputing-based underwater three-dimensional sound field model according to claim 4, wherein after dividing the sound source calculation task for each sub-communication domain, the dividing the sound source calculation task for each sub-communication domain by using the sound source task division algorithm further comprises:
each primary core in each sub-communication domain is assigned a sound ray computation task.
6. The method for realizing the Bellhop3D parallel underwater three-dimensional sound field model based on the domestic many-core supercomputing as claimed in claim 5, wherein the allocating the sound ray computation task to each main core in each sub-communication domain comprises:
respectively calculating the time weight of each sound ray in each sound source;
acquiring the average time weight of the sound rays in the sub-communication domain according to the time weight of each sound ray and the number of the main cores in the sub-communication domain which can be distributed;
and dividing a sound ray calculation task for each main core in each sub-communication domain according to the time weight of each sound ray in each sound source and the mean time weight of the sound rays in the sub-communication domain.
7. The method for realizing the Bellhop3D parallel implementation of the domestic many-core supercomputing-based underwater three-dimensional sound field model according to claim 6, wherein the dividing of the sound source calculation task for each sub-communication domain according to the time weight of each sound source and the average time weight of the sound source comprises:
step 101: arranging the time weights of all sound sources in a reverse order to form a sound source time weight sequence;
step 102: judging the relation between the first value of the sound source time weight sequence and the average time weight, if the first value of the sound source time weight sequence is larger than or equal to the average time weight, performing step 103, and if the first value of the sound source time weight sequence is smaller than the average time weight, performing step 104;
step 103: allocating the sound source data corresponding to the sound source time weight greater than or equal to the average time weight to a sub-communication domain, moving the allocated sound source time weight out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 104: if the first value of the sound source time weight sequence is smaller than the average time weight, allocating sound source data corresponding to at least two sound source time weights to a sub-communication domain, shifting the allocated sound source time weights out of the sound source time weight sequence, updating the sound source time weight sequence, and performing step 105;
step 105: and re-acquiring the average time weight of the new sound source according to the updated sound source time weight sequence, and repeating the step 102 according to the updated sound source time weight sequence and the new average time weight of the sound source until the sound source data distribution is completed.
8. The method for realizing Bellhop3D parallel of the domestic many-core-supercomputing-based underwater three-dimensional sound field model according to claim 7, wherein the step 104 comprises the following steps:
step 1041: searching the time weight of the sound source closest to the average time weight in the time weight sequence of the sound source, assigning the sound source to a sub-communication domain and moving out of the time weight sequence of the sound source, wherein the average time weight in step 1041 is called a first time weight;
step 1042: after the allocated sound source time weight is moved out of the sound source time weight sequence, re-acquiring the average time weight of a new sound source according to the updated sound source time weight sequence, and subtracting the sound source time weight closest to the average time weight from the first time weight so as to acquire a remaining time weight;
step 1043: after the remaining time weight is obtained, the time weight of the sound source closest to the average time weight is continuously searched in the time weight sequence of the sound source, and the step 1041 is repeated according to the updated time weight sequence of the sound source and the new average time weight of the sound source until the remaining time weight is less than or equal to 0.
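The closest-match filling of steps 1041-1043 might look like the following sketch. It assumes one sub-communication domain is filled against a budget equal to the first time weight; the helper name and the choice to keep the divisor fixed while re-acquiring the average are illustrative assumptions, not taken from the patent.

```python
def fill_domain(seq, n_domains_left):
    """Fill one sub-communication domain by repeatedly taking the weight
    closest to the re-acquired average, until the budget is exhausted."""
    budget = sum(seq) / n_domains_left       # first time weight (step 1041)
    chosen = []
    while seq and budget > 0:
        avg = sum(seq) / n_domains_left      # re-acquired average (step 1042)
        w = min(seq, key=lambda x: abs(x - avg))  # weight closest to the average
        seq.remove(w)
        chosen.append(w)
        budget -= w                          # remaining time weight (step 1042)
    return chosen                            # stop when budget <= 0 (step 1043)
```

Selecting the weight closest to the average, rather than always the largest, lets a domain finish near its budget instead of overshooting it by a large margin.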
9. The Bellhop3D parallel implementation method of the underwater three-dimensional sound field model based on domestic many-core supercomputing according to claim 8, wherein dividing the sound ray calculation task for each master core according to the time weight of each sound ray in each sound source and the average time weight of the sound rays in the sub-communication domain comprises:
step 201: arranging the time weights of all sound rays to form a sound ray time weight sequence;
step 202: comparing the first value of the sound ray time weight sequence with the average time weight; if the first value is greater than or equal to the average time weight, performing step 203, and if the first value is smaller than the average time weight, performing step 204;
step 203: allocating the sound ray data corresponding to the sound ray time weight that is greater than or equal to the average time weight to a single master core, removing the allocated sound ray time weight from the sound ray time weight sequence, updating the sequence, and performing step 205;
step 204: if the first value of the sound ray time weight sequence is smaller than the average time weight, allocating the sound ray data corresponding to at least two sound ray time weights to one master core, removing the allocated sound ray time weights from the sequence, updating the sequence, and performing step 205;
step 205: re-acquiring a new average sound ray time weight from the updated sound ray time weight sequence, and repeating step 202 with the updated sequence and the new average until all sound ray data have been allocated.
10. The Bellhop3D parallel implementation method of the underwater three-dimensional sound field model based on domestic many-core supercomputing according to claim 9, wherein calculating the three-dimensional sound ray propagation trajectory in each sub-communication domain according to the assigned sound source calculation task comprises:
performing the following steps for each master core in each sub-communication domain:
loading the calculation task, so that the ocean sound velocity, geoacoustic environment data and elevation data of the corresponding region required for the calculation are loaded into each slave core corresponding to the master core; and
during the calculation on each slave core, reading the ocean sound velocity, geoacoustic environment data and elevation data of the corresponding region stored on each slave core through a memory-sharing algorithm.
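The master/slave sharing pattern above can be illustrated conceptually as follows, with plain Python threads standing in for slave cores. All names and data values are illustrative; an actual Sunway implementation would use the many-core runtime (e.g. the athread interface and local device memory) rather than threads.

```python
import threading

def master_load():
    # master core loads the regional environment data exactly once
    return {
        "sound_speed": [1500.0, 1498.5, 1497.2],  # m/s profile (example values)
        "geoacoustic": {"density": 1.8},          # bottom parameters (example)
        "elevation": [-2000.0, -2100.0],          # bathymetry in metres (example)
    }

def slave_core(env, rank, results):
    # each slave core reads the shared environment instead of reloading it
    results[rank] = env["sound_speed"][0]         # e.g. surface sound speed

env = master_load()                               # loaded once by the master core
results = {}
threads = [threading.Thread(target=slave_core, args=(env, r, results))
           for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the pattern is that the environment tables are loaded once and read many times, so sharing them avoids duplicating large oceanographic datasets in every slave core's local memory.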
CN202210781061.0A 2022-07-04 2022-07-04 Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing Pending CN115437782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210781061.0A CN115437782A (en) 2022-07-04 2022-07-04 Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210781061.0A CN115437782A (en) 2022-07-04 2022-07-04 Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing

Publications (1)

Publication Number Publication Date
CN115437782A true CN115437782A (en) 2022-12-06

Family

ID=84241639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210781061.0A Pending CN115437782A (en) 2022-07-04 2022-07-04 Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing

Country Status (1)

Country Link
CN (1) CN115437782A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244078A (en) * 2023-02-27 2023-06-09 青岛中海潮科技有限公司 Underwater sound field rapid calculation method based on multithreading and SIMD
CN116244078B (en) * 2023-02-27 2023-12-01 青岛中海潮科技有限公司 Underwater sound field rapid calculation method based on multithreading and SIMD

Similar Documents

Publication Publication Date Title
US9291735B2 (en) Probablistic subsurface modeling for improved drill control and real-time correction
US9291734B2 (en) Full waveform inversion using combined shot data and no scratch disk
US9063248B2 (en) RTM seismic imaging using combined shot data
CN109543358A (en) The ray tracing acceleration system and KD tree output method of the upper KD tree of GPU
CN115437782A (en) Bellhop3D parallel implementation method of underwater three-dimensional sound field model based on domestic many-core supercomputing
CN109490948B (en) Seismic acoustic wave equation vector parallel computing method
CN113190984B (en) Underwater sound field model BELLHOP parallel implementation method
Abdelkhalek et al. Fast seismic modeling and reverse time migration on a graphics processing unit cluster
Hu et al. Massively scaling seismic processing on sunway taihulight supercomputer
WO2013033651A1 (en) Full elastic wave equation for 3d data processing on gpgpu
Dou et al. An equal‐area triangulated partition method for parallel Xdraw viewshed analysis
Xu et al. Accelerating kirchhoff migration on gpu using directives
Weinbub et al. Shared-memory parallelization of the fast marching method using an overlapping domain-decomposition approach
Zhebel et al. Performance and scalability of finite-difference and finite-element wave-propagation modeling on Intel’s Xeon Phi
US20150331964A1 (en) Domain decomposition using a multi-dimensional spacepartitioning tree
Liu et al. An efficient scheme for multi-GPU TTI reverse time migration
Calazan Numerical enhancements and parallel GPU implementation of the traceo3d model
Zhu et al. Parallel optimization of underwater acoustic models: A survey
Zhu et al. Parallel optimization of three-dimensional wedge-shaped underwater acoustic propagation based on MPI+ OpenMP hybrid programming model
Monil et al. Stingray-HPC: a scalable parallel seismic raytracing system
Liu et al. Accelerating finite difference wavefield-continuation depth migration by GPU
Shin et al. Parallel 2D Seismic Ray Tracing Using Cuda on a Jetson Nano
CN109670001A (en) Polygonal gird GPU parallel calculating method based on CUDA
Ji et al. Direct FVM simulation for sound propagation in an ideal wedge
Liu et al. Practical implementation of prestack Kirchhoff time migration on a general purpose graphics processing unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination