CN110780842A - Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture - Google Patents

Info

Publication number
CN110780842A
Authority
CN
China
Prior art keywords
calculation
vector register
subdomains
complex
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911025256.7A
Other languages
Chinese (zh)
Inventor
刘钊
吕小敬
邹明松
李锦薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Hengding Super Computing Center Co Ltd
Original Assignee
Wuxi Hengding Super Computing Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Hengding Super Computing Center Co Ltd
Priority to CN201911025256.7A
Publication of CN110780842A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/4806 Computations with complex numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/4806 Computations with complex numbers
    • G06F7/4812 Complex multiplication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements

Abstract

The invention provides a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture and belongs to the field of algorithm optimization. By improving the one-dimensional wet surface element parallelism used in ship three-dimensional acoustic-elastic simulation calculation, it provides an optimization method based on two-dimensional wet surface element parallelism that hides the communication time and comprehensively improves the parallel efficiency of ship three-dimensional acoustic-elastic simulation software on Shenwei-architecture computers.

Description

Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture
Technical Field
The invention relates to a method for optimizing an algorithm, in particular to an optimization method for large-scale parallel computing based on the Shenwei architecture.
Background
The Shenwei (Sunway) TaihuLight supercomputer comprises 40,960 SW26010 heterogeneous many-core processors on 20,480 computing boards, for a total of 10,649,600 computing cores. Its peak performance is 125.4 PFlops, and it ranked first on the TOP500 list four consecutive times, so it provides the capacity needed for large-scale parallel computing.
The theory and method of ship three-dimensional acoustic-elastic analysis study the coupled vibration of an elastic floating body and the water medium, together with the acoustic radiation, scattering and propagation problems that this vibration causes. The three-dimensional hydroelastic acoustic analysis software THAFTS-acoustics, developed on this basis, performs unified calculation and analysis of vibration transmission inside the ship and of the ship's underwater radiated sound field, and has good engineering applicability.
Research on three-dimensional acoustic-elastic mechanics has broad application demand and development prospects in engineering problems such as improving the motion performance and safety of ships, controlling ship vibration and noise, and improving underwater stealth performance. In 1970, Wu established a two-dimensional hydroelasticity theory in which the hull structure is simplified as a non-uniform Euler beam or Timoshenko beam. Price and Wu combined structural dynamics theory with the three-dimensional potential flow theory of ship motion, proposed the generalized fluid-solid coupling boundary condition, and developed a three-dimensional hydroelasticity theory applicable to arbitrary three-dimensional deformable bodies in waves. Dolichen et al. developed a fast calculation method for the zero-forward-speed three-dimensional pulsating-source Green function and established a complete numerical method for the linear hydroelastic frequency-domain analysis of three-dimensional hulls under way. Building on the three-dimensional hydroelasticity theory and programs, Zhongsong et al. developed a ship three-dimensional acoustic-elastic theory that accounts for forward speed and for the influence of the sea-surface and seabed boundaries, and developed a complete set of numerical simulation software for low- and medium-frequency acoustic-elastic problems of complex ship structures.
As the three-dimensional acoustic-elastic theory and the software functions mature, there is an urgent need to increase the software's computing capability, to extend it to complex structures and complex marine underwater acoustic channel environments, and to support multi-condition, large-task computation. High-performance computing has developed rapidly in recent years. Combining high-performance computing theory with the massive computing resources of a supercomputer to parallelize and optimize existing programs, and thereby improve the large-scale computing performance of the software, has become a research topic of significant practical value.
Three-dimensional acoustoelastic computation involves multi-field coupling, multiple physical quantities and multiple core code sections, and a single parallel mode cannot parallelize all computational hot spots efficiently. Therefore, according to the characteristics of the software algorithm and in combination with the computing resources and system architecture of the Shenwei TaihuLight, a multi-level, multi-type heterogeneous parallel model is constructed. It supports a hybrid parallel mode combining data parallelism and task parallelism, expands the parallelism of the program, ensures load balance at each parallel level, and makes full use of the computing performance of the many-core processor.
The three-dimensional acoustoelastic software comprises three modules: flxbd, hycof and hyelas. The flxbd module preprocesses the input data to generate the data required by the generalized hydrodynamic coefficient calculation module hycof. The hycof module calculates the Green function and its partial derivatives, then the source strength and the velocity potential, to obtain parameters such as the hydrodynamic coefficients. The hyelas module solves the generalized fluid-solid coupling dynamic equation from the hydrodynamic parameters to generate the data required for post-processing. The hycof and hyelas modules are computationally expensive. At present only one-dimensional wet surface element parallelism is implemented, so the parallel efficiency of the program is low, functions whose cost depends on the square (or a higher power) of the number of wet surface elements or modes are not handled efficiently, and reverse acceleration (the program slows down as processes are added) can occur when more than 64 processes are used.
Disclosure of Invention
In order to solve the above technical problems and make full use of multi-process, large-scale parallel computing, the invention provides a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture. It addresses the problem in the prior art that three-dimensional acoustic-elastic software has low parallel efficiency and even shows reverse acceleration when the number of processes exceeds 64.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
The invention provides a parallel optimization method for ship three-dimensional acoustoelastic simulation calculation based on the Shenwei architecture, which uses m × n processes in total and comprises the following steps:
dividing the calculation domain by rows and columns into a plurality of calculation subdomains, the calculation subdomains located in the same row forming a calculation subdomain row and those located in the same column forming a calculation subdomain column;
cyclically marking the calculation subdomains of each row in the order 0 to m-1; after m-1 is reached, if the calculation subdomains of the row have not all been marked, marking continues again from 0 to m-1 until all calculation subdomains of the row are marked, the marked number being the column number of the calculation subdomain;
cyclically marking the calculation subdomains of each column in the order 0 to n-1; after n-1 is reached, if the calculation subdomains of the column have not all been marked, marking continues again from 0 to n-1 until all calculation subdomains of the column are marked, the marked number being the row number of the calculation subdomain;
numbering the processes in the order 0 to m × n - 1, the number being the process number of the process; the row number of a calculation subdomain multiplied by m, plus the column number of the calculation subdomain, is the process number of the process responsible for that calculation subdomain, and all processes calculate the calculation subdomains they are responsible for in parallel;
after a process finishes calculating one calculation subdomain, it communicates the result of that calculation subdomain to the other processes while calculating the next calculation subdomain;
wherein one core group corresponds to one process;
the core group comprises a plurality of slave cores and a master core, the slave cores being responsible for the calculation of the process and the master core being responsible for the communication of the process;
m > 1, n > 1.
In the parallel optimization method for ship three-dimensional acoustoelastic simulation calculation based on the Shenwei architecture, preferably, the calculation subdomains of a row are marked 0 to m-1 from left to right, and the calculation subdomains of a column are marked 0 to n-1 from top to bottom; the calculation subdomains assigned to one process are calculated in order from left to right and then from top to bottom.
Preferably, m and n are the pair of factors with the smallest absolute difference among all two-factor decompositions of the total number of processes m × n. Preferably, the method further comprises a plurality of 256-bit vector registers, each of which can store two complex numbers;
when complex matrix operation is carried out:
taking two complex groups to be operated on, each complex group comprising two complex numbers to be operated on; storing the first complex number of each group in one vector register, which is the first vector register; storing the second complex number of each group in another vector register, in the order corresponding to the first complex numbers, which is the second vector register; in both the first and the second vector register, the complex numbers are stored with the imaginary parts first and the real parts after;
separating the real and imaginary parts of the four complex numbers in the two vector registers and recombining them, wherein the real parts of the first complex numbers of the two complex groups form a first real part group and their imaginary parts form a first imaginary part group, and the real parts of the second complex numbers of the two complex groups form a second real part group and their imaginary parts form a second imaginary part group; the second imaginary part group and the first real part group are stored in sequence in one vector register, which is the third vector register; the first real part group and the second real part group are stored in sequence in one vector register, which is the fourth vector register; the second real part group and the first imaginary part group are stored in sequence in one vector register, which is the fifth vector register; the first imaginary part group and the second imaginary part group are stored in sequence in one vector register, which is the sixth vector register;
multiplying the data stored in the third vector register by the data stored in the fourth vector register and storing the result in a seventh vector register; multiplying the data stored in the fifth vector register by the data stored in the sixth vector register, negating the product of the second imaginary part group and the first imaginary part group, and storing the result in an eighth vector register;
and adding the data stored in the seventh vector register and the eighth vector register.
The technical scheme has the following advantages or beneficial effects:
The invention provides a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture. It improves the one-dimensional wet surface element parallelism used in ship three-dimensional acoustic-elastic simulation calculation and provides an optimization method based on two-dimensional wet surface element parallelism, so that the communication time is hidden and the parallel efficiency of ship three-dimensional acoustic-elastic simulation software on Shenwei-architecture computers is comprehensively improved.
Drawings
The invention and its features, aspects and advantages will become more apparent from reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings. Like reference symbols in the various drawings indicate like elements. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
Fig. 1 is a schematic flow chart of a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture according to embodiment 1 of the present invention;
Fig. 2 is another schematic flow chart of a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture according to embodiment 1 of the present invention;
Fig. 3 is a schematic view of the calculation domain partition of a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture according to embodiment 1 of the present invention;
Fig. 4 is a schematic view of the calculation subdomain distribution of a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture according to embodiment 1 of the present invention;
Fig. 5 is a complex number storage diagram of a parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture according to embodiment 1 of the present invention.
Detailed Description
The invention will be further described with reference to the following drawings and specific examples, which are not intended to limit the invention thereto.
Example 1:
The three-dimensional acoustoelastic software comprises three modules: flxbd, hycof and hyelas. The flxbd module preprocesses the input data to generate the data required by the generalized hydrodynamic coefficient calculation module hycof. The hycof module calculates the Green function and its partial derivatives, then the source strength and the velocity potential, to obtain parameters such as the hydrodynamic coefficients. The hyelas module solves the generalized fluid-solid coupling dynamic equation from the hydrodynamic parameters to generate the data required for post-processing. The hycof and hyelas modules are computationally expensive. At present only one-dimensional wet surface element parallelism is implemented, so the parallel efficiency of the program is low, functions whose cost depends on the square (or a higher power) of the number of wet surface elements or modes are not handled efficiently, and reverse acceleration can occur when more than 64 processes are used. One-dimensional wet surface element parallelism handles functions whose calculation amount depends linearly on the number of wet surface elements, but it performs poorly for functions with square or higher dependence on the number of wet surface elements, such as calculating the Green function partial derivatives VIN(IXX, IX) and solving the source strength SV(IXX, MODE); as the number of wet surface elements and the number of modes to be solved increase, the parallel efficiency of the program drops. Take solving SV from VIN as an example: the same column of VIN is distributed over different processes, so the main process must communicate to collect the row number of the column pivot, swap the pivot row with the current row, and, once this is done, communicate with the other processes again.
When the parallel scale is small, this scheme gives some parallel speedup, but as the number of processes grows the communication volume rises sharply, the speedup deteriorates, and reverse acceleration can even occur. To solve these problems of one-dimensional wet surface element parallelism, the parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on the Shenwei architecture provided by the invention uses m × n processes in total and, as shown in Fig. 1, comprises the following steps:
S101: dividing the calculation domain by rows and columns into a plurality of calculation subdomains, the calculation subdomains located in the same row forming a calculation subdomain row and those located in the same column forming a calculation subdomain column;
S102: cyclically marking the calculation subdomains of each row in the order 0 to m-1; after m-1 is reached, if the calculation subdomains of the row have not all been marked, marking continues again from 0 to m-1 until all calculation subdomains of the row are marked, the marked number being the column number of the calculation subdomain;
S103: cyclically marking the calculation subdomains of each column in the order 0 to n-1; after n-1 is reached, if the calculation subdomains of the column have not all been marked, marking continues again from 0 to n-1 until all calculation subdomains of the column are marked, the marked number being the row number of the calculation subdomain;
S104: numbering the processes in the order 0 to m × n - 1, the number being the process number of the process; the row number of a calculation subdomain multiplied by m, plus the column number of the calculation subdomain, is the process number of the process responsible for that calculation subdomain, and all processes calculate the calculation subdomains they are responsible for in parallel;
S105: after a process finishes calculating one calculation subdomain, it communicates the result of that calculation subdomain to the other processes while calculating the next calculation subdomain;
wherein one core group corresponds to one process;
the core group comprises a plurality of slave cores and a master core, the slave cores being responsible for the calculation of the process and the master core being responsible for the communication of the process;
m > 1, n > 1.
In this embodiment, the calculation process is described with reference to Fig. 3 and Fig. 4, which do not limit the invention. Step S101 divides the calculation domain into two-dimensional calculation subdomains; as shown in Fig. 3, they are arranged in 7 rows and 7 columns, and each calculation subdomain contains NNM columns and NNB rows of data. Steps S102 and S103 label the calculation subdomains with column numbers and row numbers; here m = NPCOL = 3 and n = NPROW = 2, so the column numbers are 0 to 2 and the row numbers are 0 to 1. Step S104 assigns the calculation subdomains to processes: each calculation subdomain goes to the process whose process number equals row number × m + column number. Fig. 4 shows the resulting assignment, where myrow is the row number, mycol is the column number and myid is the process number; myid takes the values 0 to 5, six process numbers in total, and all calculation subdomains marked with the same myid in the figure are the regions that process must calculate. In step S105, after a calculation subdomain has been calculated, the process calculates the next calculation subdomain and at the same time communicates with the other processes to transmit the result of the previous one, so that the communication time is hidden behind the calculation and the calculation efficiency improves.
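The assignment rule of steps S102 to S104 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names owner and owner_grid are hypothetical, while myrow, mycol and the values m = 3, n = 2 and the 7 × 7 grid come from the example above.

```python
def owner(row, col, m, n):
    """Process number for the subdomain at (row, col) on an n-row by m-column process grid."""
    myrow = row % n   # cyclic row label, 0..n-1 (step S103)
    mycol = col % m   # cyclic column label, 0..m-1 (step S102)
    return myrow * m + mycol   # process number, 0..m*n-1 (step S104)


def owner_grid(rows, cols, m, n):
    """Process number of every subdomain, as a list of rows."""
    return [[owner(r, c, m, n) for c in range(cols)] for r in range(rows)]


# The 7 x 7 example of this embodiment with m = NPCOL = 3 and n = NPROW = 2.
grid = owner_grid(7, 7, m=3, n=2)
for line in grid:
    print(" ".join(str(p) for p in line))
```

The first printed row is 0 1 2 0 1 2 0 and the second is 3 4 5 3 4 5 3; the pattern then repeats every two rows.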
To ensure load balance, make maximum use of all computing resources and avoid idle resources, both m and n must be greater than 1. To illustrate the drawback of the alternative: in this example, if n = 1 and m = 6, the process numbers along each row are 0, 1, 2, 3, 4, 5, 0, so process 0 has to calculate one more column of calculation subdomains than the other processes, which unbalances the load. With the partition used in this example, the extra last column is shared between process 0 and process 3, which evens out the calculation load of the processes, promotes load balance and improves the calculation efficiency.
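The load comparison above can be checked by counting how many calculation subdomains each process receives. The sketch below is illustrative only; the function name loads is hypothetical, and the grid size and process layouts are those of this example.

```python
from collections import Counter


def loads(rows, cols, m, n):
    """Count subdomains per process under the mapping (row % n) * m + (col % m)."""
    return Counter((r % n) * m + (c % m) for r in range(rows) for c in range(cols))


two_d = loads(7, 7, m=3, n=2)   # partition used in this example
one_d = loads(7, 7, m=6, n=1)   # the n = 1 alternative

print(sorted(two_d.items()))
print(sorted(one_d.items()))
```

With n = 1, process 0 holds 14 subdomains and every other process holds 7. With m = 3 and n = 2, the heaviest process holds 12, and the last column is split 4 to 3 between processes 0 and 3. The counts are still not identical in the two-dimensional case, because 7 is a multiple of neither 2 nor 3.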
To improve the parallel efficiency, the calculation subdomains of a row are marked 0 to m-1 from left to right, and the calculation subdomains of a column are marked 0 to n-1 from top to bottom; the calculation subdomains assigned to one process are calculated in order from left to right and then from top to bottom. In this example, when the calculation starts, the 6 processes calculate the 3 × 2 calculation subdomains in the upper left corner of the calculation domain. When the calculation of its first (0, 0) calculation subdomain is complete, process 0 starts calculating its next (0, 0) calculation subdomain and at the same time communicates the result to processes 1, 2 and 3 in the same row and column. Processes 1, 2 and 3 receive the result from process 0, update the local data of their calculation subdomains, calculate them, and in turn communicate with the processes in their own row and column while calculating their next calculation subdomains. The left-to-right, top-to-bottom order keeps all processes in a calculating state for most of the time spent on the calculation domain, which improves the parallel efficiency and reduces the run time.
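The overlap of calculation and communication in step S105 can be sketched with a single background worker that stands in for the master core, while the main loop stands in for the slave cores. This is a schematic under stated assumptions, not the patent's implementation: run_overlapped, compute and send are hypothetical names, and a real program on the Shenwei architecture would use its own message-passing and slave-core interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(subdomains, compute, send):
    """Compute subdomains in order, sending each result while the next one is computed."""
    with ThreadPoolExecutor(max_workers=1) as comm:     # stands in for the master core
        in_flight = None
        for sd in subdomains:
            result = compute(sd)                        # stands in for slave-core work
            if in_flight is not None:
                in_flight.result()                      # the previous send has finished
            in_flight = comm.submit(send, sd, result)   # runs while the loop moves on
        if in_flight is not None:
            in_flight.result()                          # flush the last send


# Process 0 owns columns 0, 3 and 6 of the first row in the 7 x 7 example.
log = []
run_overlapped(
    [(0, 0), (0, 3), (0, 6)],
    compute=lambda sd: sd[0] * 10 + sd[1],              # placeholder calculation
    send=lambda sd, result: log.append((sd, result)),   # placeholder communication
)
print(log)
```

The send of one result is submitted before the next compute call starts, so the two proceed side by side; each send is awaited only when the following result is ready to go out.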
To further ensure load balance and make full use of computing resources, m and n are chosen as the pair of factors with the smallest absolute difference among all two-factor decompositions of the total number of processes m × n. This makes m and n equal or as close as possible, so that the calculation tasks (calculation subdomains) assigned to the processes are balanced.
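One simple way to pick such a factor pair is to search downward from the integer square root of the process count. The sketch below is an assumption about how this could be done, not a procedure given in the source; the name closest_factors is hypothetical, and the convention m >= n follows the example (m = 3, n = 2).

```python
import math


def closest_factors(p):
    """Return (m, n) with m * n == p, m >= n, and m - n as small as possible."""
    n = math.isqrt(p)        # largest candidate for the smaller factor
    while p % n != 0:        # step down until n divides p (n = 1 always does)
        n -= 1
    return p // n, n


print(closest_factors(6))    # process count of this example
print(closest_factors(64))
```

For a prime process count the only decomposition is p × 1, which violates the requirement m > 1, n > 1, so such counts would have to be avoided.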
The core section of the hycof module in the three-dimensional acoustic-elastic software consists mainly of complex matrix operations. SIMD programming provides no matching extended data type for complex numbers, so several instructions are needed to process the real and imaginary parts of a complex number during calculation, which wastes computing resources. Therefore, the parallel optimization method for ship three-dimensional acoustoelastic simulation calculation based on the Shenwei architecture provided by embodiment 1 of the present invention further uses a plurality of 256-bit vector registers, each of which can store two complex numbers;
when a complex matrix operation is performed, as shown in Fig. 2:
S201: two complex groups that require the same operation are selected, each complex group comprising two complex numbers to be operated on; the first complex number of each group is stored in one vector register, which is the first vector register; the second complex number of each group is stored in another vector register, in the order corresponding to the first complex numbers, which is the second vector register; in both the first and the second vector register, the complex numbers are stored with the imaginary parts first and the real parts after;
S202: the real and imaginary parts of the four complex numbers in the two vector registers are separated and recombined, wherein the real parts of the first complex numbers of the two complex groups form a first real part group and their imaginary parts form a first imaginary part group, and the real parts of the second complex numbers of the two complex groups form a second real part group and their imaginary parts form a second imaginary part group; the second imaginary part group and the first real part group are stored in sequence in one vector register, which is the third vector register; the first real part group and the second real part group are stored in sequence in one vector register, which is the fourth vector register; the second real part group and the first imaginary part group are stored in sequence in one vector register, which is the fifth vector register; the first imaginary part group and the second imaginary part group are stored in sequence in one vector register, which is the sixth vector register;
S203: the data stored in the third vector register are multiplied by the data stored in the fourth vector register, and the result is stored in a seventh vector register; the data stored in the fifth vector register are multiplied by the data stored in the sixth vector register, the product of the second imaginary part group and the first imaginary part group is negated, and the result is stored in an eighth vector register;
S204: the data stored in the seventh vector register and the eighth vector register are added.
The complex multiplication formulas are a1 × b1 = (A0_R × B0_R - A0_I × B0_I) + (A0_R × B0_I + A0_I × B0_R) i and a2 × b2 = (A1_R × B1_R - A1_I × B1_I) + (A1_R × B1_I + A1_I × B1_R) i, where the suffixes _R and _I denote the real and imaginary parts. In this example, the two complex products a1 × b1 and a2 × b2 are calculated.
Step S201 reads the two groups of complex numbers and arranges them in the registers with the imaginary parts first and the real parts after; the specific storage order is shown in Fig. 5, where V1 denotes the first vector register and V2 the second vector register. To facilitate the multiplication, step S202 uses four registers (the third, fourth, fifth and sixth vector registers) to separate and recombine the real and imaginary parts of the two groups; the result is shown in Fig. 5, where V3, V4, V5 and V6 denote the third, fourth, fifth and sixth vector registers. In step S203, multiplying the third vector register by the fourth gives (A1_R × B1_I, A0_R × B0_I, A1_R × B1_R, A0_R × B0_R), which is stored in the seventh vector register. Multiplying the fifth vector register by the sixth gives (A1_I × B1_R, A0_I × B0_R, A1_I × B1_I, A0_I × B0_I); negating the products of the imaginary parts gives (A1_I × B1_R, A0_I × B0_R, -A1_I × B1_I, -A0_I × B0_I), which is stored in the eighth vector register. Step S204 adds the seventh and eighth vector registers to obtain (A1_R × B1_I + A1_I × B1_R, A0_R × B0_I + A0_I × B0_R, A1_R × B1_R - A1_I × B1_I, A0_R × B0_R - A0_I × B0_I). Here A1_R × B1_I + A1_I × B1_R and A1_R × B1_R - A1_I × B1_I are the imaginary and real parts of a2 × b2, and A0_R × B0_I + A0_I × B0_R and A0_R × B0_R - A0_I × B0_I are the imaginary and real parts of a1 × b1.
It can be seen that, with 256-bit vector registers, two groups of complex numbers can be multiplied at one time. The real and imaginary parts no longer have to be split across multiple scalar multiplication instructions, because a single vector instruction multiplies the real and imaginary parts of two complex numbers at once. This reduces the number of instructions needed for complex arithmetic, shortens its calculation time, and improves the calculation efficiency of the three-dimensional acoustic-elastic software.
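The register arrangement described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: each 256-bit vector register is modelled as a Python list of four floating-point lanes, the lane order follows the tuples given in the description, and the function name `multiply_two_complex_pairs` and its argument names are hypothetical. It does not use Shenwei SIMD intrinsics.

```python
# Illustrative simulation only: each "register" is a 4-element list standing in
# for one 256-bit vector of four doubles. Names are hypothetical.

def multiply_two_complex_pairs(a0, b0, a1, b1):
    """Return (a0*b0, a1*b1) using only lane-wise multiply, negate and add."""
    # V1 / V2: imaginary parts in front, real parts behind.
    v1 = [a1.imag, a0.imag, a1.real, a0.real]
    v2 = [b1.imag, b0.imag, b1.real, b0.real]

    # V3..V6: real and imaginary parts regrouped for lane-wise multiplication.
    v3 = [v2[0], v2[1], v1[2], v1[3]]  # (B1_I, B0_I, A1_R, A0_R)
    v4 = [v1[2], v1[3], v2[2], v2[3]]  # (A1_R, A0_R, B1_R, B0_R)
    v5 = [v2[2], v2[3], v1[0], v1[1]]  # (B1_R, B0_R, A1_I, A0_I)
    v6 = [v1[0], v1[1], v2[0], v2[1]]  # (A1_I, A0_I, B1_I, B0_I)

    # V7 = V3 * V4 and V8 = V5 * V6, one lane-wise multiply each.
    v7 = [x * y for x, y in zip(v3, v4)]
    v8 = [x * y for x, y in zip(v5, v6)]

    # Negate the imaginary-by-imaginary products held in the last two lanes.
    v8[2], v8[3] = -v8[2], -v8[3]

    # Accumulate: lanes 0 and 1 hold imaginary parts, lanes 2 and 3 real parts.
    acc = [x + y for x, y in zip(v7, v8)]
    return complex(acc[3], acc[1]), complex(acc[2], acc[0])


if __name__ == "__main__":
    print(multiply_two_complex_pairs(1 + 2j, 3 + 4j, 2 - 1j, 0.5 + 3j))
```

Two lane-wise multiplications, one negation and one addition replace the eight scalar multiplications and four scalar additions or subtractions that two separate complex products would otherwise need.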
Those skilled in the art will appreciate that variations can be implemented by combining the prior art with the above embodiments; such variations do not affect the essence of the present invention and are not described in detail herein.
While preferred embodiments of the invention have been described above, the invention is not limited to these specific embodiments; equipment and structures not described in detail are to be understood as implemented in a manner common in the art. Those skilled in the art may make various changes, modifications, or equivalent substitutions without departing from the spirit and scope of the invention. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from its technical solution, still falls within the scope of protection of the technical solution of the present invention.

Claims (4)

1. A ship three-dimensional acoustic-elastic simulation calculation parallel optimization method based on a Shenwei architecture is characterized by comprising the following steps:
there are m × n processes in total; the calculation domain is divided by rows and columns into a plurality of calculation subdomains, the calculation subdomains located in the same row forming a calculation subdomain row and those located in the same column forming a calculation subdomain column;
cyclically labelling the calculation subdomains of each row in the order 0 to m-1; if the calculation subdomains of the row have not all been labelled after m-1 is reached, labelling continues again in the order 0 to m-1 until all calculation subdomains of the row are labelled, the label being the column number of the calculation subdomain;
cyclically labelling the calculation subdomains of each column in the order 0 to n-1; if the calculation subdomains of the column have not all been labelled after n-1 is reached, labelling continues again in the order 0 to n-1 until all calculation subdomains of the column are labelled, the label being the row number of the calculation subdomain;
numbering the processes in the order 0 to m × n - 1, the number being the process number of the process; the product of the row number of a calculation subdomain and m, plus the column number of that calculation subdomain, is the process number of the process responsible for that calculation subdomain, and all processes calculate in parallel the calculation subdomains they are responsible for;
after a process completes the calculation of one calculation subdomain, it communicates the calculation result of that calculation subdomain to the other processes while calculating the next calculation subdomain;
wherein, one core group corresponds to one process;
the core group comprises a plurality of slave cores and a master core, the slave cores being responsible for the calculation of the process and the master core being responsible for the communication of the process;
m > 1, n > 1.
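The cyclic labelling and process assignment of claim 1 can be sketched as follows. The sketch reads the process-number rule as "row number × m + column number", which is an interpretation of the translated claim; the function names `process_for_subdomain` and `owner_grid` and the zero-based grid indices are hypothetical and introduced only for illustration.

```python
# Illustrative sketch: map each calculation subdomain to a process number.
# Assumption: process number = row label * m + column label.

def process_for_subdomain(row_index, col_index, m, n):
    """Process number for the subdomain at zero-based (row_index, col_index)."""
    col_label = col_index % m  # cyclic labels 0..m-1 along each row
    row_label = row_index % n  # cyclic labels 0..n-1 along each column
    return row_label * m + col_label


def owner_grid(grid_rows, grid_cols, m, n):
    """Table of process numbers for a grid_rows x grid_cols subdomain grid."""
    return [[process_for_subdomain(r, c, m, n) for c in range(grid_cols)]
            for r in range(grid_rows)]


if __name__ == "__main__":
    # 4 x 6 subdomains shared by m * n = 3 * 2 = 6 processes.
    for row in owner_grid(4, 6, 3, 2):
        print(row)
```

With m = 3 and n = 2, every block of 2 × 3 neighbouring subdomains contains each of the six process numbers exactly once, so the work is spread cyclically over all processes.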
2. The parallel optimization method for three-dimensional acoustic-elastic simulation calculation of ship based on Shenwei architecture as claimed in claim 1, wherein
the calculation subdomains in one row are labelled in the order 0 to m-1 from left to right, and the calculation subdomains in one column are labelled in the order 0 to n-1 from top to bottom; and the calculation subdomains assigned to one process are calculated sequentially from left to right and then from top to bottom.
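The traversal order in claim 2 can be illustrated by listing the subdomains that one process owns. This is a sketch under assumptions: the process number is taken to be row label × m + column label, grid indices are zero-based, and the function name `subdomains_of_process` is hypothetical.

```python
# Illustrative sketch: subdomains owned by one process, in calculation order.
# Assumption: process number = row label * m + column label.

def subdomains_of_process(rank, grid_rows, grid_cols, m, n):
    """Return (row, column) indices owned by `rank`, left to right, then top to bottom."""
    row_label, col_label = divmod(rank, m)
    return [(r, c)
            for r in range(row_label, grid_rows, n)   # rows, top to bottom
            for c in range(col_label, grid_cols, m)]  # within a row, left to right


if __name__ == "__main__":
    print(subdomains_of_process(5, 4, 6, 3, 2))
```

Each process steps through its subdomains with a stride of m along a row and a stride of n between rows.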
3. The parallel optimization method for three-dimensional acoustic-elastic simulation calculation of ship based on Shenwei architecture as claimed in claim 2, wherein m and n are the two factors whose difference has the smallest absolute value among all two-factor decompositions of the total number of processes m × n.
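The factor selection in claim 3 can be sketched as a search downward from the integer square root of the process count. The function name `closest_factor_pair` is hypothetical, and which of the two returned factors is used as m and which as n is left open here, since the claim does not specify it.

```python
# Illustrative sketch: pick the two factors of the process count whose
# difference is smallest in absolute value.
import math


def closest_factor_pair(total_processes):
    """Return (small, large) with small * large == total_processes and minimal difference."""
    if total_processes < 1:
        raise ValueError("total_processes must be a positive integer")
    # The first divisor found at or below the square root gives the closest pair.
    for small in range(math.isqrt(total_processes), 0, -1):
        if total_processes % small == 0:
            return small, total_processes // small


if __name__ == "__main__":
    print(closest_factor_pair(12))
    print(closest_factor_pair(64))
```

For a prime process count the only decomposition is 1 × p, which does not satisfy the condition m > 1, n > 1 of claim 1, so such counts would have to be avoided.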
4. The parallel optimization method for three-dimensional acoustic-elastic simulation calculation of ship based on Shenwei architecture as claimed in claim 1, further comprising a plurality of 256-bit vector registers, wherein one vector register can store two complex numbers;
when complex matrix operation is carried out:
taking two complex groups to be operated on, each complex group comprising two complex numbers to be operated on; storing the first complex number of each group into one vector register, this vector register being a first vector register; storing the second complex number of each group into another vector register in an order corresponding to that of the first complex numbers, this vector register being a second vector register; in both the first vector register and the second vector register, the complex numbers are stored with the imaginary parts in front and the real parts behind;
separating the real parts and imaginary parts of the four complex numbers in the two vector registers and recombining them, wherein the real parts of the first complex numbers of the two complex groups form a first real part group and their imaginary parts form a first imaginary part group, and the real parts of the second complex numbers of the two complex groups form a second real part group and their imaginary parts form a second imaginary part group; the second imaginary part group and the first real part group are sequentially stored in a vector register, this vector register being a third vector register; the first real part group and the second real part group are sequentially stored in a vector register, this vector register being a fourth vector register; the second real part group and the first imaginary part group are sequentially stored in a vector register, this vector register being a fifth vector register; the first imaginary part group and the second imaginary part group are sequentially stored in a vector register, this vector register being a sixth vector register;
performing a multiplication operation on the data stored in the third vector register and the fourth vector register, and storing the result into a seventh vector register; performing a multiplication operation on the data stored in the fifth vector register and the sixth vector register, negating the product of the second imaginary part group and the first imaginary part group, and storing the result into an eighth vector register;
and performing accumulation operation on the data stored in the seventh vector register and the eighth vector register.
CN201911025256.7A 2019-10-25 2019-10-25 Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture Pending CN110780842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911025256.7A CN110780842A (en) 2019-10-25 2019-10-25 Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911025256.7A CN110780842A (en) 2019-10-25 2019-10-25 Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture

Publications (1)

Publication Number Publication Date
CN110780842A true CN110780842A (en) 2020-02-11

Family

ID=69386773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911025256.7A Pending CN110780842A (en) 2019-10-25 2019-10-25 Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture

Country Status (1)

Country Link
CN (1) CN110780842A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368484A (en) * 2020-03-19 2020-07-03 山东大学 Cosmic N-body numerical simulation optimization method and system based on Shenwei architecture

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040822A1 (en) * 2009-08-17 2011-02-17 International Business Machines Corporation Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
CN102637124A (en) * 2012-03-22 2012-08-15 中国电子科技集团公司第五十八研究所 Device and method for parallel processing of radix 4 FFT (fast Fourier transform) algorithm
CN103699516A (en) * 2014-01-13 2014-04-02 中国人民解放军国防科学技术大学 Single instruction multiple data (SIMD)-based parallel fast fourier transform/inverse fast fourier transform (FFT/IFFT) butterfly operation method and SIMD-based parallel FFT/IFFT butterfly operation device in vector processor
CN103713314A (en) * 2012-09-28 2014-04-09 中国石油化工股份有限公司 Pre-stack time migration parallel processing method
CN104156271A (en) * 2014-08-01 2014-11-19 浪潮(北京)电子信息产业有限公司 Method and system for balancing cooperative computing cluster load
CN104537125A (en) * 2015-01-28 2015-04-22 中国人民解放军国防科学技术大学 Remote-sensing image pyramid parallel building method based on message passing interface
CN104969215A (en) * 2013-03-13 2015-10-07 高通股份有限公司 Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
CN106897163A (en) * 2017-03-08 2017-06-27 郑州云海信息技术有限公司 A kind of algebra system method for solving and system based on KNL platforms
CN109791488A (en) * 2016-10-01 2019-05-21 英特尔公司 For executing the system and method for being used for the fusion multiply-add instruction of plural number
CN110188462A (en) * 2019-05-29 2019-08-30 无锡恒鼎超级计算中心有限公司 LBM algorithm optimization method based on Shenwei architecture
CN110211235A (en) * 2019-05-14 2019-09-06 河海大学 Ore Drawing for Computer Simulation method based on parallel RCB three-dimensional potential function discrete element

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368484A (en) * 2020-03-19 2020-07-03 山东大学 Cosmic N-body numerical simulation optimization method and system based on Shenwei architecture
CN111368484B (en) * 2020-03-19 2022-04-15 山东大学 Cosmic N-body numerical simulation optimization method and system based on Shenwei architecture

Similar Documents

Publication Publication Date Title
KR102443546B1 (en) matrix multiplier
KR102316670B1 (en) computational accelerator
CN108241890B (en) Reconfigurable neural network acceleration method and architecture
CN107704916B (en) Hardware accelerator and method for realizing RNN neural network based on FPGA
US8595280B2 (en) Apparatus and method for performing multiply-accumulate operations
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN104317768B (en) Matrix multiplication accelerating method for CPU+DSP (Central Processing Unit + Digital Signal Processor) heterogeneous system
CN107085562B (en) Neural network processor based on efficient multiplexing data stream and design method
CN106294278B (en) Adaptive hardware for dynamic reconfigurable array computing system is pre-configured controller
CN109657794B (en) Instruction queue-based distributed deep neural network performance modeling method
CN105468439A (en) Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework
CN104346318B (en) Matrix Multiplication accelerated method towards general multi-core DSP
CN108710943B (en) Multilayer feedforward neural network parallel accelerator
CN110780842A (en) Parallel optimization method for ship three-dimensional acoustic-elastic simulation calculation based on Shenwei architecture
CN111079078A (en) Lower triangular equation parallel solving method for structural grid sparse matrix
CN107368459B (en) Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication
KR20220064337A (en) Processor for fine-grain sparse integer and floating-point operations
JP7377869B2 (en) Pipelined matrix multiplication in graphics processing units
Chen et al. The parallel algorithm implementation of matrix multiplication based on ESCA
CN110059809B (en) Computing device and related product
Kobayashi et al. Towards a low-power accelerator of many FPGAs for stencil computations
CN112446007A (en) Matrix operation method, operation device and processor
CN111104765B (en) Gas dynamic algorithm optimization method based on Shenwei architecture
Zeng et al. Optimizing frequency domain implementation of CNNs on FPGAs
JP2023542261A (en) Systolic array cell with multiple accumulators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200211)