CN115291949B - Accelerated computing device and accelerated computing method for computational fluid dynamics

Info

Publication number: CN115291949B
Application number: CN202211171216.5A
Authority: CN
Inventors: 龚艳琼; 刘必慰; 赵玉新; 黄东昌; 郭阳; 江豪龙; 赖雯; 王洁; 杨益斌
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-12-20
Anticipated expiration: 2042-09-26
Also published as: CN115291949A

Abstract

The application relates to the fields of computational fluid dynamics and computer technology, and in particular to an accelerated computing device and an accelerated computing method for computational fluid dynamics. The accelerated computing device includes a plurality of dedicated differential operation units, which execute a program written in an instruction set for the fluid mechanics problem to be solved and complete the differential operations of the nodes in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because data transmission channels are provided between the differential operation units for differential calculation, the long delays caused by transferring data through a global memory are reduced; because the data memory is removed, the use of computing resources is greatly reduced. The device is also flexible and programmable.

Description

Accelerated computing device and accelerated computing method for computational fluid dynamics

Technical Field

The present application relates to the field of computational fluid dynamics and computer technologies, and in particular, to an accelerated computing apparatus and an accelerated computing method for computational fluid dynamics.

Background

Computational fluid dynamics (CFD) uses computers to solve the governing equations of fluid mechanics, and can simulate the flow characteristics of a complex flow field relatively easily and accurately. Among the discretization methods used in CFD numerical simulation, the finite difference method is a typical numerical solution method, and different difference schemes can be obtained by combining difference formats in time and in space. However, the geometries, numerical methods, and physical and chemical models required by current CFD are increasingly complex and fine-grained, which places extremely high demands on large-scale computation. Accurate simulation of real flows requires a huge amount of computation; for accurate numerical simulation of a full-size model in particular, current computing capability is still insufficient.

To accelerate CFD computation, researchers in many countries have carried out extensive work and have effectively advanced the development of CFD. However, because the computing power of computers is limited, achieving high-performance CFD on general-purpose computing hardware remains a serious challenge.

Disclosure of Invention

In view of the above, it is necessary to provide a computational fluid dynamics-oriented accelerated computing apparatus and an accelerated computing method that have a simple hardware structure, are computationally efficient, and are flexible and programmable.

A computational fluid dynamics-oriented accelerated computing device comprises a plurality of dedicated differential operation units, which execute a program written in an instruction set for the fluid mechanics problem to be solved and complete the differential operations of the nodes in the flow field.

Each differential operation unit comprises a plurality of transmission channels, which combine adjacent differential operation units so that the differential operations of all nodes in the flow field are completed in parallel.

Further, the differential operation unit also includes:

An instruction controller, used for controlling the address of the instruction to be executed.

An instruction memory, used for storing the instructions to be executed.

A plurality of general registers, used for storing register data.

An arithmetic logic operation unit, used for performing arithmetic and logic operations on the operands.

Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.

Furthermore, the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute and write-back.

In the fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register.

In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed in two temporary registers.

In the execute stage, the arithmetic logic operation unit operates on the two temporary registers according to the operation code, the result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the result. The instruction controller uses the value of the flag register to decide whether the next instruction is executed in sequence or reached by a jump.

In the write-back stage, the operation code and the result determine whether a register value needs to be modified; if so, the value of the third temporary register is stored in the corresponding general register.

Further, each differential operation unit includes 4 transmission channels, which are obtained by configuring communication registers in the differential operation unit.

The four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general register. An adjacent differential operation unit enters the transmission channel through the communication register nearest to it in order to exchange data.

Furthermore, the number of the general registers in the differential operation unit is 60, and the width of the general registers is 64 bits.

Further, the instruction set used by the differential operation unit is designed according to the characteristics of the computational fluid dynamics algorithm.

The instruction set adopted by the differential operation unit uses a 16-bit RISC instruction encoding format, in which the instruction is in a three-address format, the operation code is 4 bits long, and there are two operands of 6 bits each.

Further, the instruction set is divided into three types according to the difference of operands, including: register type, immediate type, and hybrid type.

Further, the instruction set is classified by instruction function into control class instructions, operation class instructions and data movement class instructions.

The control class instructions include: a no-operation instruction, a stop instruction and branch jump instructions. The operation class instructions include: a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction and a floating-point register multiplication instruction. The data movement class instructions include: a move instruction.

An accelerated computing method for computational fluid dynamics uses the accelerated computing device described above to accelerate computational fluid dynamics calculations. The method comprises the following steps:

Discretizing the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula.

Determining the number of required differential operation units and transmission channels according to the iterative calculation formula, and determining the number of program loop iterations from the iteration in time.

Combining all the differential operation units through the transmission channels, and setting the initialization data and storing it in the general registers of the differential operation units.

Setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of program loop iterations to obtain a calculation program; the instructions in the instruction memory are used to implement different difference schemes.

Running the calculation program and outputting the differential operation results of all nodes in the flow field at different moments.

In the above computational fluid dynamics-oriented accelerated computing device and accelerated computing method, the device includes a plurality of dedicated differential operation units, which execute a program written in an instruction set for the fluid mechanics problem to be solved and complete the differential operations of the nodes in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because transmission channels are provided between the differential operation units for differential calculation, the long delays caused by transferring data through a global memory are reduced; because the data memory is removed, the use of computing resources is greatly reduced. The device is also flexible and programmable.

Drawings

FIG. 1 is a schematic diagram of an accelerated computational device oriented to computational fluid dynamics in one embodiment;

FIG. 2 is a diagram of a computational fluid dynamics oriented arithmetic unit in one embodiment;

FIG. 3 is a schematic diagram of a computational fluid dynamics oriented transport channel in one embodiment;

FIG. 4 is a schematic flow chart of a computational fluid dynamics oriented acceleration calculation method according to another embodiment;

FIG. 5 is a schematic structural diagram of an accelerated computing device for solving two-dimensional linear convection equations for computational fluid dynamics in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in FIG. 1, a computational fluid dynamics-oriented accelerated computing device is provided, comprising a plurality of dedicated differential operation units, which execute a program written in an instruction set for the fluid mechanics problem to be solved and complete the differential operations of the nodes in the flow field. The differential operation units (DPEs) are designed around the computational characteristics of the finite difference method, which is widely used in computational fluid dynamics, and their number is determined by the spatial discretization of the fluid mechanics equation of the problem to be solved.

Each differential operation unit comprises a plurality of transmission channels, which combine adjacent differential operation units so that the differential operations of all nodes in the flow field are completed in parallel. A plurality of dedicated operation units can be combined through the transmission channels. The size of the combined array of differential operation units can be changed to suit the characteristics of the computational fluid dynamics algorithm, and different difference schemes can be implemented by changing the instructions in the instruction memory, so the device is flexible and programmable.

The instruction set is designed for the finite difference method used in computational fluid dynamics, and is the minimum set capable of completing the differential operations. The instruction set uses a 16-bit RISC instruction encoding format, and the instruction memory is 128 × 16 bits; that is, all instructions are encoded with an equal length of 16 bits.

In the above computational fluid dynamics-oriented accelerated computing device, a plurality of dedicated differential operation units execute a program written in an instruction set for the fluid mechanics problem to be solved and complete the differential operations of the nodes in a flow field. Each differential operation unit comprises a plurality of transmission channels, which combine the differential operation units so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure. Because transmission channels are provided between the differential operation units for differential calculation, the long delays caused by transferring data through a global memory are reduced; because the data memory is removed, the use of computing resources is greatly reduced. The device is also flexible and programmable.

Further, the differential operation unit also includes: an instruction controller for controlling the address of the instruction to be executed; an instruction memory for storing the instructions to be executed; a plurality of general registers (gr) for storing register data; and an arithmetic logic operation unit for performing arithmetic and logic operations on the operands.

The hardware structure is simple: the data memory is eliminated, and each DPE contains only four parts, namely the instruction controller, the instruction memory, the general registers and the arithmetic logic operation unit.

Further, the instruction controller includes an increment-by-one adder and a two-to-one multiplexer.

Furthermore, the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute and write-back. In the fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register. In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed in two temporary registers. In the execute stage, the arithmetic logic operation unit operates on the two temporary registers according to the operation code, the result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the result; the instruction controller uses the value of the flag register to decide whether the next instruction is executed in sequence or reached by a jump. In the write-back stage, the operation code and the result determine whether a register value needs to be modified; if so, the value of the third temporary register is stored in the corresponding general register.

In a specific embodiment, as shown in FIG. 2, the computational fluid dynamics-oriented differential operation unit (DPE) includes four parts: an instruction controller, an instruction memory, general registers and an arithmetic logic operation unit. The instruction controller controls the address of the instruction to be executed and includes an increment-by-one adder and a two-to-one multiplexer. The instruction memory stores the instructions to be executed. The general registers store register data, with a bit width of 32 bits and a depth of 60. The arithmetic logic operation unit performs arithmetic and logic operations on the operands. Executing an instruction takes four clock cycles, namely instruction fetch (IF), decode (ID), execute (EX) and write-back (WB). The specific process is as follows:

1) Instruction fetch (IF):

In the fetch stage, an instruction is read from the instruction memory according to the PC value and sent to the instruction register (ID_ir). At the same time, the PC value for the next cycle is set, so that instructions can either be executed in sequence or jump to a specific address.

2) Decode (ID):

The decode stage decodes the instruction, extracts the corresponding operands from the instruction according to the instruction function (i.e., the opcode), and places the extracted operands in register A (reg_A) and register B (reg_B).

3) Execute (EX):

The EX stage operates on reg_A and reg_B in the arithmetic logic unit (ALU) according to the instruction function and stores the result in register C (reg_C). It also sets the flag register to 0 or 1 according to the instruction function and the result. The instruction controller uses the flag to decide whether the next instruction is executed in sequence or reached by a jump, i.e., whether the PC value in the next clock cycle is the value in reg_C or the incremented PC.

4) Write-back (WB):

The WB stage determines, from the instruction function and the EX stage result, whether and how to modify a register value; if a modification is needed, the value of reg_C is stored in the corresponding general register. This stage is only effective for instructions that modify register values.
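The four stages can be illustrated with a small software model. This is only a sketch under stated assumptions: the opcode numbers, the exact semantics of `SUBI` and `BNZ`, and the tuple-based instruction memory are hypothetical, because the source's Table 2 is not reproduced in this text. Only the stage order, the temporary registers and the flag-controlled choice between the value in reg_C and the incremented PC come from the description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical opcode numbers and semantics; only the mnemonics come from the text.
NOP, HALT, SUBI, BNZ = 0, 1, 2, 3


@dataclass
class DPE:
    imem: List[Tuple[int, int, int]]                 # (opcode, operand_a, operand_b)
    gr: List[int] = field(default_factory=lambda: [0] * 60)  # 60 general registers
    pc: int = 0
    halted: bool = False
    executed: int = 0                                # number of instructions completed

    def step(self) -> None:
        # IF: read the instruction addressed by the PC into the instruction register.
        id_ir = self.imem[self.pc]
        # ID: decode and load the two temporary registers according to the opcode.
        opcode, a, b = id_ir
        if opcode in (SUBI, BNZ):
            reg_a, reg_b = self.gr[a], b             # register value and immediate
        else:
            reg_a, reg_b = a, b
        # EX: compute reg_c and set the flag that steers the instruction controller.
        reg_c, flag = 0, 0
        if opcode == SUBI:
            reg_c = reg_a - reg_b
        elif opcode == BNZ:
            reg_c = reg_b                            # assumed: operand B is the target
            flag = 1 if reg_a != 0 else 0
        elif opcode == HALT:
            self.halted = True
        # WB: only instructions that modify a register write reg_c back.
        if opcode == SUBI:
            self.gr[a] = reg_c
        # Instruction controller: two-to-one choice between reg_c and PC + 1.
        self.pc = reg_c if flag else self.pc + 1
        self.executed += 1


def run(dpe: DPE, max_steps: int = 1000) -> DPE:
    """Step the DPE until it halts or the step budget is used up."""
    while not dpe.halted and max_steps > 0:
        dpe.step()
        max_steps -= 1
    return dpe


# A three-instruction countdown loop on gr3, mirroring a loop counter.
demo = DPE(imem=[(SUBI, 3, 1), (BNZ, 3, 0), (HALT, 0, 0)])
demo.gr[3] = 3
run(demo)
```

The model executes one instruction per call to `step` and does not attempt to model clock-level timing or any overlap between stages, which the text does not specify.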

Furthermore, each differential operation unit includes 4 transmission channels, which are obtained by configuring communication registers in the differential operation unit. The four communication registers are located at the four edges of the differential operation unit and are connected to the same internal general register. An adjacent differential operation unit enters the transmission channel through the communication register nearest to it in order to exchange data.

Specifically, as shown in FIG. 3, the dedicated differential operation unit (DPE) is provided with four communication registers (cr) located at the four edges of the DPE, forming four transmission channels. The four communication registers (cr0 to cr3) are connected to the internal general register 55 (gr55), and gr55 stores the value that the DPE needs for the differential operation. An adjacent DPE enters the transmission channel through the communication register nearest to it in order to exchange data. The bit width of the communication registers is 32 bits. Four input ports are added to each DPE and connected to the internal general registers 56 to 59 (gr56 to gr59), and four output ports are added and connected to the four communication registers respectively. Taking DPE_X_Y as an example, where _X_Y denotes the position of the DPE, DPE_X_Y is connected to its four adjacent DPEs through the added input and output ports, acquiring data from the adjacent points and transmitting data to them.
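The neighbor exchange can be sketched in software as follows. The mapping between sides and input registers (`INPUT_FROM`), the dictionary-based register file and the `boundary` fill value are assumptions made for illustration; the text only states that gr55 feeds the four communication registers and that gr56 to gr59 receive data from the four adjacent DPEs.

```python
from typing import Dict, List

# Hypothetical mapping: (row offset, column offset) of the neighbour that
# drives each input register. The text does not say which side uses which register.
INPUT_FROM = {56: (-1, 0), 57: (0, -1), 58: (1, 0), 59: (0, 1)}

Grid = List[List[Dict[int, float]]]


def make_grid(values: List[List[float]]) -> Grid:
    """Build a DPE array in which each unit only holds its shared value in gr55."""
    return [[{55: v} for v in row] for row in values]


def exchange(grid: Grid, boundary: float = 1.0) -> None:
    """Copy every DPE's gr55 into the matching input register of its neighbours."""
    rows, cols = len(grid), len(grid[0])
    # Snapshot gr55 first so that all transfers see the same pre-exchange values.
    shared = [[grid[r][c][55] for c in range(cols)] for r in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for reg, (dr, dc) in INPUT_FROM.items():
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                # Assumed: inputs without a neighbour are held at a fixed value.
                grid[r][c][reg] = shared[nr][nc] if inside else boundary


demo_grid = make_grid([[1.0, 2.0], [3.0, 4.0]])
exchange(demo_grid)
```

Because every transfer is local to a pair of adjacent units, no shared or global memory appears anywhere in the exchange, which is the property the description attributes to the transmission channels.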

Further, the instruction set adopted by the differential operation unit is designed according to the characteristics of the computational fluid dynamics algorithm. The instruction set uses a 16-bit RISC instruction encoding format, in which the instruction is in a three-address format, the operation code is 4 bits long, and there are two operands of 6 bits each.
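The field widths add up to exactly one instruction word (4 + 6 + 6 = 16 bits), and a 6-bit operand can address up to 64 locations, which is enough for the 60 general registers. A minimal packing sketch follows; the field order (opcode in the top bits, then the two operands) and the function names are assumptions, since the encoding table is not reproduced in this text.

```python
from typing import Tuple

OPCODE_BITS = 4
OPERAND_BITS = 6
WORD_BITS = OPCODE_BITS + 2 * OPERAND_BITS  # 16


def encode(opcode: int, op_a: int, op_b: int) -> int:
    """Pack one instruction into a 16-bit word (assumed order: opcode, A, B)."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 4 bits")
    for operand in (op_a, op_b):
        if not 0 <= operand < (1 << OPERAND_BITS):
            raise ValueError("operand does not fit in 6 bits")
    return (opcode << (2 * OPERAND_BITS)) | (op_a << OPERAND_BITS) | op_b


def decode(word: int) -> Tuple[int, int, int]:
    """Split a 16-bit word back into (opcode, operand A, operand B)."""
    operand_mask = (1 << OPERAND_BITS) - 1
    opcode = (word >> (2 * OPERAND_BITS)) & ((1 << OPCODE_BITS) - 1)
    return opcode, (word >> OPERAND_BITS) & operand_mask, word & operand_mask
```

For example, `decode(encode(10, 59, 3))` returns `(10, 59, 3)`, and operand value 59 corresponds to the highest-numbered general register, gr59.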

Further, the instruction set is divided into three types according to the operand, including: register type (R type), immediate type (I type), and blend type (RI type). The computational fluid dynamics oriented instruction set encoding format is shown in table 1.

TABLE 1 instruction set encoding format for computational fluid dynamics

Further, the instruction set is classified by instruction function into control class instructions, operation class instructions and data movement class instructions. The control class instructions include: a no-operation instruction, a stop instruction and branch jump instructions. The operation class instructions include: a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction and a floating-point register multiplication instruction. The data movement class instructions include: a move instruction.

Specifically, 14 machine instructions in 3 classes are defined and implemented for the characteristics of computational fluid dynamics algorithms. Control class instructions: the no-operation instruction (NOP), the stop instruction (HALT) and the branch jump instructions (BZ, BNZ, BN, BNN). Operation class instructions: the fixed-point immediate addition instruction (ADDI), the fixed-point immediate subtraction instruction (SUBI), the fixed-point comparison instruction (CMP), the immediate jump instruction (JUMPI), the floating-point register addition instruction (ADDF), the floating-point register subtraction instruction (SUBF) and the floating-point register multiplication instruction (MULF). Data movement class instruction: the move instruction (MOV). The specific format and operation of the computational fluid dynamics-oriented instruction set are shown in Table 2.
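As a quick consistency check, the mnemonics listed above can be tallied by class. The dictionary layout and the helper name `class_of` are illustrative only; no opcode numbers are assigned here because the source's Table 2 is not reproduced in this text.

```python
# Mnemonics grouped exactly as listed in the description.
INSTRUCTION_CLASSES = {
    "control": ["NOP", "HALT", "BZ", "BNZ", "BN", "BNN"],
    "operation": ["ADDI", "SUBI", "CMP", "JUMPI", "ADDF", "SUBF", "MULF"],
    "data_move": ["MOV"],
}


def class_of(mnemonic: str) -> str:
    """Return the class name of a mnemonic, or raise KeyError if it is unknown."""
    for name, members in INSTRUCTION_CLASSES.items():
        if mnemonic in members:
            return name
    raise KeyError(mnemonic)


TOTAL = sum(len(members) for members in INSTRUCTION_CLASSES.values())
# A 4-bit opcode offers 16 code points, enough for all 14 instructions.
FITS_IN_OPCODE = TOTAL <= 2 ** 4
```

The tally gives 6 control, 7 operation and 1 data movement instruction, matching the stated total of 14.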

TABLE 2 instruction set specific Format and operation List for computational fluid dynamics

In one embodiment, as shown in fig. 4, a computational fluid dynamics-oriented acceleration calculation method is provided, which is used for implementing computational fluid dynamics-oriented acceleration calculation by using any one of the acceleration calculation apparatuses described above; the method comprises the following steps:

Step 400: Discretize the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula.

Step 402: Determine the number of required differential operation units and transmission channels according to the iterative calculation formula, and determine the number of program loop iterations from the iteration in time.

Step 404: Combine all the differential operation units through the transmission channels, and set the initialization data and store it in the general registers of the differential operation units.

Step 406: Set the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of program loop iterations to obtain a calculation program; the instructions in the instruction memory are used to implement different difference schemes.

Step 408: Run the calculation program and output the differential operation results of all nodes in the flow field at different moments.

In a specific embodiment, first, a dedicated differential operation unit (DPE) and an instruction set are designed according to the computational characteristics of the finite difference method widely used in computational fluid dynamics. Second, transmission channels (Channel) are provided for the operation unit, so that data can be transmitted between adjacent differential operation units for the differential operation. Finally, a plurality of differential operation units are combined through the transmission channels to complete the differential operations of all nodes in the flow field in parallel. As shown in FIG. 5, taking a two-dimensional linear convection equation as an example, the specific process is as follows:

(1) According to the characteristics of the two-dimensional linear convection equation, discretize the equation in space to determine the number of required DPE units and transmission channels, and iterate in time to determine the number of program loop iterations.

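The two-dimensional linear convection equation is:

$$\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} + c\,\frac{\partial u}{\partial y} = 0$$

Applying the finite difference method with a forward difference in time and backward differences in space, and rearranging, gives the formula for the velocity $u_{i,j}^{n+1}$:

$$u_{i,j}^{n+1} = u_{i,j}^{n} - c\,\frac{\Delta t}{\Delta x}\left(u_{i,j}^{n} - u_{i-1,j}^{n}\right) - c\,\frac{\Delta t}{\Delta y}\left(u_{i,j}^{n} - u_{i,j-1}^{n}\right)$$

where $u_{i,j}^{n}$ is the current flow field velocity, $u_{i,j}^{n+1}$ is the flow field velocity at the next moment, $t$ is time, $(x, y)$ are the two-dimensional spatial coordinates, $\Delta t$, $\Delta x$ and $\Delta y$ are the discretization step sizes, and $c$ is a known constant. In terms of the grid indices used below, the flow field velocity at the initial moment is set to:

$$u_{i,j}^{0} = \begin{cases} 2, & 19 \le i \le 39 \ \text{and} \ 19 \le j \le 39 \\ 1, & \text{otherwise} \end{cases}$$

The boundary conditions are:

$$u = 1 \quad \text{on the boundary of the computational domain}$$

The flow field velocity at any moment can therefore be calculated iteratively.

For the chosen values of $c$, $\Delta t$, $\Delta x$ and $\Delta y$, 6400 (80 rows × 80 columns) DPEs are used to calculate the velocities of all discrete points in the two-dimensional space at the same time, for a total of 100 iterations.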
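For reference, the same calculation can be reproduced serially in software. The sketch below assumes the standard forward-time, backward-space update for the two-dimensional linear convection equation, holds the boundary nodes at 1, and uses hypothetical values for `C`, `DT`, `DX` and `DY`, because the source's parameter formula is not reproduced in this text. The 80 × 80 grid, the initial block of velocity 2 in rows and columns 19 to 39, and the 100 iterations follow the embodiment.

```python
N = 80        # 80 x 80 nodes, one per DPE (from the text)
STEPS = 100   # number of time iterations (from the text)
# Assumed values, chosen so that C*DT/DX + C*DT/DY <= 1 (stable upwind update).
C, DT, DX, DY = 1.0, 0.005, 0.025, 0.025


def initial_field(n=N):
    """Velocity 2 in rows and columns 19..39, velocity 1 everywhere else."""
    return [[2.0 if 19 <= i <= 39 and 19 <= j <= 39 else 1.0
             for j in range(n)] for i in range(n)]


def step(u, cx, cy):
    """One forward-time, backward-space update; boundary nodes stay fixed."""
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Each interior node needs only itself and its two upstream neighbours.
            new[i][j] = (u[i][j]
                         - cx * (u[i][j] - u[i - 1][j])
                         - cy * (u[i][j] - u[i][j - 1]))
    return new


def simulate(steps=STEPS):
    """Iterate the update from the initial field and return the final field."""
    u = initial_field()
    cx, cy = C * DT / DX, C * DT / DY
    for _ in range(steps):
        u = step(u, cx, cy)
    return u
```

The nested loops visit the nodes one after another, whereas in the device each node's update is carried out by its own DPE, so all nodes of one time step are processed in parallel.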

(2) 6400 (rows 0 to 79 × columns 0 to 79) operation units (DPEs) are combined through the transmission channels, and the initialization data is set and stored in the general registers as follows:

The flow field velocity at the initial moment is stored in gr1. The value of gr1 in the DPEs of rows 19 to 39 and columns 19 to 39 is set to 2 (in binary single-precision floating point, 32'b0_10000000_00000000000000000000000; the same notation applies below), and the value of gr1 in the remaining DPEs is set to 1 (32'b0_01111111_00000000000000000000000). The flow field velocity stored in gr2 of all DPEs is set to 1 (32'b0_01111111_00000000000000000000000). gr3 of all DPEs stores the count 32'd100 (the count does not require floating-point arithmetic). In addition, the two registers gr56 and gr57 in all columns of row 0, all columns of row 79, all rows of column 0 and all rows of column 79 are constrained to 1 (32'b0_01111111_00000000000000000000000). Because the difference scheme of the two-dimensional linear convection equation uses only data from the preceding nodes, only the two communication registers cr2 and cr3 are used here; cr0 and cr1 are not used.
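The single-precision bit patterns for the initial values can be checked with Python's standard `struct` module. The helper name `float_bits` and the underscore-separated sign, exponent and fraction layout are choices made for this illustration.

```python
import struct


def float_bits(value: float) -> str:
    """Return the IEEE 754 single-precision encoding as sign_exponent_fraction."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(word, "032b")
    return f"{bits[0]}_{bits[1:9]}_{bits[9:]}"


TWO = float_bits(2.0)   # 0_10000000_ followed by 23 zero fraction bits
ONE = float_bits(1.0)   # 0_01111111_ followed by 23 zero fraction bits
```

Both values have an all-zero 23-bit fraction field; they differ only in the 8-bit exponent field.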

(3) Set the instructions in the instruction memory. The pseudo code of the instructions set in the instruction memory is shown in Table 3.

Table 3: pseudo code for instruction setup in instruction memory

(4) Run the program and output the result, namely the differential operation results of all nodes in the flow field at different moments.

In summary, the computational fluid dynamics-oriented accelerated computing device and method of the present invention have a simple hardware structure, high computational energy efficiency, flexibility and programmability, and can accelerate computational fluid dynamics calculations. Taking the implementation of the two-dimensional linear convection equation as an example, 6400 DPE operation units are combined, the program length is 22 instructions (of which the loop is 19 instructions long), the running time is 1908 clock cycles, and each DPE uses 11 general registers and 2 communication registers. The size of the combined DPE array can be changed to suit the characteristics of the CFD algorithm, and different difference schemes can be implemented by changing the instructions in the instruction memory, so the approach is flexible and programmable.

It should be understood that, although the steps in the flowchart of FIG. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments. These sub-steps or stages are not necessarily performed one after another; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features should be considered within the scope of the present disclosure as long as it involves no contradiction.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims

1. An accelerated computing device oriented to computational fluid dynamics, the accelerated computing device comprising: a plurality of dedicated differential operation units, used for executing a program written in an instruction set for the fluid mechanics problem to be solved and for completing the differential operations of the nodes in a flow field;

the differential operation unit comprises a plurality of transmission channels, and the transmission channels are used for combining adjacent differential operation units together to complete differential operation of all nodes in a flow field in parallel;

the differential operation unit also comprises a plurality of general registers for storing register data;

the number of transmission channels included in the differential operation unit is 4, and the transmission channels are obtained by configuring a communication register in the differential operation unit;

2. An accelerated computing apparatus according to claim 1, wherein the differential operation unit further comprises:

the instruction controller is used for controlling the address of the instruction to be executed;

the instruction memory is used for storing instructions to be executed;

3. The accelerated computing device of claim 2, wherein the instruction controller comprises: an increment-by-one adder and a two-to-one multiplexer.

4. The accelerated computing device of claim 2, wherein the differential operation unit executes an instruction in four clock cycles: instruction fetch, decode, execute and write-back;

in the decoding stage, the instruction sent into the instruction register is decoded, corresponding operands are extracted from the instruction according to the operation code, and the two extracted operands are placed into two temporary registers;

in the execution stage, two temporary registers are operated in the arithmetic logic operation unit according to the operation code, the operation result is stored in a third temporary register, and the value of a flag register is set according to the instruction function and the operation result; the value of the flag register is used for the instruction controller to judge whether the next instruction is executed in sequence or jump;

and in the write-back stage, judging whether the value of the general register needs to be modified or not according to the operation code and the operation result, and if so, storing the value of a third temporary register in the corresponding position of the general register.

5. The device of claim 2, wherein the number of the general purpose registers in the differential operation unit is 60, and the width of the general purpose registers is 64 bits.

6. An accelerated computing apparatus according to claim 1, wherein the instruction set employed by the differential operation unit is set according to a characteristic of a computational fluid dynamics algorithm;

the instruction set adopted by the differential operation unit uses a 16-bit RISC instruction encoding format, wherein in each instruction the opcode is 4 bits long and there are two operands, each 6 bits long.

7. The accelerated computing apparatus of claim 6, wherein the instruction set is classified into three types according to operand, comprising: register type, immediate type, and hybrid type.

8. The accelerated computing apparatus of claim 6, wherein the set of instructions is classified according to a function of the instruction, the set of instructions comprising: control instructions, operation instructions and data moving instructions;

wherein: the control class instructions include: a no-operation instruction, a stop instruction and branch jump instructions; the operation class instructions include: a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction and a floating-point register multiplication instruction; the data moving instructions include: a move instruction.

9. A computational fluid dynamics-oriented acceleration computing method for implementing computational fluid dynamics-oriented acceleration computing using the acceleration computing device according to any one of claims 1 to 8; the method comprises the following steps:

expanding in time and space by adopting a finite difference method according to a fluid equation to be solved to obtain an iterative calculation formula;

determining the number of required differential operation units and transmission channels according to the iterative calculation formula, and iterating in time to determine the program cycle number;

all the differential operation units are combined together through the transmission channels, and initialization data are set and stored in the general registers of the differential operation units;

setting instructions and an execution sequence in an instruction memory according to the iterative calculation formula and the program cycle number to obtain a calculation program; the instructions in the instruction memory are used for realizing different differential calculation formats;