CN115291949B - Accelerated computing device and accelerated computing method for computational fluid dynamics - Google Patents


Info

Publication number
CN115291949B
Authority
CN
China
Legal status
Active
Application number
CN202211171216.5A
Other languages
Chinese (zh)
Other versions
CN115291949A (en)
Inventor
龚艳琼
刘必慰
赵玉新
黄东昌
郭阳
江豪龙
赖雯
王洁
杨益斌
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202211171216.5A
Publication of CN115291949A
Application granted
Publication of CN115291949B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/28 Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Abstract

The present application relates to the fields of computational fluid dynamics and computer technology, and in particular to an acceleration computing device and an acceleration computing method for computational fluid dynamics. The acceleration computing device comprises a plurality of dedicated differential operation units, which execute a program written with the instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field. Each differential operation unit has a plurality of transmission channels that combine the units so that the differential operations of all nodes in the flow field are completed in parallel. The device has a simple hardware structure: data transmission channels between the differential operation units carry the data needed for the differential calculation, which avoids much of the latency of passing data through global memory; the data memory is eliminated, which greatly reduces the use of computing resources; and the device is flexible and programmable.

Description

Accelerated computing device and accelerated computing method for computational fluid dynamics
Technical Field
The present application relates to the field of computational fluid dynamics and computer technologies, and in particular, to an accelerated computing apparatus and an accelerated computing method for computational fluid dynamics.
Background
Computational fluid dynamics (CFD) uses computers to solve the governing equations of fluid mechanics and can simulate the flow characteristics of complex flow fields relatively easily and accurately. Among the discretization methods used for numerical simulation in CFD, the finite difference method is a typical approach, and different difference schemes can be built by combining temporal and spatial difference schemes. However, the geometries, numerical methods, and physical and chemical models used in CFD are becoming increasingly complex and fine-grained, which places extremely high demands on large-scale computation. Accurate simulation of real flows requires an enormous amount of computation, and accurate numerical simulation of full-scale models in particular remains beyond current computing capability.
To accelerate CFD computation, researchers in many countries have carried out extensive studies that have effectively advanced CFD. However, because computing power is limited, achieving high CFD performance on general-purpose computing hardware remains a serious challenge.
Disclosure of Invention
In view of the above, there is a need for a computational fluid dynamics-oriented acceleration computing device and acceleration computing method that have a simple hardware structure, high computational efficiency, and flexibility and programmability.
A computational fluid dynamics-oriented acceleration computing device comprises a plurality of dedicated differential operation units, which execute a program written with the instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field.
Each differential operation unit includes a plurality of transmission channels, which combine adjacent differential operation units so that the differential operations at all nodes of the flow field are completed in parallel.
Further, the differential operation unit includes:
an instruction controller, which controls the address of the instruction to be executed;
an instruction memory, which stores the instructions to be executed;
a plurality of general registers, which store register data; and
an arithmetic logic unit, which performs operations on the operands.
Further, the instruction controller includes an increment-by-one adder and a 2-to-1 multiplexer.
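The adder-plus-multiplexer controller can be summarized as a next-PC function. The sketch below is a hypothetical behavioral model, not the patent's circuit: it assumes that a flag value of 1 selects the jump target held in reg_c (the text only says the flag chooses between sequential execution and a jump) and that the PC wraps within the 128-word instruction memory stated in the detailed description.

```python
IMEM_DEPTH = 128  # 128 x 16-bit instruction memory, per the detailed description


def next_pc(pc: int, flag: int, reg_c: int) -> int:
    """Hypothetical behavioral model of the instruction controller.

    The increment-by-one adder always computes pc + 1; the 2-to-1 multiplexer
    then selects either that value (sequential execution) or the jump target
    in reg_c. Treating flag == 1 as "jump" and wrapping modulo IMEM_DEPTH are
    assumptions made for this sketch.
    """
    incremented = (pc + 1) % IMEM_DEPTH      # output of the +1 adder
    jump_target = reg_c % IMEM_DEPTH         # address produced in the EX stage
    return jump_target if flag else incremented  # the 2-to-1 multiplexer


print(next_pc(5, 0, 40))  # 6  (sequential)
print(next_pc(5, 1, 40))  # 40 (jump)
```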
Further, the differential operation unit executes each instruction in four clock cycles: instruction fetch, decode, execute and write-back.
In the fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register.
In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted from it according to the opcode, and the two extracted operands are placed in two temporary registers.
In the execute stage, the arithmetic logic unit operates on the two temporary registers according to the opcode, the result is stored in a third temporary register, and the flag register is set according to the instruction function and the result; the instruction controller uses the flag value to decide whether the next instruction is executed sequentially or by a jump.
In the write-back stage, whether a register value needs to be modified is determined from the opcode and the result; if so, the value of the third temporary register is stored at the corresponding location in the general registers.
Further, the differential operation unit includes four transmission channels, which are obtained by configuring communication registers in the unit.
The four communication registers are located at the four edges of the differential operation unit and are all connected to the same internal general register; adjacent differential operation units enter a transmission channel through their nearest communication register to exchange data.
Furthermore, the number of the general registers in the differential operation unit is 60, and the width of the general registers is 64 bits.
Further, the instruction set used by the differential operation unit is designed around the characteristics of computational fluid dynamics algorithms.
The instruction set uses a 16-bit RISC encoding format: instructions use a three-address format with a 4-bit opcode and two 6-bit operands.
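The 4 + 6 + 6 = 16-bit layout can be made concrete with a pack/unpack pair. This is a minimal sketch; the field order (opcode in bits 15-12, first operand in bits 11-6, second operand in bits 5-0) and the opcode value used in the example are assumptions, since the actual layout is defined by the patent's Table 1.

```python
OPCODE_BITS = 4
OPERAND_BITS = 6
OPERAND_MASK = (1 << OPERAND_BITS) - 1  # 0x3F


def encode(opcode: int, op1: int, op2: int) -> int:
    """Pack one 16-bit word as [opcode:4 | op1:6 | op2:6] (assumed field order)."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError(f"opcode {opcode} does not fit in {OPCODE_BITS} bits")
    for op in (op1, op2):
        if not 0 <= op <= OPERAND_MASK:
            raise ValueError(f"operand {op} does not fit in {OPERAND_BITS} bits")
    return (opcode << (2 * OPERAND_BITS)) | (op1 << OPERAND_BITS) | op2


def decode(word: int) -> tuple:
    """Split a 16-bit word back into (opcode, op1, op2)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("not a 16-bit instruction word")
    return (word >> (2 * OPERAND_BITS),
            (word >> OPERAND_BITS) & OPERAND_MASK,
            word & OPERAND_MASK)


# A 6-bit operand field can name 2**6 = 64 locations, enough for gr0-gr59.
assert 2 ** OPERAND_BITS >= 60

word = encode(0b1001, 55, 3)  # hypothetical opcode 9 with operands 55 and 3
print(f"{word:016b}", decode(word))
```

The range checks mirror the fixed field widths: anything that does not fit in its field is rejected rather than silently truncated.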
Further, according to their operands, the instructions are divided into three types: register type, immediate type and hybrid type.
Further, according to function, the instruction set comprises control instructions, operation instructions and data move instructions.
The control instructions include a null instruction, a stop instruction and branch jump instructions; the operation instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction and a floating-point register multiplication instruction; the data move instructions include a move instruction.
A computational fluid dynamics-oriented acceleration computing method uses the above acceleration computing device to accelerate CFD computation. The method comprises the following steps:
discretizing the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula;
determining the number of differential operation units and transmission channels required from the iterative calculation formula, and determining the number of program loop iterations from the iteration in time;
combining all the differential operation units through the transmission channels, and setting the initialization data and storing it in the general registers of the units;
setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of loop iterations to obtain a calculation program, where the instructions in the instruction memory implement different difference schemes; and
running the calculation program and outputting the difference results of all nodes in the flow field at different times.
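The planning steps (choosing the number of units, the channels and the loop count) can be sketched as a small host-side helper. Everything here is hypothetical: `plan_dpe_array` and `DpePlan` are illustrative names, and the sketch assumes one DPE per grid node and that a DPE can only read neighbors one step away through its four edge channels.

```python
from dataclasses import dataclass

# Direction of each nearest neighbour as (row offset, column offset).
EDGES = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}


@dataclass(frozen=True)
class DpePlan:
    dpe_count: int        # one DPE per discrete grid node (assumption)
    channels: frozenset   # edges that must carry neighbour data
    loop_count: int       # time-step iterations of the program loop


def plan_dpe_array(rows: int, cols: int, stencil, time_steps: int) -> DpePlan:
    """Derive the DPE array size and channel usage from a difference stencil."""
    neighbours = {offset for offset in stencil if offset != (0, 0)}
    unreachable = neighbours - set(EDGES)
    if unreachable:
        raise ValueError(f"offsets {sorted(unreachable)} are not nearest neighbours")
    return DpePlan(rows * cols,
                   frozenset(EDGES[o] for o in neighbours),
                   time_steps)


# Forward-time, backward-space stencil: the node itself plus its i-1 and j-1 neighbours.
ftbs = [(0, 0), (-1, 0), (0, -1)]
plan = plan_dpe_array(80, 80, ftbs, 100)
print(plan.dpe_count, sorted(plan.channels), plan.loop_count)
```

For the forward-time, backward-space stencil on an 80 x 80 grid this yields 6400 units, two channels and 100 loop iterations, which matches the two-dimensional convection example's use of only two communication registers.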
In the above computational fluid dynamics-oriented acceleration computing device and method, the device comprises a plurality of dedicated differential operation units, which execute a program written with the instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field; the units have transmission channels that combine them so that the differential operations at all nodes are completed in parallel. The hardware structure is simple: because transmission channels between the units carry the data needed for the differential calculation, much of the latency of passing data through global memory is avoided; the data memory is eliminated, which greatly reduces the use of computing resources; and the device remains flexible and programmable.
Drawings
FIG. 1 is a schematic diagram of a computational fluid dynamics-oriented acceleration computing device in one embodiment;
FIG. 2 is a schematic diagram of a computational fluid dynamics-oriented differential operation unit in one embodiment;
FIG. 3 is a schematic diagram of computational fluid dynamics-oriented transmission channels in one embodiment;
FIG. 4 is a schematic flowchart of a computational fluid dynamics-oriented acceleration computing method in another embodiment;
FIG. 5 is a schematic structural diagram of a computational fluid dynamics-oriented acceleration computing device for solving the two-dimensional linear convection equation in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a computational fluid dynamics-oriented acceleration computing device is provided, comprising a plurality of dedicated differential operation units, which execute a program written with the instruction set for the fluid dynamics problem to be solved and perform the differential operations at the nodes of the flow field. The differential operation units (DPEs) are designed around the computational characteristics of the finite difference method widely used in computational fluid dynamics, and their number is determined by how the governing equation of the problem to be solved is expanded in space.
Each differential operation unit includes a plurality of transmission channels, which combine adjacent units so that the differential operations at all nodes of the flow field are completed in parallel. Through the transmission channels many dedicated units can be combined; the size of the resulting array can be changed to suit the characteristics of the CFD algorithm, and different difference schemes can be implemented by changing the instructions in the instruction memory, which gives the design its flexibility and programmability.
The instruction set is designed around the use of the finite difference method in CFD and is the minimal set able to perform the differential operations. It uses a 16-bit RISC encoding format, and the instruction memory is 128 × 16 bits; that is, every instruction is encoded in the same 16-bit length.
Further, the differential operation unit includes: an instruction controller, which controls the address of the instruction to be executed; an instruction memory, which stores the instructions to be executed; a plurality of general registers (gr), which store register data; and an arithmetic logic unit, which performs operations on the operands.
The hardware structure is simple: no data memory is included, and each DPE consists of only four parts, namely the instruction controller, the instruction memory, the general registers and the arithmetic logic unit.
Further, the instruction controller includes an increment-by-one adder and a 2-to-1 multiplexer.
Further, the differential operation unit executes each instruction in four clock cycles: instruction fetch, decode, execute and write-back. In the fetch stage, an instruction is read from the instruction memory according to the value of the instruction controller and sent to the instruction register. In the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted according to the opcode, and the two extracted operands are placed in two temporary registers. In the execute stage, the arithmetic logic unit operates on the two temporary registers according to the opcode, stores the result in a third temporary register, and sets the flag register according to the instruction function and the result; the instruction controller uses the flag value to decide whether the next instruction is executed sequentially or by a jump. In the write-back stage, whether a register value needs to be modified is determined from the opcode and the result; if so, the value of the third temporary register is stored at the corresponding location in the general registers.
In a specific embodiment, as shown in FIG. 2, the computational fluid dynamics-oriented differential operation unit (DPE) comprises four parts: an instruction controller, which controls the address of the instruction to be executed and consists of an increment-by-one adder and a 2-to-1 multiplexer; an instruction memory, which stores the instructions to be executed; general registers, which store register data and are 32 bits wide with a depth of 60; and an arithmetic logic unit, which performs operations on the operands. Executing one instruction takes four clock cycles, namely instruction fetch (IF), decode (ID), execute (EX) and write-back (WB), as follows:
1) Instruction fetch (IF):
In the IF stage, an instruction is read from the instruction memory at the address given by the PC and placed in the instruction register (ID_ir). At the same time the PC value for the next cycle is set, so that instructions execute either sequentially or by jumping to a specific address.
2) Decode (ID):
The ID stage decodes the instruction, extracts the corresponding operands according to the instruction function (i.e., the opcode), and places them in register A (reg_a) and register B (reg_b).
3) Execute (EX):
The EX stage operates on reg_a and reg_b in the arithmetic logic unit (ALU) according to the instruction function and stores the result in register C (reg_c). It also sets the flag register to 0 or 1 according to the instruction function and the result; the instruction controller uses this flag to decide whether the next instruction is executed sequentially or by a jump, i.e., whether the PC in the next clock cycle takes the value in reg_c or is incremented.
4) Write-back (WB):
The WB stage decides, from the instruction function and the EX result, whether and how to modify a register; if so, it stores the value of reg_c into the corresponding general register. This stage is effective only for instructions that modify register values.
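The four stages can be illustrated with a small behavioral interpreter. This is a hypothetical sketch rather than the patent's design: instructions are written as (mnemonic, x, y) tuples instead of 16-bit words, only a few of the Table 2 instructions are modeled, and their semantics (two-operand form writing back to gr[x]; BNZ jumping to address y when gr[x] is non-zero) are assumed from the mnemonics. Each loop pass walks one instruction through IF, ID, EX and WB; timing and stage overlap are not modeled.

```python
from dataclasses import dataclass, field

# Assumed ALU semantics for a few mnemonics; the real definitions are in Table 2.
ALU = {
    "ADDI": lambda a, b: a + b,  # fixed-point add immediate
    "SUBI": lambda a, b: a - b,  # fixed-point subtract immediate
    "ADDF": lambda a, b: a + b,  # floating-point register add
    "SUBF": lambda a, b: a - b,  # floating-point register subtract
    "MULF": lambda a, b: a * b,  # floating-point register multiply
    "MOV": lambda a, b: b,       # copy second operand into first
}
IMMEDIATE_Y = {"ADDI", "SUBI", "BNZ"}  # y is a literal, not a register index


@dataclass
class DPE:
    program: list                                       # [(mnemonic, x, y), ...]
    gr: list = field(default_factory=lambda: [0.0] * 60)
    pc: int = 0
    flag: int = 0

    def run(self, max_instructions: int = 10_000) -> None:
        for _ in range(max_instructions):
            # IF: fetch the instruction addressed by the PC into the instruction register.
            id_ir = self.program[self.pc]
            # ID: decode the opcode and load the two operands into reg_a and reg_b.
            opcode, x, y = id_ir
            if opcode == "HALT":
                return
            reg_a = self.gr[x]
            reg_b = y if opcode in IMMEDIATE_Y else self.gr[y]
            # EX: compute reg_c and set the flag that steers the next PC.
            if opcode == "BNZ":
                reg_c, self.flag = reg_b, int(reg_a != 0)  # reg_c holds the jump target
            elif opcode in ALU:
                reg_c, self.flag = ALU[opcode](reg_a, reg_b), 0
            elif opcode == "NOP":
                reg_c, self.flag = None, 0
            else:
                raise NotImplementedError(f"{opcode} is not modelled in this sketch")
            # WB: only instructions that produce a register value write back.
            if opcode in ALU:
                self.gr[x] = reg_c
            # Next PC: the jump target in reg_c if the flag is set, else PC + 1.
            self.pc = reg_c if self.flag else self.pc + 1
        raise RuntimeError("instruction limit reached without HALT")


# Usage: multiply gr1 by gr2 three times, counting down in gr3.
dpe = DPE(program=[("MULF", 1, 2), ("SUBI", 3, 1), ("BNZ", 3, 0), ("HALT", 0, 0)])
dpe.gr[1], dpe.gr[2], dpe.gr[3] = 1.0, 2.0, 3
dpe.run()
print(dpe.gr[1])  # 8.0
```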
Further, each differential operation unit has four transmission channels, formed by configuring communication registers in the unit. The four communication registers are located at the four edges of the unit and are connected to the same internal general register; adjacent units exchange data through the transmission channel via their nearest communication registers.
Specifically, as shown in FIG. 3, each dedicated differential operation unit (DPE) has four communication registers (cr0–cr3) located at its four edges, forming four transmission channels. All four communication registers are connected to internal general register 55 (gr55), which stores the value the DPE needs for the differential operation. An adjacent DPE communicates through the communication register nearest to it, and each communication register is 32 bits wide. Each DPE is also given four input ports, connected to internal general registers 56–59 (gr56–gr59), and four output ports, each connected to one of the four communication registers. For example, for DPE_X_Y, where _X_Y denotes the position of the DPE, the added input and output ports connect DPE_X_Y to its four neighbors so that it can obtain data from the adjacent points and send its own data to them.
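How DPE_X_Y reaches its neighbors can be sketched as a small wiring table. The text does not say which communication register faces which edge, so the assignment below is hypothetical: cr0 up, cr1 left, cr2 down, cr3 right, with gr56–gr59 receiving data from the up, left, down and right neighbors. It was chosen so that a backward-in-space stencil sends data only through cr2 and cr3, as in the convection example.

```python
# Hypothetical edge assignment: communication register -> (row offset, column offset).
CR_EDGE = {0: (-1, 0), 1: (0, -1), 2: (1, 0), 3: (0, 1)}      # up, left, down, right
# Input general register that receives data arriving from each direction.
INPUT_GR = {(-1, 0): 56, (0, -1): 57, (1, 0): 58, (0, 1): 59}  # from up, left, down, right


def outgoing_links(x: int, y: int, rows: int, cols: int) -> list:
    """List (cr index, neighbour name, neighbour's input register) for DPE_x_y.

    DPEs on the array edge have fewer than four neighbours; missing links are skipped.
    """
    links = []
    for cr, (dx, dy) in CR_EDGE.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < rows and 0 <= ny < cols:
            # Data leaving through this edge enters the neighbour from the opposite side.
            links.append((cr, f"DPE_{nx}_{ny}", INPUT_GR[(-dx, -dy)]))
    return links


print(outgoing_links(0, 0, 80, 80))
# [(2, 'DPE_1_0', 56), (3, 'DPE_0_1', 57)]
```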
Furthermore, the number of the general registers in the differential operation unit is 60, and the width of the general registers is 64 bits.
Further, the instruction set adopted by the differential operation unit is designed around the characteristics of CFD algorithms; it uses a 16-bit RISC encoding format in which instructions use a three-address format with a 4-bit opcode and two 6-bit operands.
Further, according to the operands, the instructions are divided into three types: register type (R type), immediate type (I type) and hybrid type (RI type). The encoding format of the CFD-oriented instruction set is shown in Table 1.
Table 1: Instruction set encoding format for computational fluid dynamics (provided as an image in the original publication)
Further, according to function, the instruction set comprises control instructions, operation instructions and data move instructions. The control instructions include a null instruction, a stop instruction and branch jump instructions; the operation instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate jump instruction, a floating-point register addition instruction, a floating-point register subtraction instruction and a floating-point register multiplication instruction; the data move instructions include a move instruction.
Specifically, 14 machine instructions in 3 classes are defined and implemented to match the characteristics of CFD algorithms. Control instructions: null instruction (NOP), stop instruction (HALT) and branch jump instructions (BZ, BNZ, BN, BNN). Operation instructions: fixed-point immediate addition (ADDI), fixed-point immediate subtraction (SUBI), fixed-point comparison (CMP), immediate jump (JUMPI), floating-point register addition (ADDF), floating-point register subtraction (SUBF) and floating-point register multiplication (MULF). Data move instruction: move (MOV). The specific format and operation of the CFD-oriented instruction set are shown in Table 2.
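The classification can be checked against the 4-bit opcode field mentioned earlier. The grouping below is taken from the text; the opcode numbering is hypothetical (listing order), because the real values are defined in Table 2. If every mnemonic has its own opcode, 14 of the 16 available codes are used.

```python
# The 14 instructions grouped by class, as listed in the text.
INSTRUCTION_CLASSES = {
    "control": ["NOP", "HALT", "BZ", "BNZ", "BN", "BNN"],
    "operation": ["ADDI", "SUBI", "CMP", "JUMPI", "ADDF", "SUBF", "MULF"],
    "data_move": ["MOV"],
}
MNEMONICS = [m for group in INSTRUCTION_CLASSES.values() for m in group]

# Fourteen distinct mnemonics fit within the 2**4 = 16 codes of a 4-bit opcode.
assert len(MNEMONICS) == len(set(MNEMONICS)) == 14
assert len(MNEMONICS) <= 2 ** 4

# Hypothetical opcode numbering in listing order; the real values are in Table 2.
OPCODE = {mnemonic: code for code, mnemonic in enumerate(MNEMONICS)}
print(OPCODE["MOV"], 2 ** 4 - len(MNEMONICS))  # highest code used, spare codes
```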
Table 2: Specific format and operation of the instruction set for computational fluid dynamics (provided as an image in the original publication)
In one embodiment, as shown in FIG. 4, a computational fluid dynamics-oriented acceleration computing method is provided, which uses any of the acceleration computing devices described above to accelerate CFD computation. The method comprises the following steps:
Step 400: discretize the fluid equation to be solved in time and space with the finite difference method to obtain an iterative calculation formula.
Step 402: determine the number of differential operation units and transmission channels required from the iterative calculation formula, and determine the number of program loop iterations from the iteration in time.
Step 404: combine all the differential operation units through the transmission channels, and set the initialization data and store it in the general registers of the units.
Step 406: set the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of loop iterations to obtain a calculation program; the instructions in the instruction memory implement different difference schemes.
Step 408: run the calculation program and output the difference results of all nodes in the flow field at different times.
In a specific embodiment, a dedicated differential operation unit (DPE) and an instruction set are first designed around the computational characteristics of the finite difference method widely used in computational fluid dynamics; next, transmission channels are provided for the units so that data can be passed between adjacent units for the differential operation; finally, a number of units are combined through the transmission channels to complete the differential operations of all nodes in the flow field in parallel. Taking the two-dimensional linear convection equation as an example, as shown in FIG. 5, the process is as follows:
(1) Based on the characteristics of the two-dimensional linear convection equation, expand the equation in space to determine the number of DPEs and transmission channels required, and iterate in time to determine the number of program loop iterations.
The two-dimensional linear convection equation is:

$$\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} + c\,\frac{\partial u}{\partial y} = 0$$

Discretizing it with the finite difference method, forward in time and backward in space, and rearranging gives the velocity update formula for $u$:

$$u_{i,j}^{n+1} = u_{i,j}^{n} - \frac{c\,\Delta t}{\Delta x}\left(u_{i,j}^{n} - u_{i-1,j}^{n}\right) - \frac{c\,\Delta t}{\Delta y}\left(u_{i,j}^{n} - u_{i,j-1}^{n}\right)$$

where $u_{i,j}^{n}$ is the current flow field velocity, $u_{i,j}^{n+1}$ is the flow field velocity at the next time step, $t$ is time, $(x, y)$ are the two-dimensional spatial coordinates, $\Delta t$, $\Delta x$ and $\Delta y$ are the discretization step lengths, and $c$ is a known constant. The initial flow field velocity is:

$$u_{i,j}^{0} = \begin{cases} 2, & 19 \le i \le 39 \text{ and } 19 \le j \le 39 \\ 1, & \text{otherwise} \end{cases}$$

and the boundary conditions are:

$$u_{0,j}^{n} = u_{79,j}^{n} = u_{i,0}^{n} = u_{i,79}^{n} = 1$$

The flow field velocity at any time step can therefore be calculated iteratively.
With the discretization parameters set (formula image in the original publication), 6400 DPEs (80 rows × 80 columns) are used to calculate the velocity at all discrete points of the two-dimensional domain at each time step, for a total of 100 iterations.
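The iteration can be checked on a host before it is mapped onto the DPE array. The sketch below is a plain-Python reference model, not the DPE program: it uses the 80 × 80 grid, the 100 time steps and the initial and boundary values described in this embodiment, while the Courant numbers cx = c·Δt/Δx and cy = c·Δt/Δy are hypothetical (0.2 each), because the actual parameter values appear only in a formula image. Each grid point stands in for one DPE, reading its own value and those of its i-1 and j-1 neighbors from the same time level.

```python
def ftbs_convection(rows=80, cols=80, steps=100, cx=0.2, cy=0.2):
    """Forward-time, backward-space update of the 2-D linear convection equation.

    cx = c*dt/dx and cy = c*dt/dy are assumed values, not taken from the patent.
    """
    # Initial field, as loaded into gr1: 2.0 in rows/columns 19..39, 1.0 elsewhere.
    u = [[2.0 if 19 <= i <= 39 and 19 <= j <= 39 else 1.0 for j in range(cols)]
         for i in range(rows)]
    for _ in range(steps):
        old = [row[:] for row in u]       # every read comes from time level n
        for i in range(1, rows - 1):      # boundary nodes stay fixed at 1.0
            for j in range(1, cols - 1):
                u[i][j] = (old[i][j]
                           - cx * (old[i][j] - old[i - 1][j])
                           - cy * (old[i][j] - old[i][j - 1]))
    return u


u_final = ftbs_convection()
values = [v for row in u_final for v in row]
# With cx + cy <= 1 each update is a convex combination, so values stay within [1, 2].
print(min(values), max(values))
```

Because the scheme only looks upstream, nodes above and to the left of the initial square never change, which is a quick sanity check on any hardware run.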
(2) Combine the 6400 DPEs (rows 0–79 × columns 0–79) through the transmission channels, and set the initialization data in the general registers as follows:
The initial flow field velocity is stored in gr1. In the DPEs of rows 19–39 and columns 19–39, gr1 is set to 2 (as an IEEE-754 single-precision value, 32'b0_10000000_00000000000000000000000; the same notation is used below); in all other DPEs gr1 is set to 1 (32'b0_01111111_00000000000000000000000). The flow field velocity stored in gr2 of all DPEs is set to 1 (32'b0_01111111_00000000000000000000000) (formula images in the original publication). gr3 of every DPE holds the loop count 32'd100 (the count does not require floating-point arithmetic). In addition, registers gr56 and gr57 of every DPE in row 0, row 79, column 0 and column 79 are fixed at 1 (32'b0_01111111_00000000000000000000000). Because the difference scheme of the two-dimensional linear convection equation uses only data from the preceding (upstream) neighbors, only the two communication registers cr2 and cr3 are used; cr0 and cr1 are not.
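The binary constants above follow the IEEE-754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits), written as Verilog-style literals with underscores between the fields. The short check below regenerates them with Python's standard `struct` module; the helper name `verilog_f32` is illustrative only.

```python
import struct


def verilog_f32(x: float) -> str:
    """Return x as a 32-bit IEEE-754 single-precision Verilog literal (sign_exponent_mantissa)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the float's bits
    bits = f"{word:032b}"
    return f"32'b{bits[0]}_{bits[1:9]}_{bits[9:]}"


for value in (2.0, 1.0):
    print(value, verilog_f32(value))
# 2.0 32'b0_10000000_00000000000000000000000
# 1.0 32'b0_01111111_00000000000000000000000
```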
(3) Set the instructions in the instruction memory; the corresponding pseudocode is shown in Table 3.
Table 3: Pseudocode for the instruction setup in the instruction memory
[Table image not reproduced]
(4) Run the program and output the results, namely the difference operation results of all nodes in the flow field at different moments.
In summary, the computational-fluid-dynamics-oriented accelerated computing device and method of the present invention offer a simple hardware structure, high computational energy efficiency, and flexible programmability, and can accelerate computational fluid dynamics. Taking the two-dimensional linear convection equation as an example, 6400 DPEs are combined; the program is 22 instructions long (the loop body is 19 instructions), the run time is 1908 clock cycles, and each DPE uses 11 general registers and 2 communication registers. To suit the characteristics of a given CFD algorithm, the size of the combined DPE array can be changed, and different difference schemes can be realized by changing the instructions in the instruction memory, which makes the method flexible and programmable.
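The figures quoted above can be related with simple arithmetic. The sketch assumes, which the text does not state, that the four-stage execution of claim 4 is overlapped so that roughly one instruction completes per clock cycle; without overlap, 1903 instructions at four cycles each would need at least 7612 cycles, far above the reported 1908. Under that assumption, the loop body accounts for almost all of the run time, and the few remaining cycles could plausibly be pipeline fill or branch overhead, although the source does not say.

```python
# Figures quoted in the text for the 2-D linear convection example.
DPES = 6400            # 80 x 80 differential operation units
ITERATIONS = 100       # time steps (loop count held in gr3)
PROGRAM_LEN = 22       # total instructions in the program
LOOP_LEN = 19          # instructions in the loop body
TOTAL_CYCLES = 1908    # reported run time in clock cycles

loop_instr = LOOP_LEN * ITERATIONS                   # instructions issued inside the loop
other_instr = PROGRAM_LEN - LOOP_LEN                 # instructions outside the loop
cycles_per_iter = TOTAL_CYCLES / ITERATIONS          # average cycles per time step
overhead = TOTAL_CYCLES - loop_instr - other_instr   # cycles left at 1 instruction/cycle (assumed)
updates_per_cycle = DPES * ITERATIONS / TOTAL_CYCLES # node updates per clock across the array

print(loop_instr, other_instr, cycles_per_iter, overhead, round(updates_per_cycle, 1))
```

On these numbers the array performs about 335 node updates per clock cycle, which is the main source of the speed-up the text claims.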
It should be understood that, although the steps in the flowchart of FIG. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, any combination of these technical features that is free of contradiction should be considered within the scope of this disclosure.
The above embodiments describe only several implementations of the present application; although they are described specifically and in detail, they shall not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (9)

1. An accelerated computing device oriented to computational fluid dynamics, comprising a plurality of dedicated differential operation units, which execute a program designed with an instruction set according to the fluid mechanics problem to be solved and perform the differential operations for the nodes of a flow field;
each differential operation unit comprises a plurality of transmission channels, which connect adjacent differential operation units so that the differential operations for all nodes of the flow field are completed in parallel;
each differential operation unit further comprises a plurality of general registers for storing register data;
each differential operation unit has four transmission channels, which are obtained by configuring the communication registers in the differential operation unit;
the four communication registers are located at the four edges of the differential operation unit and are all connected to the same internal general register; adjacent differential operation units access the transmission channel, and thus exchange data, through the communication registers nearest to each other.
2. The accelerated computing device according to claim 1, wherein the differential operation unit further comprises:
an instruction controller for controlling the address of the instruction to be executed;
an instruction memory for storing the instructions to be executed;
and an arithmetic logic unit for performing operations on the operands.
3. The accelerated computing device of claim 2, wherein the instruction controller comprises an increment-by-one adder and a two-to-one multiplexer.
4. The accelerated computing device of claim 2, wherein the differential operation unit executes each instruction in four clock cycles: fetch, decode, execute, and write-back;
in the fetch stage, an instruction is read from the instruction memory at the address given by the instruction controller and placed into the instruction register;
in the decode stage, the instruction in the instruction register is decoded, the corresponding operands are extracted according to the opcode, and the two extracted operands are placed into two temporary registers;
in the execute stage, the arithmetic logic unit operates on the two temporary registers according to the opcode, stores the result in a third temporary register, and sets the flag register according to the instruction function and the result; the instruction controller uses the value of the flag register to decide whether the next instruction is executed in sequence or a jump is taken;
and in the write-back stage, whether a general register needs to be modified is determined from the opcode and the result, and if so, the value of the third temporary register is written to the corresponding general register.
5. The accelerated computing device of claim 2, wherein the differential operation unit contains 60 general registers, each 64 bits wide.
6. The accelerated computing device according to claim 1, wherein the instruction set used by the differential operation unit is designed according to the characteristics of computational fluid dynamics algorithms;
the instruction set uses a 16-bit RISC encoding format, in which the opcode is 4 bits long and each of the two operands is 6 bits long.
7. The accelerated computing device of claim 6, wherein, by operand type, the instructions are classified into three types: register type, immediate type, and hybrid type.
8. The accelerated computing device of claim 6, wherein, by function, the instructions are classified into control instructions, operation instructions, and data-movement instructions;
wherein: the control instructions include a no-operation instruction, a stop instruction, and a branch jump instruction; the operation instructions include a fixed-point immediate addition instruction, a fixed-point immediate subtraction instruction, a fixed-point comparison instruction, an immediate skip instruction, a floating-point register addition instruction, a floating-point register subtraction instruction, and a floating-point register multiplication instruction; and the data-movement instructions include a move instruction.
9. An accelerated computing method oriented to computational fluid dynamics, which performs accelerated computation using the accelerated computing device according to any one of claims 1 to 8, the method comprising the following steps:
discretizing the fluid equation to be solved in time and space with a finite difference method to obtain an iterative calculation formula;
determining the number of differential operation units and transmission channels required according to the iterative calculation formula, and determining the number of program loop iterations from the iteration in time;
connecting all the differential operation units through the data transmission channels, and setting initialization data and storing them in the general registers of the differential operation units;
setting the instructions and their execution order in the instruction memory according to the iterative calculation formula and the number of loop iterations to obtain a calculation program, wherein the instructions in the instruction memory are used to implement different difference schemes;
and running the calculation program and outputting the difference operation results of all nodes in the flow field at different moments.
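Claims 3 to 6 describe the instruction word and the instruction controller only by field widths and components. The sketch below models them in Python under stated assumptions: the field order (opcode in bits 15-12, first operand in bits 11-6, second operand in bits 5-0) and the example opcode value are hypothetical, since the claims give only the widths. The controller is modelled as the increment-by-one adder and two-to-one multiplexer of claim 3, with the multiplexer select driven by the flag-register decision described in claim 4.

```python
from dataclasses import dataclass

# ASSUMED field layout; the claims give only the widths (4 + 6 + 6 = 16 bits).
OPCODE_BITS, OPERAND_BITS = 4, 6


@dataclass(frozen=True)
class Instruction:
    opcode: int  # 0..15
    a: int       # 0..63, register number or immediate
    b: int       # 0..63, register number or immediate

    def encode(self) -> int:
        """Pack the fields into one 16-bit word: [15:12] opcode, [11:6] a, [5:0] b."""
        for name, value, width in (("opcode", self.opcode, OPCODE_BITS),
                                   ("a", self.a, OPERAND_BITS),
                                   ("b", self.b, OPERAND_BITS)):
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
        return (self.opcode << 12) | (self.a << 6) | self.b

    @staticmethod
    def decode(word: int) -> "Instruction":
        """Split a 16-bit word back into opcode and two 6-bit operands."""
        if not 0 <= word <= 0xFFFF:
            raise ValueError("instruction words are 16 bits")
        return Instruction(word >> 12, (word >> 6) & 0x3F, word & 0x3F)


def next_pc(pc: int, branch_taken: bool, target: int) -> int:
    """Instruction controller: an increment-by-one adder feeding a 2-to-1 multiplexer."""
    incremented = pc + 1                             # the +1 adder
    return target if branch_taken else incremented   # the mux, selected via the flag register


if __name__ == "__main__":
    word = Instruction(0b1010, 1, 2).encode()        # hypothetical opcode 0b1010
    print(hex(word), Instruction.decode(word))
    print(next_pc(5, False, 0), next_pc(5, True, 2))
```

A 6-bit operand field can address 64 locations, which happens to equal the 60 general registers of claim 5 plus four communication registers; the claims do not say how registers are actually numbered, so this is only an observation.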
CN202211171216.5A 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics Active CN115291949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211171216.5A CN115291949B (en) 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211171216.5A CN115291949B (en) 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics

Publications (2)

Publication Number Publication Date
CN115291949A CN115291949A (en) 2022-11-04
CN115291949B true CN115291949B (en) 2022-12-20

Family

ID=83833618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211171216.5A Active CN115291949B (en) 2022-09-26 2022-09-26 Accelerated computing device and accelerated computing method for computational fluid dynamics

Country Status (1)

Country Link
CN (1) CN115291949B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1787376A (en) * 2004-12-07 2006-06-14 奥特拉股份有限公司 Techniques for implementing hardwired decoders in differential input circuits
CN102842222A (en) * 2012-08-30 2012-12-26 西北工业大学 FPGA (Field Programmable Gate Array) online prediction control method based on Phillips macroscopic traffic flow model
CN102930730A (en) * 2012-11-19 2013-02-13 西安费斯达自动化工程有限公司 Online traffic bottleneck prediction control method based on FPGA and improved Phillips model
CN104639310A (en) * 2014-12-31 2015-05-20 东华大学 Method for detecting capacity of SHA-1 algorithm for resisting attack of differential fault
CN105264779A (en) * 2013-01-22 2016-01-20 阿尔特拉公司 Data compression and decompression using simd instructions
JP2018136255A (en) * 2017-02-23 2018-08-30 セイコーエプソン株式会社 Physical quantity sensor, electronic equipment and mobile object
CN111797045A (en) * 2016-12-21 2020-10-20 艾尔默斯半导体股份公司 Method for initializing a differential two-wire data bus and method for transmitting data
CN112098273A (en) * 2020-08-14 2020-12-18 山东大学 Near-field dynamics-based permeation grouting process simulation method and system
CN112818494A (en) * 2021-02-10 2021-05-18 西北工业大学 Functional gradient flow pipe modal and response analysis method based on differential quadrature method
CN112842312A (en) * 2021-02-01 2021-05-28 上海交通大学 Heart rate sensor and self-adaptive heartbeat lock ring system and method thereof
WO2021245101A1 (en) * 2020-06-05 2021-12-09 Politecnico Di Milano A computing platform for preventing side channel attacks
CN113935258A (en) * 2021-10-15 2022-01-14 北京百度网讯科技有限公司 Computational fluid dynamics acceleration method, device, equipment and storage medium
WO2022046761A1 (en) * 2020-08-26 2022-03-03 Tpe Midstream Llc Configurable fluid compression apparatus, control, and associated methods
CN115049529A (en) * 2021-03-08 2022-09-13 上海联影医疗科技股份有限公司 Image gradient determination method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RO118150B1 (en) * 1999-11-24 2003-02-28 Ack S.R.L. System for metering the fluid flow
US6925467B2 (en) * 2002-05-13 2005-08-02 Innopath Software, Inc. Byte-level file differencing and updating algorithms
JP5704246B2 (en) * 2011-09-21 2015-04-22 富士通株式会社 Object motion analysis apparatus, object motion analysis method, and object motion analysis program
DE102019112792A1 (en) * 2019-05-15 2020-11-19 Leistritz Pumpen Gmbh Method for determining a flow volume of a fluid conveyed by a pump
CN112099762B (en) * 2020-09-10 2024-03-12 上海交通大学 Synergistic processing system and method for rapidly realizing SM2 cryptographic algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
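A self-designed heterogeneous fusion accelerator for exascale high-performance computing (一种自主设计的面向E级高性能计算的异构融合加速器); 刘胜 et al.; 《计算机研究与发展》 (Journal of Computer Research and Development); 2021-06-15; full text *
Performance optimization of the FT program based on the SW26010 processor (基于SW26010处理器的FT程序的性能优化); 陶小涵 et al.; 《计算机科学》 (Computer Science); 2019-04-15; full text *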

Also Published As

Publication number Publication date
CN115291949A (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant