CN106527999A

CN106527999A - Hardware acceleration device and acceleration method for solving differential equations

Info

Publication number: CN106527999A
Application number: CN201611088172.4A
Authority: CN
Inventors: 姚小城; 殷进勇; 刘煜; 王洋; 吴建鲁; 李毅; 陶峥嵘; 董海祥; 王永; 李小亮
Original assignee: 716th Research Institute of CSIC
Current assignee: 716th Research Institute of CSIC
Priority date: 2016-12-01
Filing date: 2016-12-01
Publication date: 2017-03-22

Abstract

The invention discloses a hardware acceleration device for solving differential equations. The device consists of an input/output cache module, a serial-parallel conversion module and a Runge-Kutta hardware acceleration module. The input/output cache module caches the data exchanged directly with the main controller, and all data between the acceleration unit and the main controller passes through the input/output cache. The serial-parallel conversion module outputs the data read from the input cache in parallel, at the width required, and converts the parallel output data of the hardware acceleration module into serial data. The Runge-Kutta hardware acceleration module implements the hardware acceleration of the differential equation solution. By putting forward a general computing architecture, using the partial reconfigurability of an FPGA (Field Programmable Gate Array) and flexibly configuring the core computing unit, the device makes hardware acceleration of differential equation solving general-purpose, and so meets the application requirement of quickly solving various differential equations.

Description

A hardware acceleration device and acceleration method for solving differential equations

Technical field

The present invention relates to an acceleration device for solving differential equations, and in particular to a hardware acceleration device and acceleration method for solving differential equations with the Runge-Kutta method.

Background art

Ordinary differential equations (ODEs) are among the most basic mathematical theories and methods for studying the motion, evolution and laws of change of things, objects and phenomena in the natural and social sciences. Many principles in various fields can be described by suitable ODEs, for example Newton's laws of motion, the law of universal gravitation, the law of conservation of energy, population growth, the spread of disease, and trends in market clearing prices. In most cases it is extremely difficult to find an analytical solution of a complicated ODE, so approximate methods are commonly used to obtain results, and numerical solution is the usual approach in engineering. With the rapid development of computer science and technology, classical numerical methods for ordinary differential equations have gone through a process of re-evaluation, screening, adaptation and innovation, and many new concepts, new problems and new methods that can exploit the potential of computers have emerged. Improving computational efficiency through fully hardware-implemented acceleration has become a research focus.

However, the prior art contains no relevant description.

Summary of the invention

The object of the invention is to provide an acceleration device and acceleration method for solving differential equations that speed up the solution of systems of differential equations.

The technical solution that achieves this object is an acceleration device for solving differential equations, consisting of an input/output cache module, a serial-parallel conversion module and a Runge-Kutta-based hardware acceleration module. The input/output cache module caches the data exchanged directly with the main controller, and all data between the acceleration unit and the main controller passes through the input/output cache. The serial-to-parallel/parallel-to-serial conversion module outputs the data read from the input cache in parallel, at the width required, and converts the parallel output data of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module implements the hardware acceleration of the differential equation solution.

By proposing a general computing architecture, using the partial reconfigurability of the FPGA and flexibly configuring the core computing unit, this hardware acceleration device makes the hardware acceleration of systems of ordinary differential equations general-purpose.

The serial-to-parallel conversion module controls the read operations on the input cache and, according to the read indication signal, passes the data it has read in parallel to the Runge-Kutta-based hardware acceleration unit. The module's input data width is 32 bits, and its output data width supports a maximum of 4096 bits and can be configured as required. The parallel-to-serial conversion module outputs the calculation results at a width of 32 bits and controls the write operations on the output cache.
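The width conversion described above can be illustrated in software. The following is a minimal sketch only: the function names, the placement of the first word in the low-order bits, and the rejection of incomplete groups are assumptions made for illustration and are not stated in the source.

```python
from typing import Iterable, List

WORD_BITS = 32        # input/output word width given in the text
MAX_WIDE_BITS = 4096  # maximum parallel width given in the text


def pack_words(words: Iterable[int], wide_bits: int) -> List[int]:
    """Serial-to-parallel: group 32-bit words into values wide_bits wide.

    Assumption: the first word read occupies the lowest 32 bits.
    """
    if wide_bits % WORD_BITS != 0 or not 0 < wide_bits <= MAX_WIDE_BITS:
        raise ValueError("width must be a multiple of 32 and at most 4096")
    per_group = wide_bits // WORD_BITS
    out, acc, count = [], 0, 0
    for w in words:
        acc |= (w & 0xFFFFFFFF) << (WORD_BITS * count)
        count += 1
        if count == per_group:  # one full parallel value is ready
            out.append(acc)
            acc, count = 0, 0
    if count:
        raise ValueError("input does not fill the last parallel value")
    return out


def unpack_words(wide_values: Iterable[int], wide_bits: int) -> List[int]:
    """Parallel-to-serial: split each wide value back into 32-bit words."""
    per_group = wide_bits // WORD_BITS
    return [(v >> (WORD_BITS * i)) & 0xFFFFFFFF
            for v in wide_values for i in range(per_group)]
```

With a configured width of 64 bits, four input words become two parallel values, and `unpack_words` restores the original word sequence.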

The calculation flow of the hardware acceleration device for solving differential equations is as follows:

a) First determine whether the equation is a higher-order differential equation. If it is, convert it into a system of first-order differential equations; if it is already a first-order system, go directly to step b).

b) Solve the system of first-order differential equations with the classical fourth-order Runge-Kutta method.

c) For the hardware implementation of the fourth-order Runge-Kutta method of step b), deeply optimize the four sub-iterations within each iteration. Every main iteration uses the same hardware structure, the sub-iterations within a main iteration use the same hardware structure, and each sub-iteration is implemented with a hardware structure that is as parallel as possible.

d) To achieve the purpose of step b), the more complex operations, such as square roots, trigonometric functions, division and traversal lookup, are carried out with parallel arithmetic units and optimized in various ways. For example, division increases the step of each iteration, trigonometric functions merge two steps into one, and constant tables that would need traversal are encoded so that a table lookup replaces the traverse-and-compare operation. The maximum-delay path in each sub-iteration is identified and its calculation delay is reduced by increasing parallelism; the other, non-critical paths require a trade-off between resources and speed.
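The first two steps above can be sketched in software as follows. This is an illustrative reference model, not the hardware implementation: the function names and the example equation y'' = -y are chosen here for demonstration only.

```python
from typing import Callable, List, Sequence

Vector = List[float]
RHS = Callable[[float, Sequence[float]], Vector]


def rk4_step(f: RHS, t: float, y: Sequence[float], h: float) -> Vector:
    """One main iteration of classical RK4, made of four sub-iterations."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def solve(f: RHS, t0: float, y0: Sequence[float], h: float, steps: int) -> Vector:
    """Run `steps` main iterations from the initial values y0."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def oscillator(t: float, y: Sequence[float]) -> Vector:
    """y'' = -y rewritten as a first-order system: y[0] = y, y[1] = y'."""
    return [y[1], -y[0]]


# Initial values, step size and step count are the only inputs needed.
result = solve(oscillator, 0.0, [1.0, 0.0], 0.01, 100)
```

Here `result` approximates `[cos(1), -sin(1)]`, the exact solution at t = 1. Note that the caller supplies only the initial values, the step size and the number of steps, which mirrors the small set of parameters the main processor is described as sending.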

Because communication between the acceleration module and the main processor is time-consuming, the whole algorithm is run entirely in hardware to reduce interaction. The main processor only needs to send the initial parameter values, the step size and the step-count control signals to the acceleration module, and the intermediate calculation data is transferred by DMA. This keeps the share of time spent on communication at a low level, and that share decreases as the number of variables in the algorithm, the computational complexity and the number of iterations increase.
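One simple way to see why the communication share falls is the following rough model. It is an assumption introduced here for illustration and does not appear in the source: $T_{\mathrm{comm}}$ denotes the one-off time to send the initial values and control signals and to read back the results, $N$ the number of main iterations, and $T_{\mathrm{iter}}$ the hardware time per main iteration.

```latex
r \;=\; \frac{T_{\mathrm{comm}}}{T_{\mathrm{comm}} + N \, T_{\mathrm{iter}}}
```

Under this model, $r$ shrinks as $N$ grows or as a more complex right-hand side increases $T_{\mathrm{iter}}$, provided $T_{\mathrm{comm}}$ grows more slowly than the computation.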

Compared with the prior art, the present invention has the following notable advantages: 1) it uses hardware to accelerate the solution of systems of ordinary differential equations and achieves a high speed-up ratio; 2) it proposes a general interface structure, so that it can meet the solution requirements of systems of differential equations with different numbers of parameters and variables; 3) the core computing acceleration unit uses the reconfiguration capability of the FPGA, so that the core computing unit is reconfigured for each different system of equations, which improves resource utilization and the generality of the invention.

The present invention is described in further detail below with reference to the accompanying drawings.

Description of the drawings

Fig. 1 is a schematic diagram of the composition of the acceleration device for solving differential equations.

Fig. 2 is a schematic diagram of the serial-to-parallel and parallel-to-serial conversion modules, in which (a) shows the serial-to-parallel conversion module and (b) shows the parallel-to-serial conversion module.

Fig. 3 is a calculation flow chart of the Runge-Kutta-based hardware acceleration unit.

Specific embodiment

With reference to Fig. 1, a hardware acceleration device for solving differential equations consists of an input/output cache module, a serial-parallel conversion module and a Runge-Kutta-based hardware acceleration module. The input/output cache module caches the data exchanged directly with the main controller, and all data between the acceleration unit and the main controller passes through the input/output cache. The serial-to-parallel/parallel-to-serial conversion module outputs the data read from the input cache in parallel, at the width required, and converts the parallel output data of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module implements the hardware acceleration of the differential equation solution. By proposing a general computing architecture, using the partial reconfigurability of the FPGA and flexibly configuring the core computing unit, the device makes the hardware acceleration of systems of ordinary differential equations general-purpose.

With reference to Fig. 2, the serial-to-parallel conversion module controls the read operations on the input cache and, according to the read indication signal, passes the data it has read in parallel to the Runge-Kutta-based hardware acceleration unit. The module's input data width is 32 bits, and its output data width supports a maximum of 4096 bits and can be configured as required. The parallel-to-serial conversion module outputs the calculation results at a width of 32 bits and controls the write operations on the output cache.

Specifically, the hardware acceleration device for solving differential equations of the present invention comprises an input cache module, an output cache module, a serial-to-parallel conversion module, a parallel-to-serial conversion module and a Runge-Kutta-based hardware acceleration unit. The width of the data cached in the input cache module and the output cache module matches the bus width. The input cache module is connected to the serial-to-parallel conversion module, and the output cache module is connected to the parallel-to-serial conversion module. The output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all the data required for the differential equation calculation. The output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module, to which it outputs the calculation results.

The input cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output cache module exchanges information with that unit through the parallel-to-serial conversion module. The input cache module and the output cache module cache the data exchanged with the main control unit, and all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output cache modules.

The serial-to-parallel conversion module outputs the data read from the input cache in parallel to the hardware acceleration unit, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration unit into serial data.

The Runge-Kutta-based hardware acceleration unit implements the hardware acceleration of the differential equation solution.

The input cache module comprises two parts, an input FIFO and a control register group. The input FIFO is used to transfer data and the control register group is used to transfer control signals. The control registers are assigned by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.

The output cache module comprises two parts, an output FIFO and a status register group. The output FIFO is used to transfer the calculation result data and the status register group is used to transfer status signals. The status registers are assigned by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.
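The register ownership described above (control registers written by the host and cleared by the acceleration unit, status registers the reverse) amounts to a simple handshake. The following software model is a sketch only: the class and function names and the `START` and `DONE` encodings are hypothetical and are not given in the source.

```python
from collections import deque

START = 1  # hypothetical control code: host requests a computation
DONE = 1   # hypothetical status code: results are ready


class Register:
    """A register assigned by one side and reset by the other."""

    def __init__(self):
        self.value = 0

    def assign(self, value):
        self.value = value

    def reset(self):
        self.value = 0


class Cache:
    """A FIFO plus one register (control for input, status for output)."""

    def __init__(self):
        self.fifo = deque()
        self.reg = Register()


def host_submit(inp, data):
    inp.fifo.extend(data)
    inp.reg.assign(START)            # host assigns the control register


def accelerator_run(inp, out, compute):
    if inp.reg.value != START:
        return False
    data = [inp.fifo.popleft() for _ in range(len(inp.fifo))]
    inp.reg.reset()                  # accelerator resets the control register
    out.fifo.extend(compute(data))
    out.reg.assign(DONE)             # accelerator assigns the status register
    return True


def host_collect(out):
    if out.reg.value != DONE:
        return None
    results = list(out.fifo)
    out.fifo.clear()
    out.reg.reset()                  # host resets the status register
    return results
```

Because each register has exactly one writer and one clearer, neither side needs to poll the other's FIFO to know whether a transfer is pending.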

The serial-to-parallel conversion module controls the read operations on the input FIFO of the input cache module according to the control signals and outputs the data it reads in parallel. At the same time it converts the control signals to produce the control signals for the Runge-Kutta-based hardware acceleration unit, and resets the control signals.

The parallel-to-serial conversion module controls the write operations on the output FIFO of the output cache according to the control signals and assigns the status registers.

The Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method, and all calculations use single-precision or double-precision floating point. To guarantee precision and a uniform bit width, the module provides hardware acceleration units for common floating-point operations. The supported floating-point units include addition, subtraction, multiplication and division units, trigonometric function units and power function units. The core differential equation part uses the reconfiguration scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.

The input data width of the serial-to-parallel conversion module is 32 bits, and its output data width supports a maximum of 4096 bits. The parallel-to-serial conversion module outputs the calculation results at a width of 32 bits and controls the write operations on the output cache module.

The intermediate calculation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.

An acceleration method based on the above hardware acceleration device for solving differential equations comprises the following steps:

Step 1: determine whether the equation is a higher-order differential equation. If it is, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly.

Step 2: solve the system of first-order differential equations with the classical fourth-order Runge-Kutta method. Every main iteration uses the same hardware structure, the sub-iterations within a main iteration use the same hardware structure, and each sub-iteration is implemented with a hardware structure that is as parallel as possible. During the sub-iterations, the maximum-delay path in each sub-iteration is identified and its calculation delay is reduced by increasing parallelism.
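For reference, Steps 1 and 2 can be written out using the standard textbook formulation. The symbols below ($m$, $g$, $\mathbf{u}$, $\mathbf{f}$, $h$, $\mathbf{k}_i$) are notation introduced here and do not appear in the source. An equation of order $m$ is reduced by taking the derivatives as new unknowns:

```latex
y^{(m)} = g\bigl(t, y, y', \dots, y^{(m-1)}\bigr), \qquad
u_1 = y,\; u_2 = y',\; \dots,\; u_m = y^{(m-1)}
\;\Longrightarrow\;
\begin{cases}
u_i' = u_{i+1}, & i = 1, \dots, m-1,\\
u_m' = g(t, u_1, \dots, u_m).
\end{cases}
```

Writing the resulting system as $\mathbf{u}' = \mathbf{f}(t, \mathbf{u})$ with step size $h$, one main iteration of the classical fourth-order method consists of four sub-iterations:

```latex
\begin{aligned}
\mathbf{k}_1 &= \mathbf{f}(t_n, \mathbf{u}_n),\\
\mathbf{k}_2 &= \mathbf{f}\bigl(t_n + \tfrac{h}{2},\; \mathbf{u}_n + \tfrac{h}{2}\mathbf{k}_1\bigr),\\
\mathbf{k}_3 &= \mathbf{f}\bigl(t_n + \tfrac{h}{2},\; \mathbf{u}_n + \tfrac{h}{2}\mathbf{k}_2\bigr),\\
\mathbf{k}_4 &= \mathbf{f}\bigl(t_n + h,\; \mathbf{u}_n + h\,\mathbf{k}_3\bigr),\\
\mathbf{u}_{n+1} &= \mathbf{u}_n + \tfrac{h}{6}\bigl(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\bigr).
\end{aligned}
```

Each $\mathbf{k}_{i+1}$ depends on $\mathbf{k}_i$, so the four sub-iterations must run in sequence, whereas the $m$ components of $\mathbf{f}$ within one sub-iteration do not depend on one another and can be evaluated concurrently. This is where parallel hardware inside a sub-iteration can shorten the delay.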

The present invention uses hardware to accelerate the solution of systems of ordinary differential equations and achieves a high speed-up ratio.

The present invention is further described below with reference to a specific embodiment.

Embodiment 1

Fig. 3 shows the calculation flow of the hardware acceleration device for solving differential equations.

Claims

1. A hardware acceleration device for solving differential equations, characterized in that it comprises an input cache module, an output cache module, a serial-to-parallel conversion module, a parallel-to-serial conversion module and a Runge-Kutta-based hardware acceleration unit; the width of the data cached in the input cache module and the output cache module matches the bus width; the input cache module is connected to the serial-to-parallel conversion module, and the output cache module is connected to the parallel-to-serial conversion module; the output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all the data required for the differential equation calculation; the output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module, to which it outputs the calculation results;

wherein the input cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the parallel-to-serial conversion module; the input cache module and the output cache module cache the data exchanged with the main control unit, and all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output cache modules;

the serial-to-parallel conversion module outputs the data read from the input cache in parallel to the hardware acceleration unit, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration unit into serial data.

2. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input cache module comprises two parts, an input FIFO and a control register group; the input FIFO is used to transfer data and the control register group is used to transfer control signals; the control registers are assigned by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.

3. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the output cache module comprises two parts, an output FIFO and a status register group; the output FIFO is used to transfer the calculation result data and the status register group is used to transfer status signals; the status registers are assigned by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.

4. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the serial-to-parallel conversion module controls the read operations on the input FIFO of the input cache module according to the control signals and outputs the data it reads in parallel, and at the same time converts the control signals to produce the control signals for the Runge-Kutta-based hardware acceleration unit and resets the control signals.

5. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the parallel-to-serial conversion module controls the write operations on the output FIFO of the output cache according to the control signals and assigns the status registers.

6. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method, and all calculations use single-precision or double-precision floating point; to guarantee precision and a uniform bit width, the module provides hardware acceleration units for common floating-point operations; the supported floating-point units include addition, subtraction, multiplication and division units, trigonometric function units and power function units; the core differential equation part uses the reconfiguration scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.

7. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input data width of the serial-to-parallel conversion module is 32 bits and its output data width supports a maximum of 4096 bits; the parallel-to-serial conversion module outputs the calculation results at a width of 32 bits and controls the write operations on the output cache module.

8. The hardware acceleration device for solving differential equations according to claim 6, characterized in that the intermediate calculation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.

9. An acceleration method based on the hardware acceleration device for solving differential equations according to claim 1, characterized in that it comprises the following steps:

Step 1: determine whether the equation is a higher-order differential equation; if it is, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly;

Step 2: solve the system of first-order differential equations with the classical fourth-order Runge-Kutta method, wherein every main iteration uses the same hardware structure, the sub-iterations within a main iteration use the same hardware structure, and each sub-iteration is implemented with a hardware structure that is as parallel as possible.

10. The acceleration method according to claim 9, characterized in that, during the sub-iterations of step 2, the maximum-delay path in each sub-iteration is identified and its calculation delay is reduced by increasing parallelism.