CN106527999A - Hardware acceleration device and acceleration method for solving differential equations - Google Patents
- Publication number
- CN106527999A
- Authority
- CN
- China
- Prior art keywords
- module
- output
- hardware
- differential equation
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
- G06F13/1673—Details of memory controller using buffers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/13—Differential equations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
Abstract
The invention discloses a hardware acceleration device for solving differential equations. The device consists of an input/output cache module, a serial/parallel conversion module and a Runge-Kutta hardware acceleration module. The input/output cache module buffers the data exchanged directly with the main controller; all data between the acceleration unit and the main controller passes through the input/output cache. The serial/parallel conversion module fans the data read from the input cache out into as many parallel outputs as required, and converts the parallel output data of the hardware acceleration module back into serial data. The Runge-Kutta hardware acceleration module performs the hardware-accelerated solution of the differential equations. By putting forward a general computing architecture, exploiting the partial reconfigurability of an FPGA (Field Programmable Gate Array) and flexibly configuring the core computing unit, the device makes hardware acceleration of differential-equation solving general-purpose, and thus meets the application requirement of quickly solving various differential equations.
Description
Technical field
The present invention relates to an acceleration device for solving differential equations, and in particular to a hardware acceleration device and acceleration method for solving differential equations by the Runge-Kutta method.
Background art
Ordinary differential equations (ODEs) are among the most basic mathematical theories and methods for studying the laws of motion, evolution and change of things, objects and phenomena in the natural and social sciences. Many principles in many fields can be expressed as suitable ODEs, for example Newton's laws of motion, the law of universal gravitation, the law of conservation of energy, population growth, the spread of disease, and trends in market clearing prices. In most cases an analytical solution of a complicated ODE is extremely hard to find, so approximate methods are commonly used, and numerical solution is the usual choice in engineering. With the rapid development of computer science and technology, classical numerical methods for ODEs have gone through a process of re-evaluation, screening, transformation and innovation, and many new concepts, new problems and new methods that exploit the potential of computers have emerged. Improving computational efficiency through full hardware acceleration has become a research hotspot.
However, the prior art contains no description of such an approach.
Summary of the invention
The object of the invention is to provide an acceleration device and an acceleration method for solving differential equations, which speed up the solution of systems of differential equations.
The technical solution that achieves this object is an acceleration device for solving differential equations, consisting of an input/output cache module, a serial/parallel conversion module and a Runge-Kutta-based hardware acceleration module. The input/output cache module buffers the data exchanged directly with the main controller; all data between the acceleration unit and the main controller passes through the input/output cache. The serial-to-parallel and parallel-to-serial conversion module is mainly responsible for fanning the data read from the input cache out into as many parallel outputs as required, and for converting the parallel output data of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module is mainly responsible for the hardware-accelerated solution of the differential equations.
By proposing a general computing architecture, exploiting the partial reconfigurability of the FPGA and flexibly configuring the core computing unit, this hardware acceleration device makes the hardware acceleration of solving systems of ordinary differential equations general-purpose.
The serial-to-parallel conversion module mainly controls the read operation of the input cache and, according to the read indication signal, passes the data read to the Runge-Kutta-based hardware acceleration unit in parallel. The module's input data width is 32 bits; its output data width is configurable on demand, up to a maximum of 4096 bits. The parallel-to-serial conversion module mainly outputs the calculation results at a 32-bit width and controls the write operation of the output cache.
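The width conversion described above (32-bit words in, a configurable bus of up to 4096 bits out, and back again) can be modelled in software as follows. This is only an illustrative sketch: the function names are hypothetical, and the word order (first word in the lowest bits) is an assumption, since the text does not specify it.

```python
from typing import List

WORD_BITS = 32   # input/output word width stated in the text
MAX_BITS = 4096  # maximum parallel width stated in the text


def serial_to_parallel(words: List[int], out_bits: int) -> int:
    """Pack 32-bit words into one wide word (assumed: first word in the low bits)."""
    if out_bits % WORD_BITS != 0 or not 0 < out_bits <= MAX_BITS:
        raise ValueError("out_bits must be a multiple of 32 and at most 4096")
    if len(words) != out_bits // WORD_BITS:
        raise ValueError("word count does not match the configured width")
    wide = 0
    for i, word in enumerate(words):
        wide |= (word & 0xFFFFFFFF) << (i * WORD_BITS)
    return wide


def parallel_to_serial(wide: int, in_bits: int) -> List[int]:
    """Split a wide word back into 32-bit words, lowest word first."""
    return [(wide >> (i * WORD_BITS)) & 0xFFFFFFFF
            for i in range(in_bits // WORD_BITS)]
```

For example, four words packed at a configured width of 128 bits are recovered unchanged by `parallel_to_serial(wide, 128)`.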
The calculation flow of the hardware acceleration device for solving differential equations is as follows:
a) First determine whether the equation is a higher-order differential equation. If so, convert it into a system of first-order differential equations; if it is already a first-order system, go directly to step b).
b) Solve the first-order system using the classical fourth-order Runge-Kutta method.
c) For the hardware implementation of the fourth-order Runge-Kutta method of step b), deeply optimize the four sub-iterations within each iteration. Every main iteration uses the same hardware structure, every sub-iteration within a main iteration uses the same hardware structure, and each sub-iteration is implemented with the most parallel hardware structure possible.
d) To achieve the purpose of step b), relatively complex operations such as square roots, trigonometric functions, division and table traversal are carried out by parallel arithmetic units and optimized in several ways: division increases the step processed per iteration; trigonometric functions merge two steps into one; and for constant tables that would otherwise be traversed, an encoding scheme replaces the traverse-and-compare operation with a table lookup. The maximum-delay path in each sub-iteration is identified and its calculation latency is reduced by increasing parallelism; the other, non-critical paths require a trade-off between resources and speed.
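Steps a) and b) can be illustrated in software. The sketch below is a plain Python model of the classical fourth-order Runge-Kutta method, not the patent's hardware design; the function names and the example equation y'' = -y (rewritten as a first-order system) are chosen here purely for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]
Rhs = Callable[[float, Vector], Vector]


def rk4_step(f: Rhs, t: float, y: Vector, h: float) -> Vector:
    """One main iteration: four sub-iterations k1..k4, then the weighted update."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def solve(f: Rhs, t0: float, y0: Vector, h: float, steps: int) -> Vector:
    """Repeat the same main iteration `steps` times with a fixed step size."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def oscillator(t: float, y: Vector) -> Vector:
    """Step a): y'' = -y becomes the first-order system y0' = y1, y1' = -y0."""
    return [y[1], -y[0]]


# Step b): integrate from t = 0 to t = 1 with y(0) = 1, y'(0) = 0.
result = solve(oscillator, 0.0, [1.0, 0.0], 0.01, 100)
assert abs(result[0] - math.cos(1.0)) < 1e-8
```

Each component of a sub-iteration depends only on the previous sub-iteration's values, which is what makes the per-component arithmetic a natural candidate for parallel hardware.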
Because communication between the acceleration module and the main processor is time-consuming, the whole algorithm runs entirely in hardware to reduce interaction. The main processor only needs to send the initial parameter values, the step size and the iteration-count control signal to the acceleration module, and the acceleration module transfers intermediate calculation data by DMA. This keeps the share of time spent on communication at a low level, and that share falls further as the number of variables, the computational complexity and the number of iterations increase.
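The trend described above can be made concrete with a simple timing model. The model is an assumption introduced here for illustration and does not come from the patent: suppose a fixed communication cost T_c per run, N iterations, and a per-iteration compute time T_s (which grows with the number of variables and the computational complexity). The communication share ρ is then:

```latex
\rho = \frac{T_{\mathrm{c}}}{T_{\mathrm{c}} + N\,T_{\mathrm{s}}},
\qquad
\frac{\partial \rho}{\partial N}
  = -\frac{T_{\mathrm{c}}\,T_{\mathrm{s}}}{\left(T_{\mathrm{c}} + N\,T_{\mathrm{s}}\right)^{2}} < 0 ,
\qquad
\frac{\partial \rho}{\partial T_{\mathrm{s}}}
  = -\frac{T_{\mathrm{c}}\,N}{\left(T_{\mathrm{c}} + N\,T_{\mathrm{s}}\right)^{2}} < 0 .
```

With hypothetical values T_c = 100 µs and T_s = 1 µs, the share is 0.5 at N = 100 and about 0.01 at N = 10000, so under this model the share falls as either the iteration count or the per-iteration work grows.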
Compared with the prior art, the invention has the following notable advantages: 1) it accelerates the solution of systems of ordinary differential equations in hardware and achieves a high speed-up ratio; 2) it proposes a general interface structure, so that it can meet the needs of solving systems of differential equations with different numbers of parameters and variables; 3) the core computation acceleration unit uses the reconfigurability of the FPGA, so that the core computing unit is reconfigured for different systems of equations, which improves resource utilization and the generality of the invention.
The invention is described in further detail below with reference to the accompanying drawings.
Description of the drawings
Fig. 1 is a schematic diagram of the composition of the acceleration device for solving differential equations.
Fig. 2 is a schematic diagram of the serial-to-parallel and parallel-to-serial conversion modules, where (a) shows the serial-to-parallel conversion module and (b) shows the parallel-to-serial conversion module.
Fig. 3 is the calculation flow chart of the Runge-Kutta-based hardware acceleration unit.
Detailed description of the embodiments
With reference to Fig. 1, a hardware acceleration device for solving differential equations consists of an input/output cache module, a serial/parallel conversion module and a Runge-Kutta-based hardware acceleration module. The input/output cache module buffers the data exchanged directly with the main controller; all data between the acceleration unit and the main controller passes through the input/output cache. The serial-to-parallel and parallel-to-serial conversion module is mainly responsible for fanning the data read from the input cache out into as many parallel outputs as required, and for converting the parallel output data of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module is mainly responsible for the hardware-accelerated solution of the differential equations. By proposing a general computing architecture, exploiting the partial reconfigurability of the FPGA and flexibly configuring the core computing unit, the device makes the hardware acceleration of solving systems of ordinary differential equations general-purpose.
With reference to Fig. 2, the serial-to-parallel conversion module mainly controls the read operation of the input cache and, according to the read indication signal, passes the data read to the Runge-Kutta-based hardware acceleration unit in parallel. The module's input data width is 32 bits; its output data width is configurable on demand, up to a maximum of 4096 bits. The parallel-to-serial conversion module mainly outputs the calculation results at a 32-bit width and controls the write operation of the output cache.
Specifically, the hardware acceleration device for solving differential equations of the invention comprises an input cache module, an output cache module, a serial-to-parallel conversion module, a parallel-to-serial conversion module and a Runge-Kutta-based hardware acceleration unit. The width of the data cached in the input and output cache modules matches the bus width. The input cache module is connected to the serial-to-parallel conversion module, and the output cache module is connected to the parallel-to-serial conversion module. The output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all the data required for the differential-equation calculation. The output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module, to which it outputs the calculation results.
The input cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output cache module exchanges information with it through the parallel-to-serial conversion module. The input and output cache modules cache the data exchanged with the main control unit; all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output cache modules.
The serial-to-parallel conversion module outputs the data read from the input cache to the hardware acceleration unit in parallel, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration unit into serial data.
The Runge-Kutta-based hardware acceleration unit performs the hardware-accelerated solution of the differential equations.
The input cache module comprises two parts, an input FIFO and a control register group. The input FIFO carries data and the control register group carries control signals; a control register is set by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.
The output cache module comprises two parts, an output FIFO and a status register group. The output FIFO carries the calculation result data and the status register group carries status signals; a status register is set by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.
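The set/reset ownership of the two register groups amounts to a simple handshake, which the following software model sketches. It is a behavioural illustration only: the class, the function names and the bit assignments `START` and `DONE` are hypothetical and are not defined in the patent.

```python
from collections import deque
from typing import Callable, List, Optional

START = 0x1  # hypothetical control bit: set by the host, cleared by the accelerator
DONE = 0x1   # hypothetical status bit: set by the accelerator, cleared by the host


class CacheInterface:
    """Input FIFO + control register, output FIFO + status register."""

    def __init__(self) -> None:
        self.input_fifo: deque = deque()
        self.output_fifo: deque = deque()
        self.control = 0
        self.status = 0


def host_submit(iface: CacheInterface, words: List[int]) -> None:
    """Host side: write data into the input FIFO, then set the control register."""
    iface.input_fifo.extend(words)
    iface.control |= START


def accelerator_run(iface: CacheInterface,
                    compute: Callable[[List[int]], List[int]]) -> bool:
    """Accelerator side: consume the input, reset control, set status when done."""
    if not iface.control & START:
        return False
    data = [iface.input_fifo.popleft() for _ in range(len(iface.input_fifo))]
    iface.control &= ~START                   # reset by the accelerator
    iface.output_fifo.extend(compute(data))
    iface.status |= DONE                      # set by the accelerator
    return True


def host_collect(iface: CacheInterface) -> Optional[List[int]]:
    """Host side: read the results once status is set, then reset the status register."""
    if not iface.status & DONE:
        return None
    result = list(iface.output_fifo)
    iface.output_fifo.clear()
    iface.status &= ~DONE                     # reset by the host
    return result
```

Because each register has exactly one writer that sets it and one that clears it, neither side needs to poll the other's FIFO to know whether work is pending.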
The serial-to-parallel conversion module controls read operations on the input FIFO of the input cache module according to the control signal and outputs the data read in parallel. At the same time it converts the control signal into the control signal for the Runge-Kutta-based hardware acceleration unit and resets the original control signal.
The parallel-to-serial conversion module controls write operations on the output FIFO of the output cache according to the control signal and sets the status register.
The Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method, and all calculations are performed in single- or double-precision floating point. To guarantee precision and a uniform bit width, the module provides hardware acceleration units for the common floating-point operations; the supported floating-point units include addition, subtraction, multiplication and division units, trigonometric-function units and power-function units. The core differential-equation part uses the reconfiguration scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.
The input data width of the serial-to-parallel conversion module is 32 bits, and its output data width supports a maximum of 4096 bits. The parallel-to-serial conversion module outputs the calculation results at a 32-bit width and controls the write operation of the output cache module.
The intermediate calculation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.
An acceleration method based on the above hardware acceleration device for solving differential equations comprises the following steps:
Step 1: determine whether the equation is a higher-order differential equation. If so, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly.
Step 2: solve the first-order system using the classical fourth-order Runge-Kutta method. Every main iteration uses the same hardware structure, every sub-iteration within a main iteration uses the same hardware structure, and each sub-iteration is implemented with the most parallel hardware structure possible. During a sub-iteration, the maximum-delay path is identified and the calculation latency is reduced by increasing parallelism.
By implementing the solution of systems of ordinary differential equations in hardware, the invention achieves a high speed-up ratio.
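Finding the maximum-delay path of a sub-iteration is a longest-path problem on the dependency graph of its operations. The sketch below shows one way to compute it in software, assuming the graph is acyclic; the operation names and cycle counts in the example are made up for illustration and are not taken from the patent.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def critical_path(latency: Dict[str, int],
                  deps: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Return (total latency, operations) of the longest dependency chain.

    `latency` maps each operation to its delay in cycles; `deps` maps an
    operation to the operations whose results it needs. The graph is
    assumed to be a DAG.
    """

    @lru_cache(maxsize=None)
    def finish(op: str) -> Tuple[int, Tuple[str, ...]]:
        best: Tuple[int, Tuple[str, ...]] = (0, ())
        for dep in deps.get(op, []):
            candidate = finish(dep)
            if candidate[0] > best[0]:
                best = candidate
        return best[0] + latency[op], best[1] + (op,)

    total, path = max((finish(op) for op in latency), key=lambda r: r[0])
    return total, list(path)


# Hypothetical sub-iteration: add needs sin and div; div needs mul.
example_latency = {"mul": 3, "sin": 12, "div": 20, "add": 2}
example_deps = {"add": ["sin", "div"], "div": ["mul"]}
total_cycles, chain = critical_path(example_latency, example_deps)
```

In this made-up example the chain through the divider dominates, so extra parallelism spent on the divider shortens the sub-iteration, while speeding up the sine unit would not.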
The invention is further described below with reference to a specific embodiment.
Embodiment 1
With reference to Fig. 3, the calculation flow of the hardware acceleration device for solving differential equations is as follows:
a) First determine whether the equation is a higher-order differential equation. If so, convert it into a system of first-order differential equations; if it is already a first-order system, go directly to step b).
b) Solve the first-order system using the classical fourth-order Runge-Kutta method.
c) For the hardware implementation of the fourth-order Runge-Kutta method of step b), deeply optimize the four sub-iterations within each iteration. Every main iteration uses the same hardware structure, every sub-iteration within a main iteration uses the same hardware structure, and each sub-iteration is implemented with the most parallel hardware structure possible.
d) To achieve the purpose of step b), relatively complex operations such as square roots, trigonometric functions, division and table traversal are carried out by parallel arithmetic units and optimized in several ways: division increases the step processed per iteration; trigonometric functions merge two steps into one; and for constant tables that would otherwise be traversed, an encoding scheme replaces the traverse-and-compare operation with a table lookup. The maximum-delay path in each sub-iteration is identified and its calculation latency is reduced by increasing parallelism; the other, non-critical paths require a trade-off between resources and speed.
Because communication between the acceleration module and the main processor is time-consuming, the whole algorithm runs entirely in hardware to reduce interaction. The main processor only needs to send the initial parameter values, the step size and the iteration-count control signal to the acceleration module, and the acceleration module transfers intermediate calculation data by DMA. This keeps the share of time spent on communication at a low level, and that share falls further as the number of variables, the computational complexity and the number of iterations increase.
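The idea in step d) of replacing a traverse-and-compare search of a constant table with an encoded lookup can be illustrated as follows. The table contents, the even key spacing and the function names are assumptions made for this sketch; the patent does not describe the actual encoding.

```python
from typing import List

# Hypothetical constant table: evenly spaced keys and their stored constants.
KEYS: List[int] = [10, 20, 30, 40]
VALUES: List[float] = [0.5, 0.25, 0.125, 0.0625]
KEY_STEP = 10


def lookup_by_traversal(key: int) -> float:
    """Baseline: compare the key against every entry until one matches."""
    for k, v in zip(KEYS, VALUES):
        if k == key:
            return v
    raise KeyError(key)


def encode(key: int) -> int:
    """Turn the key into a table address so that no comparison loop is needed."""
    code, remainder = divmod(key - KEYS[0], KEY_STEP)
    if remainder != 0 or not 0 <= code < len(VALUES):
        raise KeyError(key)
    return code


def lookup_by_code(key: int) -> float:
    """Encoded variant: one address computation followed by one table read."""
    return VALUES[encode(key)]
```

In hardware terms the traversal corresponds to a chain or bank of comparators, while the encoded form corresponds to a single memory read, which is why it shortens the delay path.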
Claims (10)
1. A hardware acceleration device for solving differential equations, characterized by comprising an input cache module, an output cache module, a serial-to-parallel conversion module, a parallel-to-serial conversion module and a Runge-Kutta-based hardware acceleration unit; the width of the data cached in the input cache module and the output cache module matches the bus width; the input cache module is connected to the serial-to-parallel conversion module, and the output cache module is connected to the parallel-to-serial conversion module; the output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all the data required for the differential-equation calculation; the output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module, to which it outputs the calculation results;
wherein the input cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output cache module exchanges information with the Runge-Kutta-based hardware acceleration unit through the parallel-to-serial conversion module; the input cache module and the output cache module cache the data exchanged with the main control unit, and all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output cache modules;
the serial-to-parallel conversion module outputs the data read from the input cache to the hardware acceleration unit in parallel, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration unit into serial data;
the Runge-Kutta-based hardware acceleration unit performs the hardware-accelerated solution of the differential equations.
2. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input cache module comprises two parts, an input FIFO and a control register group; the input FIFO carries data and the control register group carries control signals; a control register is set by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.
3. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the output cache module comprises two parts, an output FIFO and a status register group; the output FIFO carries the calculation result data and the status register group carries status signals; a status register is set by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.
4. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the serial-to-parallel conversion module controls read operations on the input FIFO of the input cache module according to the control signal and outputs the data read in parallel, while converting the control signal into the control signal for the Runge-Kutta-based hardware acceleration unit and resetting the control signal.
5. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the parallel-to-serial conversion module controls write operations on the output FIFO of the output cache according to the control signal and sets the status register.
6. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method, and all calculations are performed in single- or double-precision floating point; to guarantee precision and a uniform bit width, the module provides hardware acceleration units for the common floating-point operations, the supported floating-point units including addition, subtraction, multiplication and division units, trigonometric-function units and power-function units; the core differential-equation part uses the reconfiguration scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.
7. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input data width of the serial-to-parallel conversion module is 32 bits and its output data width supports a maximum of 4096 bits; the parallel-to-serial conversion module outputs the calculation results at a 32-bit width and controls the write operation of the output cache module.
8. The hardware acceleration device for solving differential equations according to claim 6, characterized in that the intermediate calculation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.
9. An acceleration method based on the hardware acceleration device for solving differential equations according to claim 1, characterized by comprising the following steps:
Step 1: determine whether the equation is a higher-order differential equation; if so, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly;
Step 2: solve the first-order system using the classical fourth-order Runge-Kutta method, wherein every main iteration uses the same hardware structure, every sub-iteration within a main iteration uses the same hardware structure, and each sub-iteration is implemented with the most parallel hardware structure possible.
10. The acceleration method according to claim 9, characterized in that, during a sub-iteration in step 2, the maximum-delay path in each sub-iteration is identified and the calculation latency is reduced by increasing parallelism.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611088172.4A CN106527999A (en) | 2016-12-01 | 2016-12-01 | Hardware acceleration device and acceleration method for solving differential equations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611088172.4A CN106527999A (en) | 2016-12-01 | 2016-12-01 | Hardware acceleration device and acceleration method for solving differential equations |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106527999A true CN106527999A (en) | 2017-03-22 |
Family
ID=58353906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611088172.4A Pending CN106527999A (en) | 2016-12-01 | 2016-12-01 | Hardware acceleration device and acceleration method for solving differential equations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106527999A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000215057A (en) * | 1998-10-06 | 2000-08-04 | Texas Instr Inc <Ti> | Data processor, electronic communication device, information processing method, and method for using speedup of hardware |
CN102693342A (en) * | 2012-05-24 | 2012-09-26 | 哈尔滨工程大学 | Parameter selecting method for restraining sound wave energy in strong nonlinear medium |
CN105260333A (en) * | 2015-09-24 | 2016-01-20 | 福州瑞芯微电子股份有限公司 | Accelerated processing method and device for audio signal |
-
2016
- 2016-12-01 CN CN201611088172.4A patent/CN106527999A/en active Pending
Non-Patent Citations (1)
Title |
---|
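潘艇 (Pan Ting) et al.: "基于龙格—库塔的弹道微分方程解算的FPGA实现" [FPGA implementation of ballistic differential equation solving based on Runge-Kutta], 《计算机测量与控制》 [Computer Measurement & Control] * |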
Similar Documents
Publication | Publication Date | Title |
---|---|---|
ES2871554T3 (en) | Floating point block for neural network implementation | |
CN108564168B (en) | Design method for neural network processor supporting multi-precision convolution | |
CN107689948B (en) | Efficient data access management device applied to neural network hardware acceleration system | |
CN111542826A (en) | Digital architecture supporting analog coprocessors | |
CN109146067B (en) | Policy convolution neural network accelerator based on FPGA | |
CN109886400A (en) | The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel | |
CN110516801A (en) | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput | |
CN102495719B (en) | Vector floating point operation device and method | |
CN109828744A (en) | A kind of configurable floating point vector multiplication IP kernel based on FPGA | |
CN107301453A (en) | The artificial neural network forward operation apparatus and method for supporting discrete data to represent | |
CN107797962A (en) | Computing array based on neutral net | |
US10657442B2 (en) | Deep learning accelerator architecture with chunking GEMM | |
CN103678257A (en) | Positive definite matrix floating point inversion device based on FPGA and inversion method thereof | |
US11783200B2 (en) | Artificial neural network implementation in field-programmable gate arrays | |
CN110163355A (en) | A kind of computing device and method | |
US11620105B2 (en) | Hybrid floating point representation for deep learning acceleration | |
CN108710943B (en) | Multilayer feedforward neural network parallel accelerator | |
Zhang et al. | Implementation and optimization of the accelerator based on FPGA hardware for LSTM network | |
Shivapakash et al. | A power efficiency enhancements of a multi-bit accelerator for memory prohibitive deep neural networks | |
CN106527999A (en) | Hardware acceleration device and acceleration method for solving differential equations | |
Wu et al. | An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA | |
CN103247019B (en) | For the reconfigurable device based on AdaBoost algorithm of object detection | |
CN107831823B (en) | Gaussian elimination method for analyzing and optimizing power grid topological structure | |
CN113392963B (en) | FPGA-based CNN hardware acceleration system design method | |
Jia et al. | A high-performance accelerator for floating-point matrix multiplication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170322 |