CN106527999A - Hardware acceleration device and acceleration method for solving differential equations - Google Patents

Publication number
CN106527999A
Authority
CN
China
Prior art keywords
module
output
hardware
differential equation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611088172.4A
Other languages
Chinese (zh)
Inventor
姚小城
殷进勇
刘煜
王洋
吴建鲁
李毅
陶峥嵘
董海祥
王永
李小亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
716th Research Institute of CSIC
Original Assignee
716th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-01
Filing date
2016-12-01
Publication date
2017-03-22
Application filed by 716th Research Institute of CSIC
Priority to CN201611088172.4A
Publication of CN106527999A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1673 Details of memory controller using buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13 Differential equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements

Abstract

The invention discloses a hardware acceleration device for solving differential equations. The device is composed of an input/output buffer module, serial-to-parallel and parallel-to-serial conversion modules, and a Runge-Kutta hardware acceleration module. The input/output buffer module buffers the data exchanged directly with the main control unit; all data between the acceleration unit and the main control unit passes through the input/output buffers. The conversion modules output the data read from the input buffer in parallel, in the width required, and convert the parallel output of the hardware acceleration module into serial data. The Runge-Kutta hardware acceleration module implements the hardware acceleration of the differential-equation solution. By proposing a general computing architecture, exploiting the partial reconfigurability of an FPGA (Field Programmable Gate Array), and flexibly configuring the core computing unit, the device makes hardware acceleration of differential-equation solving general-purpose and thus meets the application requirement of quickly solving various differential equations.

Description

A hardware acceleration device and acceleration method for solving differential equations
Technical field
The present invention relates to an accelerator for solving differential equations, and in particular to a hardware acceleration device and acceleration method for solving differential equations with the Runge-Kutta method.
Background
Ordinary differential equations (ODEs) are the most basic mathematical theory and method for studying the motion, evolution, and change of things, objects, and phenomena in the natural and social sciences. Many principles in many fields can be described by suitable ODEs, such as Newton's laws of motion, the law of universal gravitation, the law of conservation of energy, population growth, the spread of disease, and trends in market-clearing prices. In most cases it is extremely difficult to find an analytical solution of a complicated ODE, so approximate methods are commonly used, and engineering practice generally relies on numerical solution. With the rapid development of computer science and technology, classical numerical methods for ODEs have gone through a process of re-evaluation, screening, adaptation, and innovation, and many new concepts, new problems, and new approaches that exploit the potential of computers have emerged. Improving computational efficiency through full hardware acceleration has become a research hotspot.
However, the prior art contains no description of such an approach.
Summary of the invention
The object of the present invention is to provide an acceleration device and acceleration method for solving differential equations that speed up the solution of systems of differential equations.
The technical solution that achieves this object is a device for accelerating the solution of differential equations, composed of an input/output buffer module, serial-to-parallel and parallel-to-serial conversion modules, and a Runge-Kutta-based hardware acceleration module. The input/output buffer module buffers the data exchanged directly with the main control unit; all data between the acceleration unit and the main control unit passes through the input/output buffers. The serial-to-parallel conversion module outputs the data read from the input buffer in parallel, in whatever width is needed, and the parallel-to-serial conversion module converts the parallel output of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module performs the hardware acceleration of the differential-equation solution.
By proposing a general computing architecture and combining it with the partial reconfigurability of an FPGA, the device flexibly configures its core computing unit and thereby generalizes hardware acceleration to systems of ordinary differential equations.
The serial-to-parallel conversion module controls read operations on the input buffer and, according to the read indication signal, presents the data it has read in parallel to the Runge-Kutta-based hardware acceleration unit. Its input data width is 32 bits and its output data width is at most 4096 bits, configurable as required. The parallel-to-serial conversion module outputs the computation results 32 bits at a time and controls write operations on the output buffer.
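The width conversion described above can be illustrated with a small software model. This is only a sketch: the function names, the use of a `deque` as the FIFO, and the choice to place the first word in the least-significant bits are assumptions made for illustration, not details given in the patent. Only the 32-bit input width and the 4096-bit maximum output width come from the text.

```python
from collections import deque

WORD_BITS = 32          # input width stated in the text
MAX_OUT_BITS = 4096     # maximum output width stated in the text


def serial_to_parallel(fifo, out_bits):
    """Pop out_bits/32 words from the input FIFO and pack them into one wide word.

    Word 0 lands in the least-significant 32 bits (an assumed ordering).
    """
    if out_bits % WORD_BITS or not 0 < out_bits <= MAX_OUT_BITS:
        raise ValueError("out_bits must be a multiple of 32, at most 4096")
    wide = 0
    for lane in range(out_bits // WORD_BITS):
        word = fifo.popleft() & 0xFFFFFFFF
        wide |= word << (lane * WORD_BITS)
    return wide


def parallel_to_serial(wide, out_bits, fifo):
    """Split a wide result into 32-bit words and push them to the output FIFO."""
    for lane in range(out_bits // WORD_BITS):
        fifo.append((wide >> (lane * WORD_BITS)) & 0xFFFFFFFF)


# Round trip: four 32-bit words become one 128-bit word and come back unchanged.
in_fifo = deque([0x11111111, 0x22222222, 0x33333333, 0x44444444])
out_fifo = deque()
wide_word = serial_to_parallel(in_fifo, 128)
parallel_to_serial(wide_word, 128, out_fifo)
```

In hardware the packing would be done by a shift register or a set of lane registers rather than by integer arithmetic, but the data layout is the same idea.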
The computation flow of the hardware acceleration device for solving differential equations is as follows:
a) First determine whether the equation is a higher-order differential equation. If it is, convert it into a system of first-order differential equations; if it is already a first-order system, go directly to step b).
b) Solve the first-order system with the classical fourth-order Runge-Kutta method.
c) For the hardware implementation of the fourth-order Runge-Kutta method of step b), heavily optimize the four sub-iterations within each iteration. Every main iteration uses the same hardware structure, and the sub-iterations within a main iteration use the same hardware structure; each sub-iteration is implemented with as much hardware parallelism as possible.
d) To achieve the goal of step b), relatively complex operations such as square roots, trigonometric functions, division, and table traversal are carried out on parallel arithmetic units and optimized in several ways: division takes a larger step in each iteration, trigonometric functions merge two steps into one, and constant tables that would otherwise be traversed are encoded so that a table lookup replaces the traversal and comparison. The maximum-delay path in each sub-iteration is identified and its computation delay is reduced by increasing parallelism; the other, non-critical paths call for a trade-off between resources and speed.
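Steps a) and b) can be sketched in software as follows. The example equation y'' = -y, the function names, and the step size are hypothetical choices for illustration; the patent does not specify a particular equation. The code shows the order reduction and one classical fourth-order Runge-Kutta step, not the hardware structure.

```python
import math


def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for a first-order system y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def oscillator(t, state):
    """Step a): y'' = -y rewritten as the first-order system y0' = y1, y1' = -y0."""
    y0, y1 = state
    return [y1, -y0]


def solve(f, t0, y0, h, steps):
    """Step b): repeat the same main iteration `steps` times."""
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# y(0) = 1, y'(0) = 0 has the exact solution y(t) = cos(t).
final = solve(oscillator, 0.0, [1.0, 0.0], 0.01, 100)
error = abs(final[0] - math.cos(1.0))
```

The list comprehensions inside `rk4_step` operate on every equation of the system independently, which is the part a hardware implementation can evaluate in parallel within a sub-iteration.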
Because communication between the acceleration module and the main processor is time-consuming, the whole algorithm runs entirely in hardware to reduce interaction. The main processor only needs to send the initial parameter values, the step size, and the iteration-count control signal to the acceleration module, and the acceleration module transfers intermediate computation data by DMA. This keeps the share of time spent on communication low, and that share falls as the number of algorithm variables, the computational complexity, and the number of iterations increase.
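A rough cost model makes the communication argument concrete. Everything here is an assumption for illustration: the cycle counts are invented, and the model simply supposes that the host sends the initial values plus a step size and an iteration count, reads back one result per variable, and leaves all other work on the accelerator.

```python
def communication_share(n_vars, iterations, cycles_per_step, cycles_per_word=1):
    """Fraction of total cycles spent on host <-> accelerator transfers.

    Assumed model: the host sends n_vars initial values plus a step size and
    an iteration count, and reads back n_vars results.
    """
    transfer = (2 * n_vars + 2) * cycles_per_word
    compute = iterations * cycles_per_step
    return transfer / (transfer + compute)


# Hypothetical numbers: 6 variables, 200 cycles per Runge-Kutta step.
short_run = communication_share(n_vars=6, iterations=10, cycles_per_step=200)
long_run = communication_share(n_vars=6, iterations=10_000, cycles_per_step=200)
```

In this model the transfer cost is fixed per run while the compute cost grows with the iteration count and with the per-step cost, so the communication share shrinks as either grows. The effect of the variable count depends on how `cycles_per_step` scales with it, which the model leaves as an input.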
Compared with the prior art, the invention has the following notable advantages: 1) it accelerates the solution of systems of ordinary differential equations in hardware and achieves a high speedup; 2) it proposes a general interface structure, so that it can meet the needs of systems of differential equations with different numbers of parameters and variables; 3) the core computation acceleration unit uses the reconfigurability of the FPGA, so that the core computing unit is reconfigured for different systems of equations, which improves resource utilization and the generality of the invention.
The invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of the composition of the acceleration device for solving differential equations.
Fig. 2 is a schematic diagram of the serial-to-parallel and parallel-to-serial conversion modules: (a) shows the serial-to-parallel conversion module and (b) shows the parallel-to-serial conversion module.
Fig. 3 is the computation flow chart of the Runge-Kutta-based hardware acceleration unit.
Detailed description of the embodiments
With reference to Fig. 1, a hardware acceleration device for solving differential equations is composed of an input/output buffer module, serial-to-parallel and parallel-to-serial conversion modules, and a Runge-Kutta-based hardware acceleration module. The input/output buffer module buffers the data exchanged directly with the main control unit; all data between the acceleration unit and the main control unit passes through the input/output buffers. The serial-to-parallel conversion module outputs the data read from the input buffer in parallel, in whatever width is needed, and the parallel-to-serial conversion module converts the parallel output of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module performs the hardware acceleration of the differential-equation solution. By proposing a general computing architecture and combining it with the partial reconfigurability of an FPGA, the device flexibly configures its core computing unit and thereby generalizes hardware acceleration to systems of ordinary differential equations.
With reference to Fig. 2, the serial-to-parallel conversion module controls read operations on the input buffer and, according to the read indication signal, presents the data it has read in parallel to the Runge-Kutta-based hardware acceleration unit. Its input data width is 32 bits and its output data width is at most 4096 bits, configurable as required. The parallel-to-serial conversion module outputs the computation results 32 bits at a time and controls write operations on the output buffer.
Specifically, a hardware acceleration device for solving differential equations according to the invention comprises an input buffer module, an output buffer module, a serial-to-parallel conversion module, a parallel-to-serial conversion module, and a Runge-Kutta-based hardware acceleration unit. The width of the data buffered in the input buffer module and the output buffer module matches the bus width. The input buffer module is connected to the serial-to-parallel conversion module and the output buffer module is connected to the parallel-to-serial conversion module. The output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all data needed for the differential-equation computation. The output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module and delivers the computation results to it;
The input buffer module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output buffer module exchanges information with it through the parallel-to-serial conversion module. The input and output buffer modules buffer the data exchanged with the main control unit; all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output buffer modules;
The serial-to-parallel conversion module outputs the data read from the input buffer in parallel to the hardware acceleration module, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration module into serial data;
The Runge-Kutta-based hardware acceleration module performs the hardware acceleration of the differential-equation solution.
The input buffer module consists of two parts, an input FIFO and a control register group. The input FIFO carries data and the control register group carries control signals. The control registers are set by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.
The output buffer module consists of two parts, an output FIFO and a status register group. The output FIFO carries the computation result data and the status register group carries status signals. The status registers are set by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.
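The set/reset ownership of the two register groups amounts to a simple handshake, which the following behavioral model sketches. The class names, the single `start` and `done` bits, and the `kernel` callback are hypothetical; the patent only states which side sets and which side resets each register group.

```python
from collections import deque


class InputBuffer:
    """Input FIFO plus a control register (set by host, reset by accelerator)."""

    def __init__(self):
        self.fifo = deque()
        self.start = 0

    def host_write(self, words):
        self.fifo.extend(words)
        self.start = 1          # host sets the control register


class OutputBuffer:
    """Output FIFO plus a status register (set by accelerator, reset by host)."""

    def __init__(self):
        self.fifo = deque()
        self.done = 0

    def host_read(self):
        results = list(self.fifo)
        self.fifo.clear()
        self.done = 0           # host resets the status register
        return results


def accelerator_run(inp, out, kernel):
    """Consume the input FIFO once the control register is set."""
    if not inp.start:
        return False
    inp.start = 0               # accelerator resets the control register
    data = [inp.fifo.popleft() for _ in range(len(inp.fifo))]
    out.fifo.extend(kernel(data))
    out.done = 1                # accelerator sets the status register
    return True


inp, out = InputBuffer(), OutputBuffer()
inp.host_write([1, 2, 3])
ran = accelerator_run(inp, out, lambda xs: [2 * x for x in xs])
done_before_read = out.done
results = out.host_read()
```

Giving each register exactly one writer for "set" and one for "reset" means neither side needs to lock the register, which is one plausible reason for the arrangement.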
According to the control signals, the serial-to-parallel conversion module controls read operations on the input FIFO of the input buffer module and outputs the data it reads in parallel. At the same time it converts the control signals into the control signals of the Runge-Kutta-based hardware acceleration unit and resets the control signals.
According to the control signals, the parallel-to-serial conversion module controls write operations on the output FIFO of the output buffer module and sets the status registers.
The Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method, and all computation is done in single- or double-precision floating point. To guarantee precision and a uniform bit width, the module provides hardware units for common floating-point operations; the supported units include floating-point addition, subtraction, multiplication, and division units, trigonometric function units, and power function units. The core differential-equation part uses the reconfigurable scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.
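For reference, the classical fourth-order Runge-Kutta method advances a first-order system y' = f(t, y) by one step of size h as shown below. The symbols t_n, y_n, h, and k_1 to k_4 are standard textbook notation and are not taken from the patent.

```latex
\begin{aligned}
k_1 &= f(t_n,\; y_n) \\
k_2 &= f\!\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}\,k_1\right) \\
k_3 &= f\!\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2}\,k_2\right) \\
k_4 &= f\!\left(t_n + h,\; y_n + h\,k_3\right) \\
y_{n+1} &= y_n + \tfrac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right)
\end{aligned}
```

Each k_i depends on the one before it, so the four stages form a chain, while the components of each k_i can be computed side by side. The reconfigurable part of the design would correspond to the function f.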
The input data width of the serial-to-parallel conversion module is 32 bits and its output data width is at most 4096 bits. The parallel-to-serial conversion module outputs the computation results 32 bits at a time and controls write operations on the output buffer module.
The intermediate computation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.
An acceleration method based on the above hardware acceleration device for solving differential equations comprises the following steps:
Step 1: determine whether the equation is a higher-order differential equation. If it is, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly;
Step 2: solve the first-order system with the classical fourth-order Runge-Kutta method. Every main iteration uses the same hardware structure, the sub-iterations within a main iteration use the same hardware structure, and each sub-iteration is implemented with as much hardware parallelism as possible. During the sub-iterations, the maximum-delay path in each sub-iteration is found, and its computation delay is reduced by increasing parallelism.
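The idea of finding the maximum-delay path and widening only that path can be shown with a toy latency model. The path names, operation counts, and cycle figures below are invented for illustration, and the model assumes that the operations on a path are independent and that the paths themselves run side by side.

```python
import math


def sub_iteration_latency(paths):
    """Return the slowest path and the per-path latency in cycles.

    `paths` maps a name to (independent_ops, cycles_per_op, parallel_units).
    """
    latency = {name: math.ceil(ops / units) * cycles
               for name, (ops, cycles, units) in paths.items()}
    critical = max(latency, key=latency.get)
    return critical, latency


# Hypothetical sub-iteration with two right-hand-side expressions.
paths = {
    "dx/dt": (4, 5, 1),     # 4 multiplies, 5 cycles each, 1 multiplier
    "dv/dt": (16, 5, 1),    # 16 multiplies, 5 cycles each, 1 multiplier
}
critical_before, before = sub_iteration_latency(paths)

# Add multipliers only on the slowest path; the other path is left alone.
paths["dv/dt"] = (16, 5, 4)
critical_after, after = sub_iteration_latency(paths)
```

Because the sub-iteration finishes when its slowest path finishes, extra arithmetic units on any other path would cost resources without shortening the step, which is the resource-versus-speed trade-off the text mentions for non-critical paths.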
The invention accelerates the solution of systems of ordinary differential equations in hardware and achieves a high speedup.
The invention is further described below with reference to a specific embodiment.
Embodiment 1
With reference to Fig. 1, a hardware acceleration device for solving differential equations is composed of an input/output buffer module, serial-to-parallel and parallel-to-serial conversion modules, and a Runge-Kutta-based hardware acceleration module. The input/output buffer module buffers the data exchanged directly with the main control unit; all data between the acceleration unit and the main control unit passes through the input/output buffers. The serial-to-parallel conversion module outputs the data read from the input buffer in parallel, in whatever width is needed, and the parallel-to-serial conversion module converts the parallel output of the hardware acceleration module into serial data. The Runge-Kutta-based hardware acceleration module performs the hardware acceleration of the differential-equation solution. By proposing a general computing architecture and combining it with the partial reconfigurability of an FPGA, the device flexibly configures its core computing unit and thereby generalizes hardware acceleration to systems of ordinary differential equations.
With reference to Fig. 2, the serial-to-parallel conversion module controls read operations on the input buffer and, according to the read indication signal, presents the data it has read in parallel to the Runge-Kutta-based hardware acceleration unit. Its input data width is 32 bits and its output data width is at most 4096 bits, configurable as required. The parallel-to-serial conversion module outputs the computation results 32 bits at a time and controls write operations on the output buffer.
With reference to Fig. 3, the computation flow of the hardware acceleration device for solving differential equations is as follows:
a) First determine whether the equation is a higher-order differential equation. If it is, convert it into a system of first-order differential equations; if it is already a first-order system, go directly to step b).
b) Solve the first-order system with the classical fourth-order Runge-Kutta method.
c) For the hardware implementation of the fourth-order Runge-Kutta method of step b), heavily optimize the four sub-iterations within each iteration. Every main iteration uses the same hardware structure, and the sub-iterations within a main iteration use the same hardware structure; each sub-iteration is implemented with as much hardware parallelism as possible.
d) To achieve the goal of step b), relatively complex operations such as square roots, trigonometric functions, division, and table traversal are carried out on parallel arithmetic units and optimized in several ways: division takes a larger step in each iteration, trigonometric functions merge two steps into one, and constant tables that would otherwise be traversed are encoded so that a table lookup replaces the traversal and comparison. The maximum-delay path in each sub-iteration is identified and its computation delay is reduced by increasing parallelism; the other, non-critical paths call for a trade-off between resources and speed.
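The table optimization in step d) can be pictured as replacing a chain of comparisons with a directly computed index. The table contents, the uniform band width, and the function names below are hypothetical; the patent does not say what its constant tables hold or how they are encoded.

```python
# Hypothetical constant table: one coefficient per 1000-unit band of the input.
BAND_WIDTH = 1000
COEFFS = [1.225, 1.112, 1.007, 0.909, 0.819]


def lookup_by_traversal(value):
    """Compare against each band boundary in turn; latency grows with table size."""
    for i in range(len(COEFFS)):
        if value < (i + 1) * BAND_WIDTH:
            return COEFFS[i]
    return COEFFS[-1]


def lookup_by_encoding(value):
    """Encode the input as a band index and read the table once."""
    index = max(0, min(int(value // BAND_WIDTH), len(COEFFS) - 1))
    return COEFFS[index]


samples = [0, 999, 1000, 2500.0, 4999, 9000]
matches = [lookup_by_traversal(v) == lookup_by_encoding(v) for v in samples]
```

With a power-of-two band width the index would reduce to selecting the upper bits of the input, so the lookup takes a fixed time regardless of how many entries the table has.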
Because communication between the acceleration module and the main processor is time-consuming, the whole algorithm runs entirely in hardware to reduce interaction. The main processor only needs to send the initial parameter values, the step size, and the iteration-count control signal to the acceleration module, and the acceleration module transfers intermediate computation data by DMA. This keeps the share of time spent on communication low, and that share falls as the number of algorithm variables, the computational complexity, and the number of iterations increase.

Claims (10)

1. A hardware acceleration device for solving differential equations, characterized in that it comprises an input buffer module, an output buffer module, a serial-to-parallel conversion module, a parallel-to-serial conversion module, and a Runge-Kutta-based hardware acceleration unit; the width of the data buffered in the input buffer module and the output buffer module matches the bus width; the input buffer module is connected to the serial-to-parallel conversion module and the output buffer module is connected to the parallel-to-serial conversion module; the output of the serial-to-parallel conversion module is connected to the Runge-Kutta-based hardware acceleration unit and supplies all data needed for the differential-equation computation; the output of the Runge-Kutta-based hardware acceleration unit is connected to the parallel-to-serial conversion module and delivers the computation results to it;
wherein the input buffer module exchanges information with the Runge-Kutta-based hardware acceleration unit through the serial-to-parallel conversion module, and the output buffer module exchanges information with the Runge-Kutta-based hardware acceleration unit through the parallel-to-serial conversion module; the input buffer module and the output buffer module buffer the data exchanged with the main control unit, and all data between the Runge-Kutta-based hardware acceleration unit and the main control unit passes through the input/output buffer modules;
the serial-to-parallel conversion module outputs the data read from the input buffer in parallel to the hardware acceleration module, and the parallel-to-serial conversion module converts the parallel output data of the Runge-Kutta-based hardware acceleration module into serial data;
the Runge-Kutta-based hardware acceleration module performs the hardware acceleration of the differential-equation solution.
2. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input buffer module consists of two parts, an input FIFO and a control register group; the input FIFO carries data and the control register group carries control signals; the control registers are set by the host computer and reset by the Runge-Kutta-based hardware acceleration unit.
3. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the output buffer module consists of two parts, an output FIFO and a status register group; the output FIFO carries the computation result data and the status register group carries status signals; the status registers are set by the Runge-Kutta-based hardware acceleration unit and reset by the host computer.
4. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the serial-to-parallel conversion module, according to the control signals, controls read operations on the input FIFO of the input buffer module and outputs the data it reads in parallel, and at the same time converts the control signals into the control signals of the Runge-Kutta-based hardware acceleration unit and resets the control signals.
5. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the parallel-to-serial conversion module, according to the control signals, controls write operations on the output FIFO of the output buffer module and sets the status registers.
6. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the Runge-Kutta-based hardware acceleration unit uses the classical fourth-order Runge-Kutta method and all computation is done in single- or double-precision floating point; to guarantee precision and a uniform bit width, the module provides hardware units for common floating-point operations, the supported units including floating-point addition, subtraction, multiplication, and division units, trigonometric function units, and power function units; the core differential-equation part uses the reconfigurable scheme of the FPGA, so that different differential equations can be solved through parameter setting and reconfiguration.
7. The hardware acceleration device for solving differential equations according to claim 1, characterized in that the input data width of the serial-to-parallel conversion module is 32 bits and its output data width is at most 4096 bits; the parallel-to-serial conversion module outputs the computation results 32 bits at a time and controls write operations on the output buffer module.
8. The hardware acceleration device for solving differential equations according to claim 6, characterized in that the intermediate computation data of the Runge-Kutta-based hardware acceleration unit is transferred by DMA.
9. An acceleration method based on the hardware acceleration device for solving differential equations according to claim 1, characterized in that it comprises the following steps:
Step 1: determine whether the equation is a higher-order differential equation; if it is, convert it into a system of first-order differential equations and then perform step 2; if it is already a first-order system, perform step 2 directly;
Step 2: solve the first-order system with the classical fourth-order Runge-Kutta method, wherein every main iteration uses the same hardware structure, the sub-iterations within a main iteration use the same hardware structure, and each sub-iteration is implemented with as much hardware parallelism as possible.
10. The acceleration method according to claim 9, characterized in that during the sub-iterations of step 2, the maximum-delay path in each sub-iteration is found and its computation delay is reduced by increasing parallelism.
CN201611088172.4A 2016-12-01 2016-12-01 Hardware acceleration device and acceleration method for solving differential equations Pending CN106527999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611088172.4A CN106527999A (en) 2016-12-01 2016-12-01 Hardware acceleration device and acceleration method for solving differential equations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611088172.4A CN106527999A (en) 2016-12-01 2016-12-01 Hardware acceleration device and acceleration method for solving differential equations

Publications (1)

Publication Number Publication Date
CN106527999A true CN106527999A (en) 2017-03-22

Family

ID=58353906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611088172.4A Pending CN106527999A (en) 2016-12-01 2016-12-01 Hardware acceleration device and acceleration method for solving differential equations

Country Status (1)

Country Link
CN (1) CN106527999A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000215057A (en) * 1998-10-06 2000-08-04 Texas Instr Inc <Ti> Data processor, electronic communication device, information processing method, and method for using speedup of hardware
CN102693342A (en) * 2012-05-24 2012-09-26 哈尔滨工程大学 Parameter selecting method for restraining sound wave energy in strong nonlinear medium
CN105260333A (en) * 2015-09-24 2016-01-20 福州瑞芯微电子股份有限公司 Accelerated processing method and device for audio signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
潘艇等: "基于龙格—库塔的弹道微分方程解算的FPGA实现", 《计算机测量与控制》 *

Similar Documents

Publication Publication Date Title
ES2871554T3 (en) Floating point block for neural network implementation
CN108564168B (en) Design method for neural network processor supporting multi-precision convolution
CN107689948B (en) Efficient data access management device applied to neural network hardware acceleration system
CN111542826A (en) Digital architecture supporting analog coprocessors
CN109146067B (en) Policy convolution neural network accelerator based on FPGA
CN109886400A (en) The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel
CN110516801A (en) A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput
CN102495719B (en) Vector floating point operation device and method
CN109828744A (en) A kind of configurable floating point vector multiplication IP kernel based on FPGA
CN107301453A (en) The artificial neural network forward operation apparatus and method for supporting discrete data to represent
CN107797962A (en) Computing array based on neutral net
US10657442B2 (en) Deep learning accelerator architecture with chunking GEMM
CN103678257A (en) Positive definite matrix floating point inversion device based on FPGA and inversion method thereof
US11783200B2 (en) Artificial neural network implementation in field-programmable gate arrays
CN110163355A (en) A kind of computing device and method
US11620105B2 (en) Hybrid floating point representation for deep learning acceleration
CN108710943B (en) Multilayer feedforward neural network parallel accelerator
Zhang et al. Implementation and optimization of the accelerator based on FPGA hardware for LSTM network
Shivapakash et al. A power efficiency enhancements of a multi-bit accelerator for memory prohibitive deep neural networks
CN106527999A (en) Hardware acceleration device and acceleration method for solving differential equations
Wu et al. An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA
CN103247019B (en) For the reconfigurable device based on AdaBoost algorithm of object detection
CN107831823B (en) Gaussian elimination method for analyzing and optimizing power grid topological structure
CN113392963B (en) FPGA-based CNN hardware acceleration system design method
Jia et al. A high-performance accelerator for floating-point matrix multiplication

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322