CN110738311A - LSTM network acceleration method based on high-level synthesis

Info

Publication number
CN110738311A
Authority
CN
China
Prior art keywords
lstm network
fitting
lstm
acceleration
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910975595.5A
Other languages
Chinese (zh)
Inventor
刘大同
蒋闵
王本宽
彭宇
彭喜元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201910975595.5A priority Critical patent/CN110738311A/en
Publication of CN110738311A publication Critical patent/CN110738311A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses an LSTM network acceleration method based on high-level synthesis, belongs to the field of embedded online application of deep neural networks, and aims to solve the problems that the existing LSTM network is computationally complex and runs slowly on embedded platforms. The specific process of the invention is as follows: construct an LSTM network model in MATLAB and train it; piecewise-fit the activation functions; when the fitting error of the activation functions is within the threshold range, convert the LSTM network model into high-level language code and optimize the code structure to obtain optimization directives; add the optimization directives in high-level synthesis and replace the floating-point data type with a fixed-point type to obtain the LSTM acceleration network; on a Zynq platform, run the LSTM network without optimized acceleration at the PS end to obtain its running time, and run the LSTM acceleration network at the PL end to obtain its running time; and calculate the acceleration ratio and the error, completing the optimized acceleration when both are within their threshold ranges. The invention is used for accelerating LSTM networks.

Description

LSTM network acceleration method based on high-level synthesis
Technical Field
The invention relates to an LSTM network acceleration method based on high-level synthesis, and belongs to the field of embedded online application of deep neural networks.
Background
The LSTM (Long Short-Term Memory) network is a type of recurrent neural network (RNN). It is generally used for multi-dimensional time-series prediction with relatively long intervals and delays, and it differs from an ordinary RNN in the structure of its neurons.
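For reference, a standard LSTM neuron (the structure alluded to above; this is the conventional formulation and is not spelled out in the patent text) computes its gates and states as:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(hidden state)}
\end{aligned}
```

The gates use the sigmoid function σ and the candidate and output paths use tanh, which is why the piecewise fitting described later targets exactly these two activation functions.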
An ASIC, as a special-purpose circuit, has a higher running speed and lower power consumption than an FPGA, but it has poor generality, a complex design flow, and a higher cost. The FPGA offers high parallel processing speed, flexible design, and strong generality; it is suitable for different embedded platforms and can provide different optimized acceleration schemes.
In summary, to make the LSTM network usable in embedded online applications under the limitations of time, power consumption, volume, and so on, it is imperative to design an acceleration method that solves the LSTM network's slow operation speed and poor real-time performance.
Disclosure of Invention
The invention aims to solve the problems that the existing LSTM network is computationally complex and runs slowly on embedded platforms, and provides an LSTM network acceleration method based on high-level synthesis.
The LSTM network acceleration method based on high-level synthesis according to the invention comprises the following specific process:
S1, constructing an LSTM network model in MATLAB and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model in MATLAB;
S3, obtaining the fitting error of the activation functions and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2, and if so, executing S4;
S4, converting the LSTM network model into high-level language code and optimizing the code structure to obtain optimization directives;
S5, adding the optimization directives in high-level synthesis and replacing the floating-point data type with a fixed-point type to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the LSTM network without optimized acceleration at the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 at the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5, and if so, finishing the optimized acceleration.
Preferably, the activation functions piecewise-fitted in S2 are the sigmoid function and the tanh function.
Preferably, the specific method of the piecewise fitting in S2 is as follows:
fitting is performed using cubic functions on the (0,1) interval and quadratic functions on the other intervals.
Preferably, the specific process of obtaining the fitting error of the activation function in S3 is as follows:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
Preferably, the fitting error threshold range of S3 is: less than the order of 10⁻³.
Preferably, in S4 the LSTM network model is converted into C++ code.
Preferably, optimizing the code structure includes:
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting up parameter caches for the data used during the calculation;
transferring array elements using pointers;
and receiving the data stream of the function's data interface with a cache array.
Preferably, the specific method for replacing the data type with a fixed-point type in S5 is as follows:
using 24-bit fixed-point data with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
Preferably, the method for calculating the acceleration ratio of the running times in S7 is: dividing the running time of the unoptimized network model at the PS end by the running time of the optimized network model at the PL end;
and the method for calculating the error is: subtracting the operation result of the LSTM network model obtained in S1 from the operation result of the PL-end optimized network model obtained in S6; the difference is the error.
Preferably, the acceleration ratio threshold range of S7 is: greater than or equal to 50;
and the error threshold range of S7 is: less than or equal to the order of 10⁻⁹.
The invention has the advantage of providing an LSTM network acceleration method based on high-level synthesis that meets the optimization and acceleration requirements of LSTM network models in different scenarios. The invention adopts high-level synthesis technology to solve the low-speed problem and uses the Xilinx Zynq-7000 as the operation platform. The running time of the existing LSTM network at the PS end is 7.23 ms; the running time of the network acceleration method of the invention at the PL end is 132.27 µs, giving an acceleration ratio of 54.66. The calculation error of the LSTM network at the PS end is 2.29051e-14, and the calculation error at the PL end using the network acceleration method of the invention is 3.95783e-09.
Drawings
FIG. 1 is a flow chart of the LSTM network acceleration method based on high-level synthesis according to the present invention.
FIG. 2 is a fitting error curve of sigmoid function and tanh function.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings, and it is obvious that the described embodiments are only some of the embodiments of the present invention, rather than all of them.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention will now be described in further detail with reference to the following figures and examples, which are not to be construed as limiting the invention.
In specific embodiment one, described below with reference to FIG. 1, the specific process of the LSTM network acceleration method based on high-level synthesis is as follows:
S1, constructing an LSTM network model in MATLAB and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model in MATLAB;
S3, obtaining the fitting error of the activation functions and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2, and if so, executing S4;
S4, converting the LSTM network model into high-level language code and optimizing the code structure to obtain optimization directives;
S5, adding the optimization directives in high-level synthesis and replacing the floating-point data type with a fixed-point type to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the LSTM network without optimized acceleration at the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 at the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5, and if so, finishing the optimized acceleration.
In this embodiment, High-Level Synthesis (HLS) refers to describing a design in a high-level language and synthesizing that description into a usable netlist file (for example, a Xilinx NGC netlist), which can then be implemented on the device.
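As a minimal illustration of this flow (a hypothetical function written for this description, not the patent's actual code), HLS takes ordinary C++ plus synthesis directives and emits an implementable netlist:

```cpp
// Hypothetical HLS top function: Vivado HLS reads this C++ source,
// applies the pragma directive, and synthesizes it into a netlist
// that can be implemented on the PL side of the Zynq device.
void vec_scale(const float in[64], float out[64], float k) {
    for (int i = 0; i < 64; ++i) {
#pragma HLS PIPELINE II=1  // directive: start one loop iteration per clock
        out[i] = k * in[i];
    }
}
```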
In this embodiment, a Xilinx Zynq-7000 is used as the operation platform, specifically a Zynq XC7Z045 SoC. The PS end consists of a dual-core ARM Cortex-A9 processor with a main frequency of 666.666 MHz. The PL end is FPGA fabric with a clock frequency of 66.666 MHz.
Further, the activation functions piecewise-fitted in S2 are the sigmoid function and the tanh function.
In this embodiment, the LSTM network neurons use two activation functions, the sigmoid function and the tanh function. Both functions involve exponential and division operations, whose hardware implementations on an FPGA occupy a large amount of resources and take a long time to compute. Therefore, the sigmoid and tanh functions are fitted piecewise so as to replace the exponential and division operations of the LSTM network model, which achieves the acceleration effect.
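Concretely, the two activation functions are defined as follows (standard definitions); both contain the exponential and the division that the piecewise fit eliminates:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1
```

The identity tanh(x) = 2σ(2x) − 1 also means a fit of one function directly yields a fit of the other.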
Further, the specific method of the piecewise fitting in S2 is:
fitting is performed using cubic functions on the (0,1) interval and quadratic functions on the other intervals.
In this embodiment, the piecewise fitting coefficients of the two activation functions, sigmoid and tanh, are shown in Tables 1 and 2:
TABLE 1
Interval   | Cubic-term coefficient | Quadratic-term coefficient | First-degree-term coefficient | Constant term
(0,0.3)    | -0.020321834 | -1.306422e-04 | 0.25001000 | 0.4999998
(0.3,0.6)  | -0.016878724 | -0.003441058 | 0.25110966 | 0.4998746
(0.6,1)    | -0.010110341 | -0.016191247 | 0.25922768 | 0.4981305
(1,1.5)    | —— | -0.047769817 | 0.29248601 | 0.4863312
(1.5,2)    | —— | -0.044297020 | 0.28135895 | 0.4952337
(2,2.5)    | —— | -0.034908787 | 0.24360674 | 0.5332612
(2.5,3.5)  | —— | -0.020618677 | 0.16971086 | 0.6290141
(3.5,5)    | —— | -0.006967799 | 0.07381673 | 0.7980843
(5,7)      | —— | -0.001314399 | 0.01849083 | 0.9339095
TABLE 2
[Table 2, the piecewise fitting coefficients of the tanh function, is provided as an image in the original publication and is not reproduced here.]
The fitting errors of the two activation functions are shown in FIG. 2, where curve a is the fitting error curve of the sigmoid function and curve b is the fitting error curve of the tanh function. The fitting errors are relatively small, so at these error levels the fitted functions can replace the originals.
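A sketch of the fitted sigmoid built from the Table 1 coefficients is given below in C++ (the HLS source language of S4). Two details are assumptions not stated in the text: negative inputs are handled through the symmetry sigmoid(−x) = 1 − sigmoid(x), and inputs beyond the last interval at x = 7 are saturated to 1.

```cpp
// Sketch: piecewise-fitted sigmoid using the Table 1 coefficients.
// Polynomials are evaluated in Horner form to minimize multiplications.
static float sigmoid_fit_pos(float x) {
    // Cubic segments on (0,1): ((c3*x + c2)*x + c1)*x + c0
    if (x < 0.3f) return ((-0.020321834f*x - 1.306422e-04f)*x + 0.25001000f)*x + 0.4999998f;
    if (x < 0.6f) return ((-0.016878724f*x - 0.003441058f)*x + 0.25110966f)*x + 0.4998746f;
    if (x < 1.0f) return ((-0.010110341f*x - 0.016191247f)*x + 0.25922768f)*x + 0.4981305f;
    // Quadratic segments on (1,7): (c2*x + c1)*x + c0
    if (x < 1.5f) return (-0.047769817f*x + 0.29248601f)*x + 0.4863312f;
    if (x < 2.0f) return (-0.044297020f*x + 0.28135895f)*x + 0.4952337f;
    if (x < 2.5f) return (-0.034908787f*x + 0.24360674f)*x + 0.5332612f;
    if (x < 3.5f) return (-0.020618677f*x + 0.16971086f)*x + 0.6290141f;
    if (x < 5.0f) return (-0.006967799f*x + 0.07381673f)*x + 0.7980843f;
    if (x < 7.0f) return (-0.001314399f*x + 0.01849083f)*x + 0.9339095f;
    return 1.0f;  // assumed saturation beyond the fitted range
}

float sigmoid_fit(float x) {
    // Assumed symmetry handling: sigmoid(-x) = 1 - sigmoid(x)
    return (x >= 0.0f) ? sigmoid_fit_pos(x) : 1.0f - sigmoid_fit_pos(-x);
}
```

No exponential or division remains; each evaluation costs at most three multiplications and three additions.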
Further, the specific process of obtaining the fitting error of the activation function in S3 is:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
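A minimal numerical check of S3-1 to S3-3 might look like the following sketch (the sweep range and step size are illustrative assumptions; sigmoid_fit() is the function sketched above):

```cpp
#include <cmath>
#include <cstdio>

float sigmoid_fit(float x);  // piecewise fit from the sketch above

int main() {
    double max_err = 0.0;
    for (double x = -7.0; x <= 7.0; x += 1e-3) {
        double original = 1.0 / (1.0 + std::exp(-x));   // S3-1: original curve
        double fitted   = sigmoid_fit((float)x);        // S3-2: fitted curve
        double err      = std::fabs(original - fitted); // S3-3: difference
        if (err > max_err) max_err = err;
    }
    // Per S3, the fit is accepted if this stays below the 1e-3 order.
    std::printf("max fitting error: %g\n", max_err);
    return 0;
}
```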
Further, the fitting error threshold range of S3 is: less than the order of 10⁻³.
Further, in S4 the LSTM network model is converted into C++ code.
Further, optimizing the code structure includes the following measures (a combined code sketch follows the explanations below):
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting up parameter caches for the data used during the calculation;
transferring array elements using pointers;
and receiving the data stream of the function's data interface with a cache array.
In this embodiment, the memset() function is used to complete data initialization, which avoids the time consumed by element-by-element initialization.
In this embodiment, intermediate variables replace repeated multiplications. For example, if a function needs the product a*x repeatedly, a*x may be computed once and then reused in subsequent calculations, reducing the time consumed by a large number of multiplication operations.
In this embodiment, parameter caches are set up for the data used during the calculation, which avoids the computational delay caused by dynamic memory allocation.
In this embodiment, pointers are used to transfer array elements, reducing the number of cycles spent on array assignments.
In this embodiment, the data stream of the function's data interface is received with a cache array, which avoids blocking on the interface data stream.
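The five measures might combine as in the following sketch; the kernel name, the two-gate structure, and the dimension N are illustrative assumptions, not the patent's actual code:

```cpp
#include <cstring>  // std::memset, std::memcpy

constexpr int N = 32;  // illustrative layer size, an assumption

void lstm_gates(const float *w_i, const float *w_f,
                const float x_in[N], float gate_i[N], float gate_f[N]) {
    // (5) Receive the interface data stream into a cache array first,
    // so the inner loops never wait on the function's data interface.
    float x_buf[N];
    std::memcpy(x_buf, x_in, sizeof(x_buf));

    // (3) Statically allocated parameter caches hold intermediate sums;
    // no dynamic memory is allocated during the calculation.
    static float acc_i[N], acc_f[N];

    // (1) memset replaces element-by-element initialization loops.
    std::memset(acc_i, 0, sizeof(acc_i));
    std::memset(acc_f, 0, sizeof(acc_f));

    for (int i = 0; i < N; ++i) {
        // (4) Walk each weight row through a pointer instead of
        // recomputing a two-dimensional index on every access.
        const float *wi = &w_i[i * N];
        const float *wf = &w_f[i * N];
        for (int j = 0; j < N; ++j) {
            // (2) The shared operand is loaded into an intermediate
            // variable once and reused by both multiplications.
            const float xj = x_buf[j];
            acc_i[i] += wi[j] * xj;
            acc_f[i] += wf[j] * xj;
        }
        gate_i[i] = acc_i[i];
        gate_f[i] = acc_f[i];
    }
}
```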
Further, the specific method for replacing the data type with a fixed-point type in S5 is:
using 24-bit fixed-point data with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
In this embodiment, the LSTM network structure is described in a high-level language. Floating-point data gives accurate results, but the LSTM network contains a large number of multiplication operations that occupy substantial hardware resources, and floating-point arithmetic is complex and slow. Using a fixed-point data type reduces data precision and increases the calculation error, but it greatly improves the calculation speed; by selecting a suitable data width, integer width, and fractional width within the acceptable range of the final calculation error, a good acceleration effect can be obtained.
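In Vivado HLS terms this layout corresponds to the ap_fixed type, where the second template parameter counts the sign bit plus the integer bits. A minimal sketch, assuming the Xilinx arbitrary-precision header:

```cpp
#include "ap_fixed.h"  // Xilinx arbitrary-precision fixed-point types

// ap_fixed<W, I>: W = 24 total bits; I = 4 integer bits including the
// sign bit, i.e., 1 sign bit + 3 integer bits; the remaining 20 bits
// are fractional. Range is [-8, 8) with a resolution of 2^-20, which
// covers the (-7, 7) span of the piecewise-fitted activation functions.
typedef ap_fixed<24, 4> data_t;

data_t mac(data_t a, data_t b, data_t acc) {
    return acc + a * b;  // fixed-point multiply-accumulate
}
```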
Further, the acceleration ratio of the running times in S7 is calculated by dividing the running time of the unoptimized network model at the PS end by the running time of the optimized network model at the PL end;
and the error is calculated by subtracting the operation result of the LSTM network model obtained in S1 from the operation result of the PL-end optimized network model obtained in S6; the difference is the error.
Further, the acceleration ratio threshold range of S7 is: greater than or equal to 50;
and the error threshold range of S7 is: less than or equal to the order of 10⁻⁹.
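As a worked check using the figures reported earlier in this description:

```latex
\text{acceleration ratio} = \frac{t_{\mathrm{PS}}}{t_{\mathrm{PL}}}
= \frac{7.23\ \mathrm{ms}}{132.27\ \mu\mathrm{s}}
= \frac{7230\ \mu\mathrm{s}}{132.27\ \mu\mathrm{s}} \approx 54.66 \geq 50
```

The PL-end calculation error of 3.95783e-09 is likewise on the 10⁻⁹ order, so both S7 conditions are met.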
Further, the PS end is the processing system of the Zynq platform, the PL end is the programmable logic of the Zynq platform, and the PS end and the PL end communicate using the AXI bus protocol.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (10)

1. An LSTM network acceleration method based on high-level synthesis, characterized by comprising the following specific process:
S1, constructing an LSTM network model in MATLAB and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model in MATLAB;
S3, obtaining the fitting error of the activation functions and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2, and if so, executing S4;
S4, converting the LSTM network model into high-level language code and optimizing the code structure to obtain optimization directives;
S5, adding the optimization directives in high-level synthesis and replacing the floating-point data type with a fixed-point type to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the LSTM network without optimized acceleration at the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 at the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5, and if so, finishing the optimized acceleration.
2. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein the activation functions piecewise-fitted in S2 are the sigmoid function and the tanh function.
3. The LSTM network acceleration method based on high-level synthesis according to claim 1 or 2, wherein the specific method of the piecewise fitting in S2 is as follows:
fitting is performed using cubic functions on the (0,1) interval and quadratic functions on the other intervals.
4. The LSTM network acceleration method based on high-level synthesis according to claim 3, wherein the specific process of obtaining the fitting error of the activation function in S3 is as follows:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
5. The LSTM network acceleration method based on high-level synthesis according to claim 4, wherein the fitting error threshold range of S3 is: less than the order of 10⁻³.
6. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein in S4 the LSTM network model is converted into C++ code.
7. The LSTM network acceleration method based on high-level synthesis according to claim 1 or 6, wherein optimizing the code structure comprises:
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting up parameter caches for the data used during the calculation;
transferring array elements using pointers;
and receiving the data stream of the function's data interface with a cache array.
8. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein the specific method for replacing the data type with a fixed-point type in S5 is:
using 24-bit fixed-point data with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
9. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein the acceleration ratio of the running times in S7 is calculated by: dividing the running time of the unoptimized network model at the PS end by the running time of the optimized network model at the PL end;
and the error is calculated by: subtracting the operation result of the LSTM network model obtained in S1 from the operation result of the PL-end optimized network model obtained in S6; the difference is the error.
10. The LSTM network acceleration method based on high-level synthesis according to claim 9, wherein the acceleration ratio threshold range of S7 is: greater than or equal to 50;
and the error threshold range of S7 is: less than or equal to the order of 10⁻⁹.
CN201910975595.5A 2019-10-14 2019-10-14 LSTM network acceleration method based on high-level synthesis Pending CN110738311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910975595.5A CN110738311A (en) 2019-10-14 2019-10-14 LSTM network acceleration method based on high-level synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910975595.5A CN110738311A (en) 2019-10-14 2019-10-14 LSTM network acceleration method based on high-level synthesis

Publications (1)

Publication Number Publication Date
CN110738311A true CN110738311A (en) 2020-01-31

Family

ID=69268892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910975595.5A Pending CN110738311A (en) 2019-10-14 2019-10-14 LSTM network acceleration method based on high-level synthesis

Country Status (1)

Country Link
CN (1) CN110738311A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902507A (en) * 2014-03-28 2014-07-02 中国科学院自动化研究所 Matrix multiplication calculating device and matrix multiplication calculating method both oriented to programmable algebra processor
CN106775905A (en) * 2016-11-19 2017-05-31 天津大学 Higher synthesis based on FPGA realizes the method that Quasi-Newton algorithm accelerates
US20190114548A1 (en) * 2017-10-17 2019-04-18 Xilinx, Inc. Static block scheduling in massively parallel software defined hardware systems
CN108090560A (en) * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN108256636A (en) * 2018-03-16 2018-07-06 成都理工大学 A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing
CN109144469A (en) * 2018-07-23 2019-01-04 上海亮牛半导体科技有限公司 Pipeline organization neural network matrix operation framework and method
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN109934337A (en) * 2019-03-14 2019-06-25 哈尔滨工业大学 A kind of detection method of the spacecraft telemetry exception based on integrated LSTM
CN110084363A (en) * 2019-05-15 2019-08-02 电科瑞达(成都)科技有限公司 A kind of deep learning model accelerated method based on FPGA platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
彭鑫磊 (PENG Xinlei) et al., "Research on the design and optimization of convolutional neural networks based on high-level synthesis", Microelectronics & Computer *
王晓璐 (WANG Xiaolu), "Design of an LS-SVM algorithm accelerator based on Zynq", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507465A (en) * 2020-06-16 2020-08-07 电子科技大学 Configurable convolutional neural network processor circuit

Similar Documents

Publication Publication Date Title
CN111459877B (en) Winograd YOLOv2 target detection model method based on FPGA acceleration
Guo et al. Software-hardware codesign for efficient neural network acceleration
CN106570559A (en) Data processing method and device based on neural network
CN108196822A (en) A kind of method and system of double-precision floating point extracting operation
CN106775577B (en) A kind of design method of the non-precision redundant manipulators multiplier of high-performance
Chang et al. A mixed-pruning based framework for embedded convolutional neural network acceleration
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN109739470A (en) A kind of computing system based on 2 type hyperbolic CORDIC arbitrary characteristics functions
CN104133656A (en) Floating point number divider adopting shift and subtraction operation by tail codes and floating point number division operation method adopting shift and subtraction operation by tail codes
JP2023516521A (en) Quantum error correction decoding system, method, fault tolerant quantum error correction system and chip
CN112632874A (en) Optimization method and system for numerical simulation of helicopter flow field
CN110222305B (en) Logarithmic function calculation system and method based on hyperbolic CORDIC
CN110738311A (en) LSTM network acceleration method based on high-level synthesis
Zong-ling et al. The design of lightweight and multi parallel CNN accelerator based on FPGA
CN114021710A (en) Deep learning convolution acceleration method and processor by using bit-level sparsity
CN113313244A (en) Near-storage neural network accelerator facing to addition network and acceleration method thereof
CN110825346B (en) Low logic complexity unsigned approximation multiplier
Yin et al. FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode
Hsieh et al. A multiplier-less convolutional neural network inference accelerator for intelligent edge devices
CN101286185A (en) Numerical frequency synthesis circuit compiler accomplishing method based on linear interpolation structure
CN114925627B (en) Helicopter flow field numerical simulation system and method based on graphic processor
WO2022174733A1 (en) Neuron accelerated processing method and apparatus, and device and readable storage medium
CN116384455A (en) Non-uniform piecewise linearization activation function hardware implementation method
CN116303219A (en) Grid file acquisition method and device and electronic equipment
CN113434034B (en) Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination