CN110738311A - LSTM network acceleration method based on high-level synthesis - Google Patents
LSTM network acceleration method based on high-level synthesis
- Publication number
- CN110738311A (application CN201910975595.5A)
- Authority
- CN
- China
- Prior art keywords
- lstm network
- fitting
- lstm
- acceleration
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses an LSTM network acceleration method based on high-level synthesis, belongs to the field of embedded online application of deep neural networks, and aims to solve the problems that the existing LSTM network is computationally complex and runs slowly on embedded platforms. The specific process of the invention is as follows: construct an LSTM network model in MATLAB and train it; fit the activation functions piecewise; when the fitting error of the activation functions is within the threshold range, convert the LSTM network model into high-level language code and optimize the code structure to obtain optimization instructions; add the optimization instructions in high-level synthesis and replace the data type with fixed point to obtain the LSTM acceleration network; on a Zynq platform, run the unoptimized LSTM network on the PS end and the LSTM acceleration network on the PL end to obtain their running times; calculate the acceleration ratio and the error, and when both are within their threshold ranges the optimized acceleration is complete. The invention is used for accelerating LSTM networks.
Description
Technical Field
The invention relates to an LSTM network acceleration method based on high-level synthesis, and belongs to the field of embedded online application of deep neural networks.
Background
The LSTM (Long Short-Term Memory) network is a type of recurrent neural network (RNN). It is generally used for multi-dimensional time-series prediction over relatively long intervals and delays, and differs from an ordinary RNN in the structure of its neurons.
An ASIC, as a special-purpose circuit, offers higher running speed and lower power consumption than an FPGA, but has poor generality, a complex design flow, and higher cost. The FPGA offers high parallel processing speed, flexible design, and strong generality; it suits different embedded platforms and can provide different optimized acceleration schemes.
In summary, to make the LSTM network practical for embedded online applications under constraints on time, power consumption, and volume, it is imperative to design acceleration methods that address its slow operation speed and poor real-time performance.
Disclosure of Invention
The invention aims to solve the problems of complex computation and low running speed of the existing LSTM network on embedded platforms, and provides an LSTM network acceleration method based on high-level synthesis.
The LSTM network acceleration method based on high-level synthesis according to the invention comprises the following specific process:
S1, constructing an LSTM network model using MATLAB, and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model using MATLAB;
S3, obtaining the fitting error of the activation functions, and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2; if so, proceeding to S4;
S4, converting the LSTM network model into high-level language code, and optimizing the code structure to obtain optimization instructions;
S5, adding the optimization instructions in high-level synthesis, and replacing the data type with fixed point to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the unoptimized LSTM network on the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 on the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5; if so, finishing the optimized acceleration.
Preferably, the activation functions fitted piecewise in S2 are the sigmoid function and the tanh function.
Preferably, the specific method of the piecewise fitting in S2 is as follows:
a cubic function is used for fitting on the (0,1) interval and a quadratic function on the other intervals.
Preferably, the specific process of obtaining the fitting error of the activation function in S3 is as follows:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
Preferably, the fitting error threshold range of S3 is: smaller than the order of 10⁻³.
Preferably, in S4 the LSTM network model is converted into C++ code.
Preferably, optimizing the code structure includes:
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting parameter caches for data in the calculation process;
transferring array elements using pointers;
and using a cache array to receive the data stream of the function data interface.
Preferably, the specific method of replacing the data type with fixed point in S5 is as follows:
24-bit fixed-point data is used, with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
Preferably, the method for calculating the acceleration ratio of the running time in S7 is: dividing the unoptimized network-model running time on the PS end by the optimized network-model running time on the PL end;
the error calculation method comprises the following steps: and (4) making a difference between the operation result of the PL-terminal optimized network model obtained in the step (S6) and the operation result of the LSTM network model obtained in the step (S1), wherein the difference is an error.
Preferably, the acceleration ratio threshold range of S7 is: greater than or equal to 50;
the error threshold range of S7 is: on the order of 10⁻⁹ or less.
The invention has the advantage of providing an LSTM network acceleration method based on high-level synthesis to meet the requirements of optimizing and accelerating LSTM network models in different scenarios. The invention uses high-level synthesis to address the low-speed problem, with a Xilinx Zynq-7000 as the operating platform. The running time of the existing LSTM network on the PS end is 7.23 ms, the running time with the network acceleration method of the invention on the PL end is 132.27 us, and the acceleration ratio is 54.66. The calculation error of the LSTM network on the PS end is 2.29051e-14, and the calculation error on the PL end using the network acceleration method of the invention is 3.95783e-09.
Drawings
FIG. 1 is a flow chart of the LSTM network acceleration method based on high-level synthesis according to the present invention.
FIG. 2 is a fitting error curve of sigmoid function and tanh function.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the drawings in the embodiments; it is apparent that the described embodiments are only some embodiments of the present invention, rather than all embodiments.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention will now be described in further detail with reference to the figures and examples, which are not to be taken as limiting the invention.
In a specific embodiment, described with reference to FIG. 1, the specific process of the LSTM network acceleration method based on high-level synthesis is as follows:
S1, constructing an LSTM network model using MATLAB, and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model using MATLAB;
S3, obtaining the fitting error of the activation functions, and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2; if so, proceeding to S4;
S4, converting the LSTM network model into high-level language code, and optimizing the code structure to obtain optimization instructions;
S5, adding the optimization instructions in high-level synthesis, and replacing the data type with fixed point to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the unoptimized LSTM network on the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 on the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5; if so, finishing the optimized acceleration.
In this embodiment, high-level synthesis (HLS) refers to describing a design in a high-level language and then synthesizing that description into a usable netlist file (for example, an NGC netlist) for implementation on the device.
In this embodiment, a Xilinx Zynq-7000 is used as the operating platform, model Zynq-XC7Z045 SoC. The PS end consists of a dual-core ARM Cortex-A9 processor with a main frequency of 666.666 MHz. The PL end is FPGA fabric with a clock frequency of 66.666 MHz.
Further, the activation functions fitted piecewise in S2 are the sigmoid function and the tanh function.
In this embodiment, the LSTM network neurons use two activation functions, the sigmoid function and the tanh function. Both involve exponential and division operations, whose hardware implementation on an FPGA occupies a large amount of resources and takes a long time to compute. Therefore, the sigmoid and tanh functions are fitted piecewise to replace the exponential and division operations of the LSTM network model and thereby achieve acceleration.
Further, the specific method of the piecewise fitting in S2 is:
a cubic function is used for fitting on the (0,1) interval and a quadratic function on the other intervals.
In this embodiment, the piecewise-fitting coefficients of the sigmoid and tanh activation functions are shown in Tables 1 and 2:
TABLE 1
| Interval | Cubic coefficient | Quadratic coefficient | Linear coefficient | Constant term |
| --- | --- | --- | --- | --- |
| (0, 0.3) | -0.020321834 | -1.306422e-04 | 0.25001000 | 0.4999998 |
| (0.3, 0.6) | -0.016878724 | -0.003441058 | 0.25110966 | 0.4998746 |
| (0.6, 1) | -0.010110341 | -0.016191247 | 0.25922768 | 0.4981305 |
| (1, 1.5) | —— | -0.047769817 | 0.29248601 | 0.4863312 |
| (1.5, 2) | —— | -0.044297020 | 0.28135895 | 0.4952337 |
| (2, 2.5) | —— | -0.034908787 | 0.24360674 | 0.5332612 |
| (2.5, 3.5) | —— | -0.020618677 | 0.16971086 | 0.6290141 |
| (3.5, 5) | —— | -0.006967799 | 0.07381673 | 0.7980843 |
| (5, 7) | —— | -0.001314399 | 0.01849083 | 0.9339095 |
TABLE 2
The fitting errors of the two activation functions are shown in FIG. 2, where curve a is the fitting-error curve of the sigmoid function and curve b is that of the tanh function. The fitting errors are small enough that the fitted functions can replace the originals.
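The piecewise fit described above can be sketched in C++, the language the patent targets for high-level synthesis. The segment coefficients below are taken from Table 1 (the sigmoid fit); the handling of negative inputs via the identity sigmoid(-x) = 1 - sigmoid(x), the saturation beyond x = 7, and the tanh-from-sigmoid identity are our assumptions for a self-contained sketch (the patent fits tanh with its own Table 2 coefficients).

```cpp
#include <cassert>
#include <cmath>

// Piecewise polynomial approximation of sigmoid on (0, 7), using the
// Table 1 coefficients: cubic segments on (0, 1), quadratic elsewhere.
// Negative inputs use sigmoid(-x) = 1 - sigmoid(x); inputs beyond 7
// saturate to 1. (Symmetry/saturation handling is assumed, not stated.)
double sigmoid_fit(double x) {
    if (x < 0.0) return 1.0 - sigmoid_fit(-x);
    struct Seg { double hi, c3, c2, c1, c0; };
    static const Seg segs[] = {
        {0.3, -0.020321834, -1.306422e-04, 0.25001000, 0.4999998},
        {0.6, -0.016878724, -0.003441058, 0.25110966, 0.4998746},
        {1.0, -0.010110341, -0.016191247, 0.25922768, 0.4981305},
        {1.5, 0.0, -0.047769817, 0.29248601, 0.4863312},
        {2.0, 0.0, -0.044297020, 0.28135895, 0.4952337},
        {2.5, 0.0, -0.034908787, 0.24360674, 0.5332612},
        {3.5, 0.0, -0.020618677, 0.16971086, 0.6290141},
        {5.0, 0.0, -0.006967799, 0.07381673, 0.7980843},
        {7.0, 0.0, -0.001314399, 0.01849083, 0.9339095},
    };
    for (const Seg& s : segs)
        if (x < s.hi)  // Horner evaluation: c3*x^3 + c2*x^2 + c1*x + c0
            return ((s.c3 * x + s.c2) * x + s.c1) * x + s.c0;
    return 1.0;        // saturated region (assumed)
}

// tanh can reuse the same table via the identity tanh(x) = 2*sigmoid(2x) - 1;
// this replaces the separate Table 2 fit purely for illustration.
double tanh_fit(double x) { return 2.0 * sigmoid_fit(2.0 * x) - 1.0; }
```

No exponential or division appears in the fitted path, which is exactly why the piecewise form maps cheaply onto FPGA multipliers and adders.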
Further, the specific process of obtaining the fitting error of the activation function in S3 is:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
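The steps above amount to sampling both curves and taking the difference; a minimal sketch, shown for one quadratic segment of the sigmoid fit (the (2, 2.5) interval of Table 1). The sampling grid step is an illustrative choice, not specified in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// S3 fitting-error check: sample the original activation function and its
// piecewise fit on a grid and keep the largest absolute difference.
double max_fit_error() {
    double worst = 0.0;
    for (double x = 2.0; x <= 2.5; x += 0.001) {
        // Quadratic segment coefficients from Table 1, interval (2, 2.5)
        double fit = -0.034908787 * x * x + 0.24360674 * x + 0.5332612;
        double ref = 1.0 / (1.0 + std::exp(-x));  // original sigmoid
        worst = std::max(worst, std::fabs(fit - ref));
    }
    return worst;  // compared against the 10^-3 threshold in S3
}
```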
Further, the fitting error threshold range of S3 is: smaller than the order of 10⁻³.
Further, in S4 the LSTM network model is converted into C++ code.
Further, optimizing the code structure includes:
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting parameter caches for data in the calculation process;
transferring array elements using pointers;
and using a cache array to receive the data stream of the function data interface.
In this embodiment, using the memset() function to complete data initialization avoids the time consumed by loop-based, element-by-element initialization.
In this embodiment, intermediate variables replace repeated multiplications. For example, if a function repeatedly needs the product a·x, a·x can be computed once and then reused in subsequent calculations, reducing the time consumed by a large number of multiplication operations.
In the embodiment, parameter cache data is set in the calculation process, so that calculation delay caused by dynamic memory allocation is avoided.
In this embodiment, pointers are used to transfer array elements, reducing the number of cycles of array assignments.
In this embodiment, the data stream of the function data interface is received by using the cache array, so that the problem of interface data stream blockage is avoided.
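The optimizations above can be illustrated on a toy matrix-vector step; all sizes and names here are illustrative, not from the patent.

```cpp
#include <cassert>
#include <cstring>

const int N = 4;  // toy dimension, illustrative only

// Sketch of the code-structure optimizations: memset-based initialization,
// pointer transfer of array elements, an intermediate accumulator instead
// of repeated indexed multiplications, and a local cache array that is
// copied out once at the end.
void lstm_step(const float W[N][N], const float* x, float* out) {
    float acc[N];
    std::memset(acc, 0, sizeof(acc));   // memset data initialization
    for (int i = 0; i < N; ++i) {
        const float* row = W[i];        // transfer array elements via pointer
        float sum = 0.0f;               // intermediate variable: accumulate
        for (int j = 0; j < N; ++j)     // once instead of re-reading acc[i]
            sum += row[j] * x[j];
        acc[i] = sum;                   // results cached locally
    }
    std::memcpy(out, acc, sizeof(acc)); // single copy out of the cache array
}
```

In an HLS flow, each of these patterns also makes the loops easier for the tool to pipeline and partition.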
Further, the specific method of replacing the data type with fixed point in S5 is:
24-bit fixed-point data is used, with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
In this embodiment, the LSTM network structure is described in a high-level language. Floating-point data gives accurate results, but the LSTM network performs a large number of multiplications that occupy substantial hardware resources, and floating-point arithmetic is complex and slow. Using a fixed-point data type reduces data precision and increases calculation error, but greatly improves calculation speed. Within the acceptable range of the final calculation error, choosing a suitable data width, number of integer bits, and number of fractional bits yields a good acceleration effect.
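A software sketch of the 24-bit format described above (1 sign bit, 3 integer bits, 20 fractional bits). In an actual Vivado HLS design this would presumably be declared as an arbitrary-precision fixed-point type such as ap_fixed<24,4>; the plain-integer arithmetic below only illustrates the quantization involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Q3.20-style layout: values scaled by 2^20, stored in a signed integer.
// Representable range is roughly [-8, 8) with a resolution of 2^-20.
const int FRAC_BITS = 20;
const double SCALE = double(1 << FRAC_BITS);

int32_t to_fixed(double x)    { return int32_t(std::lround(x * SCALE)); }
double  from_fixed(int32_t q) { return double(q) / SCALE; }
```

Round-tripping a value through this format loses at most half an LSB (about 4.8e-7), consistent with the patent's reported PL-end error on the order of 10⁻⁹ after a full network pass being acceptable.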
Further, the acceleration ratio of the running time in S7 is calculated by dividing the unoptimized network-model running time on the PS end by the optimized network-model running time on the PL end;
the error calculation method comprises the following steps: and (4) making a difference between the operation result of the PL-terminal optimized network model obtained in the step (S6) and the operation result of the LSTM network model obtained in the step (S1), wherein the difference is an error.
Further, the acceleration ratio threshold range of S7 is: greater than or equal to 50;
the error threshold range of S7 is: on the order of 10⁻⁹ or less.
Further, the PS end is the processing system of the Zynq platform, the PL end is the programmable logic of the Zynq platform, and the PS and PL ends communicate using the AXI bus protocol.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.
Claims (10)
1. An LSTM network acceleration method based on high-level synthesis, characterized in that its specific process is as follows:
S1, constructing an LSTM network model using MATLAB, and training the LSTM network model;
S2, piecewise fitting the activation functions of the LSTM network model using MATLAB;
S3, obtaining the fitting error of the activation functions, and judging whether the fitting error is within the fitting error threshold range; if not, returning to S2; if so, proceeding to S4;
S4, converting the LSTM network model into high-level language code, and optimizing the code structure to obtain optimization instructions;
S5, adding the optimization instructions in high-level synthesis, and replacing the data type with fixed point to obtain the LSTM acceleration network;
S6, creating a project in Vivado on a Zynq platform, running the unoptimized LSTM network on the PS end to obtain the running time of the unoptimized network model, and running the LSTM acceleration network obtained in S5 on the PL end to obtain the running time of the optimized network model;
S7, calculating the acceleration ratio of the running times and the error, and judging whether the acceleration ratio is within the acceleration ratio threshold range and whether the error is within the error threshold range; if not, returning to S5; if so, finishing the optimized acceleration.
2. The LSTM network acceleration method based on high-level synthesis of claim 1, wherein the activation functions fitted piecewise in S2 are the sigmoid function and the tanh function.
3. The LSTM network acceleration method based on high-level synthesis according to claim 1 or 2, wherein the piecewise fitting method of S2 is as follows:
a cubic function is used for fitting on the (0,1) interval and a quadratic function on the other intervals.
4. The LSTM network acceleration method based on high-level synthesis according to claim 3, wherein the specific process of obtaining the fitting error of the activation function in S3 is as follows:
S3-1, obtaining the curve of the original activation function;
S3-2, obtaining the curve of the piecewise-fitted activation function;
S3-3, subtracting the curve of S3-2 from the curve of S3-1; the difference is the fitting error.
5. The LSTM network acceleration method based on high-level synthesis according to claim 4, wherein the fitting error threshold range of S3 is: smaller than the order of 10⁻³.
6. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein in S4 the LSTM network model is converted into C++ code.
7. The LSTM network acceleration method based on high-level synthesis according to claim 1 or 6, wherein optimizing the code structure comprises:
using the memset() function to complete data initialization;
using intermediate variables to replace repeated multiplications;
setting parameter caches for data in the calculation process;
transferring array elements using pointers;
and using a cache array to receive the data stream of the function data interface.
8. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein the specific method of replacing the data type with fixed point in S5 is:
24-bit fixed-point data is used, with one sign bit, three integer bits, and the remaining 20 bits as fractional bits.
9. The LSTM network acceleration method based on high-level synthesis according to claim 1, wherein the acceleration ratio of the running time in S7 is calculated by: dividing the unoptimized network-model running time on the PS end by the optimized network-model running time on the PL end;
the error calculation method comprises the following steps: and (4) making a difference between the operation result of the PL-terminal optimized network model obtained in the step (S6) and the operation result of the LSTM network model obtained in the step (S1), wherein the difference is an error.
10. The LSTM network acceleration method based on high-level synthesis according to claim 9, wherein the acceleration ratio threshold range of S7 is: greater than or equal to 50;
the error threshold range of S7 is: on the order of 10⁻⁹ or less.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910975595.5A CN110738311A (en) | 2019-10-14 | 2019-10-14 | LSTM network acceleration method based on high-level synthesis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910975595.5A CN110738311A (en) | 2019-10-14 | 2019-10-14 | LSTM network acceleration method based on high-level synthesis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110738311A true CN110738311A (en) | 2020-01-31 |
Family
ID=69268892
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910975595.5A Pending CN110738311A (en) | 2019-10-14 | 2019-10-14 | LSTM network acceleration method based on high-level synthesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110738311A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507465A (en) * | 2020-06-16 | 2020-08-07 | 电子科技大学 | Configurable convolutional neural network processor circuit |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902507A (en) * | 2014-03-28 | 2014-07-02 | 中国科学院自动化研究所 | Matrix multiplication calculating device and matrix multiplication calculating method both oriented to programmable algebra processor |
CN106775905A (en) * | 2016-11-19 | 2017-05-31 | 天津大学 | Higher synthesis based on FPGA realizes the method that Quasi-Newton algorithm accelerates |
CN108090560A (en) * | 2018-01-05 | 2018-05-29 | 中国科学技术大学苏州研究院 | The design method of LSTM recurrent neural network hardware accelerators based on FPGA |
CN108256636A (en) * | 2018-03-16 | 2018-07-06 | 成都理工大学 | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing |
CN109144469A (en) * | 2018-07-23 | 2019-01-04 | 上海亮牛半导体科技有限公司 | Pipeline organization neural network matrix operation framework and method |
US20190114548A1 (en) * | 2017-10-17 | 2019-04-18 | Xilinx, Inc. | Static block scheduling in massively parallel software defined hardware systems |
CN109934337A (en) * | 2019-03-14 | 2019-06-25 | 哈尔滨工业大学 | A kind of detection method of the spacecraft telemetry exception based on integrated LSTM |
CN109948784A (en) * | 2019-01-03 | 2019-06-28 | 重庆邮电大学 | A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm |
CN110084363A (en) * | 2019-05-15 | 2019-08-02 | 电科瑞达(成都)科技有限公司 | A kind of deep learning model accelerated method based on FPGA platform |
- 2019-10-14: CN application CN201910975595.5A, patent CN110738311A/en, active, Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902507A (en) * | 2014-03-28 | 2014-07-02 | 中国科学院自动化研究所 | Matrix multiplication calculating device and matrix multiplication calculating method both oriented to programmable algebra processor |
CN106775905A (en) * | 2016-11-19 | 2017-05-31 | 天津大学 | Higher synthesis based on FPGA realizes the method that Quasi-Newton algorithm accelerates |
US20190114548A1 (en) * | 2017-10-17 | 2019-04-18 | Xilinx, Inc. | Static block scheduling in massively parallel software defined hardware systems |
CN108090560A (en) * | 2018-01-05 | 2018-05-29 | 中国科学技术大学苏州研究院 | The design method of LSTM recurrent neural network hardware accelerators based on FPGA |
CN108256636A (en) * | 2018-03-16 | 2018-07-06 | 成都理工大学 | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing |
CN109144469A (en) * | 2018-07-23 | 2019-01-04 | 上海亮牛半导体科技有限公司 | Pipeline organization neural network matrix operation framework and method |
CN109948784A (en) * | 2019-01-03 | 2019-06-28 | 重庆邮电大学 | A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm |
CN109934337A (en) * | 2019-03-14 | 2019-06-25 | 哈尔滨工业大学 | A kind of detection method of the spacecraft telemetry exception based on integrated LSTM |
CN110084363A (en) * | 2019-05-15 | 2019-08-02 | 电科瑞达(成都)科技有限公司 | A kind of deep learning model accelerated method based on FPGA platform |
Non-Patent Citations (2)
Title |
---|
Peng Xinlei et al.: "Research on design and optimization methods of convolutional neural networks based on high-level synthesis", Microelectronics & Computer *
Wang Xiaolu: "Design of an LS-SVM algorithm accelerator based on Zynq", China Master's Theses Full-text Database, Information Science and Technology Series *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507465A (en) * | 2020-06-16 | 2020-08-07 | 电子科技大学 | Configurable convolutional neural network processor circuit |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111459877B (en) | Winograd YOLOv2 target detection model method based on FPGA acceleration | |
Guo et al. | Software-hardware codesign for efficient neural network acceleration | |
CN106570559A (en) | Data processing method and device based on neural network | |
CN108196822A (en) | A kind of method and system of double-precision floating point extracting operation | |
CN106775577B (en) | A kind of design method of the non-precision redundant manipulators multiplier of high-performance | |
Chang et al. | A mixed-pruning based framework for embedded convolutional neural network acceleration | |
CN112257844B (en) | Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof | |
CN109739470A (en) | A kind of computing system based on 2 type hyperbolic CORDIC arbitrary characteristics functions | |
CN104133656A (en) | Floating point number divider adopting shift and subtraction operation by tail codes and floating point number division operation method adopting shift and subtraction operation by tail codes | |
JP2023516521A (en) | Quantum error correction decoding system, method, fault tolerant quantum error correction system and chip | |
CN112632874A (en) | Optimization method and system for numerical simulation of helicopter flow field | |
CN110222305B (en) | Logarithmic function calculation system and method based on hyperbolic CORDIC | |
CN110738311A (en) | LSTM network acceleration method based on high-level synthesis | |
Zong-ling et al. | The design of lightweight and multi parallel CNN accelerator based on FPGA | |
CN114021710A (en) | Deep learning convolution acceleration method and processor by using bit-level sparsity | |
CN113313244A (en) | Near-storage neural network accelerator facing to addition network and acceleration method thereof | |
CN110825346B (en) | Low logic complexity unsigned approximation multiplier | |
Yin et al. | FPGA-based high-performance CNN accelerator architecture with high DSP utilization and efficient scheduling mode | |
Hsieh et al. | A multiplier-less convolutional neural network inference accelerator for intelligent edge devices | |
CN101286185A (en) | Numerical frequency synthesis circuit compiler accomplishing method based on linear interpolation structure | |
CN114925627B (en) | Helicopter flow field numerical simulation system and method based on graphic processor | |
WO2022174733A1 (en) | Neuron accelerated processing method and apparatus, and device and readable storage medium | |
CN116384455A (en) | Non-uniform piecewise linearization activation function hardware implementation method | |
CN116303219A (en) | Grid file acquisition method and device and electronic equipment | |
CN113434034B (en) | Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||