CN113361687A - Configurable addition tree suitable for convolutional neural network training accelerator - Google Patents
- Publication number
- CN113361687A CN113361687A CN202110597775.1A CN202110597775A CN113361687A CN 113361687 A CN113361687 A CN 113361687A CN 202110597775 A CN202110597775 A CN 202110597775A CN 113361687 A CN113361687 A CN 113361687A
- Authority
- CN
- China
- Prior art keywords
- order
- groups
- mode
- adders
- multiplexers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/491—Computations with decimal numbers radix 12 or 20.
- G06F7/4912—Adding; Subtracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a configurable addition tree suitable for a convolutional neural network training accelerator. The addition tree consists of three groups of addition units, where each group comprises a first-order multiplexer and adder structure, a second-order multiplexer and adder structure, and a third-order multiplexer and adder structure connected in series; mode selection is performed with multiplexers, and the output of each multiplexer is connected in series with the adder of the next stage. Compared with the prior art, the invention 1) reduces the use of addition resources under high parallelism; 2) is suitable both for the accumulation of conventional 3×3 convolutions in forward propagation and for the accumulation of very large kernel convolutions (of non-fixed size) in weight-gradient computation; and 3) can accommodate different data precisions.
Description
Technical Field
The invention belongs to the field of information technology and hardware acceleration of convolutional neural network training, and particularly relates to low-power, high-performance convolutional neural network training.
Background
With the wide application of artificial intelligence technology, on-line training chip design has gradually become a research frontier for AI chips in China and abroad. A convolutional neural network (CNN) is a feedforward neural network widely applied in fields such as computer vision and natural language processing. The CNN training process involves large data storage, complex read patterns, and synchronization requirements, placing very high demands on storage space, access bandwidth, and management mechanisms. Existing hardware architectures explore efficient hardware implementations of the convolution training operators around the training algorithm, in order to meet the computation and storage requirements of deep neural networks. The basic operators of a CNN training algorithm include convolution, pooling, activation functions, normalization, the loss function, and the derivatives of the related operations, among which the convolutional layer is a key component of the CNN and occupies a very important position. A convolution single-engine architecture that supports both forward propagation and backward propagation in the CNN training process, onto which different deep neural network models can be trained and mapped as a configurable training accelerator architecture, is therefore of great significance. With its strong programmability, high parallelism, and high energy efficiency, the FPGA has become one of the platforms for realizing convolutional neural network training. Forward propagation (and error back-propagation) and weight-gradient computation in a CNN training accelerator require two different forms of accumulation.
However, existing CNN training accelerators are optimized mainly for their multiplication units; the addition units receive far less attention. Implementing the intra-kernel addition tree and the self-accumulation unit separately requires 17 addition units at single parallelism, and when the parallelism is large the addition units consume a large amount of computing resources.
Therefore, optimizing the addition tree to reduce the consumption of computing resources is the urgent technical problem that the present invention sets out to solve.
Disclosure of Invention
In order to further reduce the resource occupation of the addition units, the invention provides a configurable addition tree suitable for a convolutional neural network training accelerator. The configurable addition tree supports both the intra-kernel accumulation of forward propagation and error back-propagation and the self-accumulation function of weight-gradient computation, thereby optimizing the hardware architecture for the different accumulation forms in a CNN training accelerator.
The technical scheme adopted by the invention to solve the problems is as follows:
a configurable addition tree suitable for a convolutional neural network training accelerator, the configurable addition tree being composed of three groups of addition units, each group comprising a first-order multiplexer and adder structure, a second-order multiplexer and adder structure, and a third-order multiplexer and adder structure connected in series; mode selection is performed with multiplexers, and the output of each multiplexer is connected in series with the adder of the next stage.
Compared with the prior art, the configurable addition tree applicable to the convolutional neural network training accelerator can achieve the following beneficial effects:
1) the use of addition resources is reduced under high parallelism;
2) the tree is suitable for the accumulation of conventional 3×3 convolutions in forward propagation, and also for the accumulation of very large kernel convolutions (of non-fixed size) in weight-gradient computation;
3) different data precisions can be accommodated.
Drawings
FIG. 1 is a schematic diagram of a configurable additive tree architecture for a convolutional neural network training accelerator according to the present invention;
FIG. 2 is a schematic diagram of the accumulation modes of the configurable addition tree suitable for a convolutional neural network training accelerator according to the present invention; (a) mode 0: convolution kernel addition tree mode; (b) mode 1: self-accumulation mode.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
The configurable addition tree suitable for the convolutional neural network training accelerator combines software and hardware: different accumulation functions are realized by dynamically configuring the functional mode of the addition tree, which addresses the large resource occupation in the design of the training accelerator.
FIG. 1 is a schematic diagram of a configurable addition tree architecture suitable for a convolutional neural network training accelerator according to the present invention.
In a CNN training accelerator, the adder tree is configured based on a convolutional layer of size 3×3. The structure of the configurable adder tree is as follows: the addition units are arranged in groups of 3, and the results of the 9 multiplication units within a convolution kernel are divided into 3 groups, (a2, a1, a3), (a5, a4, a6) and (a8, a7, a9). Through the connection of the multiplexers and the addition units, the first and second groups each form a 4-level addition tree (first level through fourth level), and the third group forms a 3-level addition tree.
Each group of addition units comprises a first-order multiplexer and adder structure, a second-order multiplexer and adder structure, and a third-order multiplexer and adder structure connected in series. Mode selection is performed with the multiplexers: mode 0 is the convolution kernel accumulation mode, and mode 1 is the self-accumulation mode. A multiplexer is arranged at the input of each stage's adder, and the output of the multiplexer is connected in series with that adder. In the implementation of the whole network architecture, three situations arise across the different layers: forward propagation, error back-propagation, and weight-gradient computation. The multiplexers are controlled by register configuration: if the single-engine architecture realizes forward propagation or error back-propagation in the CNN training process, the addition tree is configured into mode 0; if it realizes the weight-gradient computation function, the configurable addition tree enters mode 1.
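The register-driven mode switching described above can be sketched behaviourally. The following is a hypothetical Python model of one first-order multiplexer/adder pair, not the patented RTL; the function and signal names are our own illustrations:

```python
MODE_KERNEL_TREE = 0  # mode 0: intra-kernel adder-tree accumulation
MODE_SELF_ACCUM = 1   # mode 1: self-accumulation via the feedback line

def first_order_stage(mode, a1, a2, feedback):
    """One first-order multiplexer/adder pair of the first group.

    In mode 0 the multiplexer routes product a2 to the adder, giving
    a2 + a1; in mode 1 it routes the adder's own previous output back
    in through the feedback line, so products accumulate over cycles.
    """
    mux_out = a2 if mode == MODE_KERNEL_TREE else feedback
    return mux_out + a1
```

In mode 1, calling the stage repeatedly with its own previous result as `feedback` reproduces the self-accumulation behaviour of Fig. 2(b).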
In the first set of addition units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a2 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a1 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a3 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is connected to the output of the second-order adder in the mode 0 state, and to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a2 of a multiplication unit in the mode 1 state; the two inputs of the third-order adder are the output of the first third-order multiplexer and the output of the second third-order multiplexer;
in the second group of adding units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a5 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a4 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a6 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is connected to the output of the second-order adder in the mode 0 state, and to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a5 of a multiplication unit in the mode 1 state; the two inputs of the third-order adder are the output of the first third-order multiplexer and the output of the second third-order multiplexer;
in the third group of addition units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a8 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a7 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a9 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is left unconnected in the mode 0 state, and is connected to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a8 of a multiplication unit in the mode 1 state; the third-order adder is idle in the mode 0 state, and its two inputs are the output of the first third-order multiplexer and the output of the second third-order multiplexer;
The configurable addition tree requires only 9 addition units in total. Therefore, compared with the prior art in which the intra-kernel addition-tree mode and the self-accumulation mode are implemented separately, the number of addition units at single parallelism is reduced by about 47%, which is of real significance for reducing the resources occupied by the addition units.
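The claimed saving follows directly from the unit counts given in the Background and above (a quick arithmetic check; the variable names are our own):

```python
# Separate implementation (per the Background): 17 addition units at
# single parallelism; the configurable tree needs only 9.
separate_units = 17
configurable_units = 9
reduction = (separate_units - configurable_units) / separate_units
print(f"reduction: {reduction:.0%}")  # about 47%
```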
The configurable addition tree is intended for a convolution single-engine architecture and supports its forward propagation, error back-propagation, and weight-gradient computation. It achieves good performance while remaining compatible with the two working modes, intra-kernel accumulation and self-accumulation, across forward propagation, error back-propagation, and weight-gradient computation, and it reduces the number of addition units and hence the consumption of computing resources.
Within the whole CNN training accelerator, the addition tree thus serves both as the intra-kernel addition tree in forward propagation and as the self-accumulation unit for weight-gradient computation.
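As a behavioural illustration of the mode-1 feedback path (our own hypothetical Python sketch, not the patented circuit), self-accumulation simply folds a stream of partial products into a running sum:

```python
def self_accumulate(products, acc=0):
    """Mode-1 self-accumulation: the feedback line returns the adder
    output to the multiplexer input, so each incoming product is added
    to the running sum. This is what allows weight-gradient
    accumulation over kernels of non-fixed size.
    """
    for p in products:
        acc = acc + p  # adder inputs: feedback (acc) + new product
    return acc
```

Because the loop length is not fixed, the same hardware accumulates over arbitrarily large kernels, unlike the fixed 9-input tree of mode 0.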
Fig. 2 is a schematic diagram of the addition-tree accumulation modes proposed by the present invention. The convolution kernel addition tree mode (mode 0) is shown in (a); the self-accumulation mode (mode 1) is shown in (b). For the convolution kernel addition tree mode:
in the first group of addition units: the first-level addition tree realizes a2+a1; the second-level addition tree realizes a2+a1+a3; the third-level addition tree realizes a2+a1+a3+a5+a4+a6; the fourth-level addition tree realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.
In the second group of addition units: the first-level addition tree realizes a5+a4; the second-level addition tree realizes a5+a4+a6; the third-level addition tree realizes a2+a1+a3+a5+a4+a6; the fourth-level addition tree realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.
In the third group of addition units: the first-level addition tree realizes a8+a7; the second-level addition tree realizes a8+a7+a9; the third-level addition tree realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.
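The level-by-level sums above can be checked with a small behavioural model (our own illustrative Python, not part of the patent; `a` maps indices 1 to 9 to the nine multiplication-unit results):

```python
def kernel_tree_mode0(a):
    """Mode-0 intra-kernel accumulation of the nine products of a 3x3
    kernel, grouped as (a2, a1, a3), (a5, a4, a6), (a8, a7, a9)."""
    g1 = (a[2] + a[1]) + a[3]  # first group, levels one and two
    g2 = (a[5] + a[4]) + a[6]  # second group, levels one and two
    g3 = (a[8] + a[7]) + a[9]  # third group, levels one and two
    level3 = g1 + g2           # third level: a2+a1+a3+a5+a4+a6
    return level3 + g3         # fourth level: sum of all nine products
```

For any inputs the result equals the plain sum of the nine products, which is exactly the fourth-level expression listed above.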
Claims (3)
1. A configurable addition tree suitable for a convolutional neural network training accelerator, characterized in that the configurable addition tree is composed of three groups of addition units, each group comprising a first-order multiplexer and adder structure, a second-order multiplexer and adder structure, and a third-order multiplexer and adder structure connected in series; mode selection is performed with multiplexers, and the output of each multiplexer is connected in series with the adder of the next stage.
2. The configurable addition tree suitable for use in a convolutional neural network training accelerator as defined in claim 1, wherein said multiplexers comprise a mode 0 and a mode 1, said mode 0 being a convolution intra-kernel accumulation mode and said mode 1 being a self-accumulation mode.
3. The configurable addition tree suitable for use in a convolutional neural network training accelerator as claimed in claim 1, wherein the specific structure of said three sets of addition units is as follows:
in the first set of addition units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a2 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a1 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a3 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is connected to the output of the second-order adder in the mode 0 state, and to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a2 of a multiplication unit in the mode 1 state; the two inputs of the third-order adder are the output of the first third-order multiplexer and the output of the second third-order multiplexer;
in the second group of adding units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a5 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a4 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a6 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is connected to the output of the second-order adder in the mode 0 state, and to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a5 of a multiplication unit in the mode 1 state; the two inputs of the third-order adder are the output of the first third-order multiplexer and the output of the second third-order multiplexer;
in the third group of addition units:
in the first-order multiplexer and adder structure, the first-order multiplexer is connected to the result a8 of a multiplication unit in the mode 0 state, and to the output of the first-order adder through a feedback line in the mode 1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a7 of a multiplication unit;
in the second-order multiplexer and adder structure, the second-order multiplexer is connected to the output of the first-order adder in the mode 0 state, and to the output of the second-order adder through a feedback line in the mode 1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a9 of a multiplication unit;
in the third-order multiplexer and adder structure, the first third-order multiplexer is left unconnected in the mode 0 state, and is connected to the output of the third-order adder through a feedback line in the mode 1 state; the second third-order multiplexer is connected to the result a8 of a multiplication unit in the mode 1 state and is left unconnected in the mode 0 state; the two inputs of the third-order adder are the output of the first third-order multiplexer and the output of the second third-order multiplexer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110597775.1A CN113361687B (en) | 2021-05-31 | 2021-05-31 | Configurable addition tree suitable for convolutional neural network training accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113361687A true CN113361687A (en) | 2021-09-07 |
CN113361687B CN113361687B (en) | 2023-03-24 |
Family
ID=77528209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110597775.1A Active CN113361687B (en) | 2021-05-31 | 2021-05-31 | Configurable addition tree suitable for convolutional neural network training accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113361687B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20110006409A (en) * | 2009-07-14 | 2011-01-20 | 서강대학교산학협력단 | Decoder using low density parity check code |
CN105611269A (en) * | 2015-12-18 | 2016-05-25 | 华中科技大学 | Real time parallax calculation system based on FPGA |
CN106203617A (en) * | 2016-06-27 | 2016-12-07 | 哈尔滨工业大学深圳研究生院 | A kind of acceleration processing unit based on convolutional neural networks and array structure |
CN109711542A (en) * | 2018-12-29 | 2019-05-03 | 西安交通大学 | A kind of DNN accelerator that supporting dynamic accuracy and its implementation |
US20190279083A1 (en) * | 2018-03-06 | 2019-09-12 | DinoplusAI Holdings Limited | Computing Device for Fast Weighted Sum Calculation in Neural Networks |
CN111898733A (en) * | 2020-07-02 | 2020-11-06 | 西安交通大学 | Deep separable convolutional neural network accelerator architecture |
CN112486457A (en) * | 2020-11-23 | 2021-03-12 | 杭州电子科技大学 | Hardware system for realizing improved FIOS modular multiplication algorithm |
Non-Patent Citations (1)
Title |
---|
Wang Peiqi: "Key Technologies for Software and Hardware Collaborative Acceleration of Neural Networks", China Excellent Master's and Doctoral Dissertations Full-text Database (Doctoral), Information Science and Technology Series * |
Also Published As
Publication number | Publication date |
---|---|
CN113361687B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109447241B (en) | Dynamic reconfigurable convolutional neural network accelerator architecture for field of Internet of things | |
Yin et al. | An energy-efficient reconfigurable processor for binary-and ternary-weight neural networks with flexible data bit width | |
Wu et al. | A flexible and efficient FPGA accelerator for various large-scale and lightweight CNNs | |
CN109740739A (en) | Neural computing device, neural computing method and Related product | |
CN109740754A (en) | Neural computing device, neural computing method and Related product | |
Sun et al. | A high-performance accelerator for large-scale convolutional neural networks | |
CN112734020B (en) | Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network | |
CN115018062A (en) | Convolutional neural network accelerator based on FPGA | |
Chang et al. | Towards design methodology of efficient fast algorithms for accelerating generative adversarial networks on FPGAs | |
CN110598844A (en) | Parallel convolution neural network accelerator based on FPGA and acceleration method | |
Shahshahani et al. | Memory optimization techniques for fpga based cnn implementations | |
CN110110852B (en) | Method for transplanting deep learning network to FPAG platform | |
CN113361687B (en) | Configurable addition tree suitable for convolutional neural network training accelerator | |
CN107092462B (en) | 64-bit asynchronous multiplier based on FPGA | |
US20230128421A1 (en) | Neural network accelerator | |
An et al. | 29.3 an 8.09 tops/w neural engine leveraging bit-sparsified sign-magnitude multiplications and dual adder trees | |
CN112346704B (en) | Full-streamline type multiply-add unit array circuit for convolutional neural network | |
CN112149814A (en) | Convolutional neural network acceleration system based on FPGA | |
Zhan et al. | Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems | |
CN113191494B (en) | Efficient LSTM accelerator based on FPGA | |
Wen | FPGA-Based Deep Convolutional Neural Network Optimization Method | |
CN112766479A (en) | Neural network accelerator supporting channel separation convolution based on FPGA | |
Brown et al. | Nemo-cnn: An efficient near-memory accelerator for convolutional neural networks | |
Wang et al. | Design exploration of multi-fpgas for accelerating deep learning | |
Hossain et al. | Energy efficient computing with heterogeneous DNN accelerators |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||