CN113361687A - Configurable addition tree suitable for convolutional neural network training accelerator - Google Patents


Info

Publication number
CN113361687A
CN113361687A
Authority
CN
China
Prior art keywords: order, groups, mode, adders, multiplexers
Prior art date
Legal status
Granted
Application number
CN202110597775.1A
Other languages
Chinese (zh)
Other versions
CN113361687B (en)
Inventor
刘强 (Liu Qiang)
孟浩 (Meng Hao)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202110597775.1A
Publication of CN113361687A
Application granted
Publication of CN113361687B
Legal status: Active

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 - Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 - Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 - Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491 - Computations with decimal numbers radix 12 or 20
    • G06F7/4912 - Adding; Subtracting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a configurable addition tree suitable for a convolutional neural network training accelerator. The tree consists of three groups of addition units, each group comprising a first-order multiplexer-and-adder structure, a second-order multiplexer-and-adder structure, and a third-order multiplexer-and-adder structure connected in series. Mode selection is performed by the multiplexers, whose outputs feed the next-stage adders in series. Compared with the prior art, the invention 1) reduces the use of addition resources at high parallelism; 2) suits both the in-kernel accumulation of conventional 3x3 forward-propagation convolution and the self-accumulation of weight-gradient computation with very large (non-fixed-size) convolution kernels; and 3) accommodates different data precisions.

Description

Configurable addition tree suitable for convolutional neural network training accelerator
Technical Field
The invention belongs to the fields of information technology and hardware acceleration of convolutional neural network training, and particularly relates to low-power, high-performance convolutional neural network training.
Background
With the wide application of artificial-intelligence technology, the design of on-line training chips has gradually become a research frontier for AI chips at home and abroad. A convolutional neural network (CNN) is a feedforward neural network widely applied in computer vision, natural language processing, and other fields. The CNN training process involves large data volumes, complex access patterns, and synchronization requirements, placing very high demands on storage space, access bandwidth, and management mechanisms. Existing hardware architectures explore efficient hardware implementations of the convolution training operators around the training algorithm, to meet the computation and storage demands of deep neural networks. The basic operators of a CNN training algorithm include convolution, pooling, activation functions, normalization, the loss function, and the derivatives of these operations; among them, the convolutional layers are a key component of the CNN and occupy a central position. A convolution single-engine architecture supporting both forward propagation and backward propagation in the CNN training process is therefore of great significance: different deep-neural-network models can be trained by mapping them onto a configurable training-accelerator architecture. FPGAs, with their strong programmability, high parallelism, and high energy efficiency, have become one of the main platforms for implementing CNN training. The forward propagation (and error back-propagation) and the weight-gradient computation of a CNN training accelerator require two different forms of accumulation.
However, existing CNN training accelerators are optimized mainly for the multiplication units; the addition units receive far less attention. Implementing the in-kernel addition tree and the self-accumulation units separately requires 17 addition units at a parallelism of one, so at high parallelism the addition units consume a large amount of computing resources.
Optimizing the addition tree to reduce this consumption of computing resources is therefore the technical problem the present invention addresses.
Disclosure of Invention
To further reduce the resource occupation of the addition units, the invention provides a configurable addition tree suitable for a convolutional neural network training accelerator. The configurable design supports both the in-kernel accumulation of forward propagation and error back-propagation and the self-accumulation of weight-gradient computation, thereby optimizing the hardware architecture for the different accumulation forms in a CNN training accelerator.
The technical scheme adopted by the invention to solve the problem is as follows:
a configurable addition tree suitable for a convolutional neural network training accelerator, composed of three groups of addition units, each group comprising a first-order multiplexer-and-adder structure, a second-order multiplexer-and-adder structure, and a third-order multiplexer-and-adder structure connected in series; mode selection is performed by the multiplexers, whose outputs feed the next-stage adders in series.
Compared with the prior art, the configurable addition tree applicable to the convolutional neural network training accelerator can achieve the following beneficial effects:
1) at high parallelism, the use of addition resources is reduced;
2) it suits both the accumulation of conventional 3x3 forward-propagation convolution and the accumulation of weight-gradient computation with very large (non-fixed-size) convolution kernels;
3) it accommodates different data precisions.
Drawings
FIG. 1 is a schematic diagram of a configurable additive tree architecture for a convolutional neural network training accelerator according to the present invention;
FIG. 2 is a schematic diagram of the accumulation modes of the configurable addition tree suitable for a convolutional neural network training accelerator according to the present invention: (a) mode 0, the convolution-kernel addition-tree mode; (b) mode 1, the self-accumulation mode.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
The configurable addition tree suitable for a convolutional neural network training accelerator combines software and hardware: different accumulation functions are realized by dynamically configuring the functional mode of the addition tree, solving the problem of large resource occupation in training-accelerator design.
FIG. 1 is a schematic diagram of a configurable addition tree architecture suitable for a convolutional neural network training accelerator according to the present invention.
In a CNN training accelerator, the adder tree is configured for convolutional layers of size 3x3. The structure of the configurable adder tree is as follows: every 3 addition units form a group; the results of the 9 multiplication units in a convolution kernel are divided into 3 groups, (a2, a1, a3), (a5, a4, a6), and (a8, a7, a9); and, through the connections of the multiplexers and addition units, two 4-level addition trees (first through fourth levels) and one 3-level addition tree are formed.
Each group of addition units comprises a first-order multiplexer-and-adder structure, a second-order multiplexer-and-adder structure, and a third-order multiplexer-and-adder structure connected in series. Mode selection is performed by the multiplexers: mode 0 is the convolution-kernel accumulation mode, and mode 1 is the self-accumulation mode. A multiplexer is placed at the input of each stage's adder, and its output feeds that adder in series. Across the whole network architecture there are three situations: forward propagation, error back-propagation, and weight-gradient computation for the different layers. The multiplexers are controlled by register configuration: if the single-engine architecture is performing the forward propagation or error back-propagation of CNN training, the addition tree is configured into mode 0; if it is performing the weight-gradient computation, the configurable addition tree enters mode 1.
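As a minimal behavioral sketch (illustrative Python, not the patent's circuit; the class and signal names are invented here), one multiplexer-and-adder stage can be modeled as follows: in mode 0 the multiplexer passes an external operand, making the adder a node of the kernel addition tree, while in mode 1 it selects the adder's own registered output through the feedback line, turning the same adder into a self-accumulator.

```python
class MuxAdderStage:
    """One multiplexer-and-adder stage of the configurable tree (sketch)."""

    def __init__(self):
        self.acc = 0  # registered adder output; also the feedback source

    def step(self, mode, tree_operand, direct_operand):
        # mode 0: the multiplexer selects the external (tree) operand
        # mode 1: the multiplexer selects the adder's own previous output
        mux_out = tree_operand if mode == 0 else self.acc
        self.acc = mux_out + direct_operand
        return self.acc
```

Calling `step(0, a2, a1)` once yields a2 + a1, while repeated `step(1, 0, x)` calls accumulate the x stream, which is the register-configured mode switch described above.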
In the first set of addition units:
in the first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a2 of the multiplication unit in the mode-0 state, and is connected to the output of the first-order adder through a feedback line in the mode-1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a1 of the multiplication unit;
in the second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the first-order adder in the mode-0 state, and is connected to the output of the second-order adder through a feedback line in the mode-1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a3 of the multiplication unit;
in the third-order multiplexer-and-adder structure, the first third-order multiplexer selects the output of the second-order adder in the mode-0 state, and is connected to the output of the third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a2 of the multiplication unit in the mode-1 state; the two inputs of the third-order adder are the outputs of the first and second third-order multiplexers.
in the second group of adding units:
in the second group's first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a5 of the multiplication unit in the mode-0 state, and is connected to the output of the second group's first-order adder through a feedback line in the mode-1 state; the two inputs of that first-order adder are the output of the multiplexer and the result a4 of the multiplication unit;
in the second group's second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the second group's first-order adder in the mode-0 state, and is connected to the output of the second group's second-order adder through a feedback line in the mode-1 state; the two inputs of that second-order adder are the output of the multiplexer and the result a6 of the multiplication unit;
in the second group's third-order multiplexer-and-adder structure, the first third-order multiplexer selects the output of the second group's second-order adder in the mode-0 state, and is connected to the output of the second group's third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a5 of the multiplication unit in the mode-1 state; the two inputs of that third-order adder are the outputs of the first and second third-order multiplexers.
in the third group of addition units:
in the third group's first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a8 of the multiplication unit in the mode-0 state, and is connected to the output of the third group's first-order adder through a feedback line in the mode-1 state; the two inputs of that first-order adder are the output of the multiplexer and the result a7 of the multiplication unit;
in the third group's second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the third group's first-order adder in the mode-0 state, and is connected to the output of the third group's second-order adder through a feedback line in the mode-1 state; the two inputs of that second-order adder are the output of the multiplexer and the result a9 of the multiplication unit;
in the third group's third-order multiplexer-and-adder structure, the first third-order multiplexer is left unconnected in the mode-0 state, and is connected to the output of the third group's third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a8 of the multiplication unit in the mode-1 state and is left unconnected in the mode-0 state; the third group's third-order adder is idle in the mode-0 state, and its two inputs are the outputs of the first and second third-order multiplexers.
the configurable addition tree requires only 9 addition units as a whole. Therefore, compared with the prior art that the convolution kernel internal addition tree mode and the self-accumulation mode are realized separately, the method can reduce the number of addition units by 47% with single parallelism, and has important significance for reducing the resources occupied by the addition units.
The configurable addition tree serves a convolution single-engine architecture, supporting its forward propagation, error back-propagation, and weight-gradient computation. It achieves good performance while remaining compatible with the two working modes, in-kernel accumulation and self-accumulation, across forward propagation, error back-propagation, and weight-gradient computation, and it reduces the number of addition units and hence the consumption of computing resources.
Within the whole CNN training accelerator, the addition tree can serve both as the in-kernel addition tree of forward propagation and as the self-accumulator of weight-gradient computation.
Fig. 2 is a schematic diagram of the accumulation modes of the proposed addition tree: the convolution-kernel addition-tree mode of mode 0 is shown in (a), and the self-accumulation mode of mode 1 in (b). For the convolution-kernel addition-tree mode:
In the first group of addition units: level 1 realizes a2+a1; level 2 realizes a2+a1+a3; level 3 realizes a2+a1+a3+a5+a4+a6; level 4 realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.
In the second group of addition units: level 1 realizes a5+a4; level 2 realizes a5+a4+a6; level 3 realizes a2+a1+a3+a5+a4+a6; level 4 realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.
In the third group of addition units: level 1 realizes a8+a7; level 2 realizes a8+a7+a9; level 3 realizes a2+a1+a3+a5+a4+a6+a8+a7+a9.

Claims (3)

1. A configurable addition tree suitable for a convolutional neural network training accelerator, characterized in that the configurable addition tree is composed of three groups of addition units, each group comprising a first-order multiplexer-and-adder structure, a second-order multiplexer-and-adder structure, and a third-order multiplexer-and-adder structure connected in series; mode selection is performed by the multiplexers, whose outputs feed the next-stage adders in series.
2. The configurable addition tree suitable for use in a convolutional neural network training accelerator as defined in claim 1, wherein said multiplexers provide a mode 0 and a mode 1, said mode 0 being the convolution in-kernel accumulation mode and said mode 1 being the self-accumulation mode.
3. The configurable addition tree suitable for use in a convolutional neural network training accelerator as claimed in claim 1, wherein the specific structure of said three sets of addition units is as follows:
in the first set of addition units:
in the first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a2 of the multiplication unit in the mode-0 state, and is connected to the output of the first-order adder through a feedback line in the mode-1 state; the two inputs of the first-order adder are the output of the first-order multiplexer and the result a1 of the multiplication unit;
in the second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the first-order adder in the mode-0 state, and is connected to the output of the second-order adder through a feedback line in the mode-1 state; the two inputs of the second-order adder are the output of the second-order multiplexer and the result a3 of the multiplication unit;
in the third-order multiplexer-and-adder structure, the first third-order multiplexer selects the output of the second-order adder in the mode-0 state, and is connected to the output of the third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a2 of the multiplication unit in the mode-1 state; the two inputs of the third-order adder are the outputs of the first and second third-order multiplexers;
in the second group of adding units:
in the second group's first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a5 of the multiplication unit in the mode-0 state, and is connected to the output of the second group's first-order adder through a feedback line in the mode-1 state; the two inputs of that first-order adder are the output of the multiplexer and the result a4 of the multiplication unit;
in the second group's second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the second group's first-order adder in the mode-0 state, and is connected to the output of the second group's second-order adder through a feedback line in the mode-1 state; the two inputs of that second-order adder are the output of the multiplexer and the result a6 of the multiplication unit;
in the second group's third-order multiplexer-and-adder structure, the first third-order multiplexer selects the output of the second group's second-order adder in the mode-0 state, and is connected to the output of the second group's third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a5 of the multiplication unit in the mode-1 state; the two inputs of that third-order adder are the outputs of the first and second third-order multiplexers;
in the third group of addition units:
in the third group's first-order multiplexer-and-adder structure, the first-order multiplexer selects the result a8 of the multiplication unit in the mode-0 state, and is connected to the output of the third group's first-order adder through a feedback line in the mode-1 state; the two inputs of that first-order adder are the output of the multiplexer and the result a7 of the multiplication unit;
in the third group's second-order multiplexer-and-adder structure, the second-order multiplexer selects the output of the third group's first-order adder in the mode-0 state, and is connected to the output of the third group's second-order adder through a feedback line in the mode-1 state; the two inputs of that second-order adder are the output of the multiplexer and the result a9 of the multiplication unit;
in the third group's third-order multiplexer-and-adder structure, the first third-order multiplexer is left unconnected in the mode-0 state, and is connected to the output of the third group's third-order adder through a feedback line in the mode-1 state; the second third-order multiplexer selects the result a8 of the multiplication unit in the mode-1 state and is left unconnected in the mode-0 state; the two inputs of the third group's third-order adder are the outputs of the first and second third-order multiplexers.
CN202110597775.1A 2021-05-31 2021-05-31 Configurable addition tree suitable for convolutional neural network training accelerator Active CN113361687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597775.1A CN113361687B (en) 2021-05-31 2021-05-31 Configurable addition tree suitable for convolutional neural network training accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597775.1A CN113361687B (en) 2021-05-31 2021-05-31 Configurable addition tree suitable for convolutional neural network training accelerator

Publications (2)

Publication Number Publication Date
CN113361687A (en) 2021-09-07
CN113361687B CN113361687B (en) 2023-03-24

Family

ID=77528209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597775.1A Active CN113361687B (en) 2021-05-31 2021-05-31 Configurable addition tree suitable for convolutional neural network training accelerator

Country Status (1)

Country Link
CN (1) CN113361687B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110006409A * 2009-07-14 2011-01-20 Sogang University Industry-University Cooperation Foundation (서강대학교산학협력단) Decoder using low density parity check code
CN105611269A (en) * 2015-12-18 2016-05-25 华中科技大学 Real time parallax calculation system based on FPGA
CN106203617A (en) * 2016-06-27 2016-12-07 哈尔滨工业大学深圳研究生院 A kind of acceleration processing unit based on convolutional neural networks and array structure
CN109711542A (en) * 2018-12-29 2019-05-03 西安交通大学 A kind of DNN accelerator that supporting dynamic accuracy and its implementation
US20190279083A1 (en) * 2018-03-06 2019-09-12 DinoplusAI Holdings Limited Computing Device for Fast Weighted Sum Calculation in Neural Networks
CN111898733A (en) * 2020-07-02 2020-11-06 西安交通大学 Deep separable convolutional neural network accelerator architecture
CN112486457A (en) * 2020-11-23 2021-03-12 杭州电子科技大学 Hardware system for realizing improved FIOS modular multiplication algorithm


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Peiqi: "Key Technologies of Software-Hardware Co-Acceleration for Neural Networks" (神经网络软硬件协同加速关键技术), China Doctoral Dissertations Full-text Database, Information Science and Technology series *

Also Published As

Publication number Publication date
CN113361687B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN109447241B (en) Dynamic reconfigurable convolutional neural network accelerator architecture for field of Internet of things
Yin et al. An energy-efficient reconfigurable processor for binary-and ternary-weight neural networks with flexible data bit width
Wu et al. A flexible and efficient FPGA accelerator for various large-scale and lightweight CNNs
CN109740739A (en) Neural computing device, neural computing method and Related product
CN109740754A (en) Neural computing device, neural computing method and Related product
Sun et al. A high-performance accelerator for large-scale convolutional neural networks
CN112734020B (en) Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN115018062A (en) Convolutional neural network accelerator based on FPGA
Chang et al. Towards design methodology of efficient fast algorithms for accelerating generative adversarial networks on FPGAs
CN110598844A (en) Parallel convolution neural network accelerator based on FPGA and acceleration method
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
CN110110852B (en) Method for transplanting deep learning network to FPAG platform
CN113361687B (en) Configurable addition tree suitable for convolutional neural network training accelerator
CN107092462B (en) 64-bit asynchronous multiplier based on FPGA
US20230128421A1 (en) Neural network accelerator
An et al. 29.3 an 8.09 tops/w neural engine leveraging bit-sparsified sign-magnitude multiplications and dual adder trees
CN112346704B (en) Full-streamline type multiply-add unit array circuit for convolutional neural network
CN112149814A (en) Convolutional neural network acceleration system based on FPGA
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN113191494B (en) Efficient LSTM accelerator based on FPGA
Wen FPGA-Based Deep Convolutional Neural Network Optimization Method
CN112766479A (en) Neural network accelerator supporting channel separation convolution based on FPGA
Brown et al. Nemo-cnn: An efficient near-memory accelerator for convolutional neural networks
Wang et al. Design exploration of multi-fpgas for accelerating deep learning
Hossain et al. Energy efficient computing with heterogeneous DNN accelerators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant