CN110807522A - General calculation circuit of neural network accelerator - Google Patents
- Publication number: CN110807522A (application number CN201911055499.5A)
- Authority: CN (China)
- Prior art keywords: cascade, adder, general, output, input
- Prior art date: 2019-10-31
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means (G—Physics; G06—Computing, calculating or counting; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/06—Physical realisation)
- G06N3/045—Combinations of networks (G06N3/04—Architecture, e.g. interconnection topology)
Abstract
The invention discloses a general computation module circuit of a neural network accelerator, which consists of m general computation modules PE, where any i-th general computation module PE consists of a RAM, 2^n multipliers, an adder tree, a cascade adder, a bias adder, a first-in first-out queue, and a ReLU activation function module. The different computing circuits of the neural network are built using a single-PE convolution configuration, a cascaded-PE convolution configuration, a single-PE fully-connected configuration, and a cascaded-PE fully-connected configuration. The invention can configure the general computing circuit according to the variables of the neural network accelerator, so that a neural network can be built or modified more simply, conveniently, and quickly, the inference time of the neural network is shortened, and the hardware development time of related deep-learning research is reduced.
Description
Technical Field
The invention belongs to the technical field of integrated-circuit Field Programmable Gate Array (FPGA) design, and particularly relates to a general computation circuit of a neural network accelerator.
Background
In 2012, AlexNet won the large-scale visual recognition challenge, and deep neural networks again became a research hotspot; research on convolutional neural networks in particular has attracted increasing attention, with wide application in digital video surveillance, face recognition, image classification, and other fields. The learning process of a convolutional neural network involves a large number of iterative operations and data reads, and a CPU, with its limited number of cores, cannot fully exploit the parallelism inherent in the network. To increase the computation speed of convolutional neural networks, researchers have proposed hardware architectures based on GPUs, FPGAs, and ASICs, of which GPU-based development has been widely applied in many fields. Among these platforms, the FPGA suits compute-intensive workloads: it provides many dedicated arithmetic units, logic module resources, and storage resources on chip, so the computation units of a convolutional neural network can execute in parallel, which makes the FPGA very suitable as a hardware accelerator for convolutional neural networks. The FPGA is also flexible and efficient: its power consumption is much lower than that of a GPU, and its chip size and cost are lower than those of an ASIC, so it can conveniently be applied in electronic products that need online image or sound processing, such as financial prediction, artificial-intelligence robots, and medical diagnosis. FPGAs are flexibly programmable, products are easy to upgrade and maintain, and design cycles and time to market are relatively short. Research on accelerating convolutional neural networks on FPGA platforms is still at an early stage and has not yet been widely applied in commercial fields.
although the current FPGA platform can implement convolutional neural network development, the platform also has limitations:
1) when developing a convolutional neural network on an FPGA platform with a hardware description language, modular hardware circuit design is lacking, debugging is cumbersome, and the hardware development cycle of the convolutional neural network is long;
2) because the traditional FPGA development flow describes circuit behavior in a hardware description language, designing and building a neural network requires accounting for its many variables, such as convolution kernel size, number of feature maps, number of convolutional layers, number of fully-connected layers, and network output categories; the hardware design of basic components such as convolutional layers, pooling layers, fully-connected layers, and activation-function layers is relatively rigid and inflexible, and if any variable of the network changes, the underlying circuit behavior must be described in the hardware description language all over again, so generality is poor.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a general computing circuit for a neural network accelerator, aiming to improve the generality and flexibility of the computation module PE, thereby improving the performance of the neural network accelerator and reducing hardware development time.
The technical solution adopted by the invention to achieve this aim is as follows:
the general calculation circuit of the neural network accelerator is characterized by consisting of m general calculation modules PE, wherein any ith general calculation module PE consists of RAM and 2nThe system comprises a multiplier, an adder tree, a cascade adder, an offset adder, a first-in first-out queue and a ReLu activation function module;
at the current cycle, 2nThe multiplier acquires the stored weight data from the RAM, receives and processes externally input calculation data to obtain 2 in the current periodnPassing the product to the adder tree;
the adder tree pair is 2 in the current cyclenThe products are accumulated to obtain the accumulated sum in the current period and then stored in the first-in first-outIn a queue;
the first-in first-out queue reads the accumulated sum in the current period and transmits the accumulated sum to the cascade adder;
the cascade adder receives the accumulated sum in the current period and calculates the accumulated sum with cascade inputs in different configurations to obtain the cascade output of the ith cascade adder in the current period;
the offset adder receives the cascade output of the ith cascade adder in the current period, calculates the cascade output with the offset data input externally in the current period, obtains an addition result and transmits the addition result to the ReLu activation function module;
and processing the addition result by the ReLu activation function module to obtain the output result of the ith general computation module PE in the current period and the output result of the general computation circuit in different configurations.
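To make the data path concrete, the following minimal Python sketch models one general computation module PE behaviorally. It is an illustration only, not the patented RTL: the names `GeneralPE` and `step` are ours, plain integers stand in for the hardware's fixed-point words, and one call to `step` represents one cycle.

```python
from collections import deque

class GeneralPE:
    """Behavioral sketch of one PE: 2^n multipliers -> adder tree ->
    FIFO -> cascade adder -> bias adder -> ReLU."""

    def __init__(self, n, weights):
        self.num_mults = 2 ** n
        assert len(weights) == self.num_mults
        self.weights = list(weights)   # contents of the weight RAM
        self.fifo = deque()            # first-in first-out queue (buffer)

    def step(self, inputs, cascade_in, bias):
        """One cycle: multiply, reduce, cascade, bias, ReLU.
        Returns (cascade_out, pe_out)."""
        products = [w * x for w, x in zip(self.weights, inputs)]
        self.fifo.append(sum(products))        # adder tree result into FIFO
        acc = self.fifo.popleft()              # FIFO read of accumulated sum
        cascade_out = acc + cascade_in         # cascade adder
        pe_out = max(0, cascade_out + bias)    # bias adder, then ReLU
        return cascade_out, pe_out
```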
The general calculation circuit of the neural network accelerator is also characterized in that the different configurations are as follows:
The single-PE convolution configuration is:
setting the cascade input of the cascade adder in the i-th general computation module PE to 0;
taking the output results of the m general computation modules PE as the output results of the general calculation circuit.
The cascaded-PE convolution configuration is:
taking the cascade output of the cascade adder in the (i-1)-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
when i = 1, setting the cascade input of the cascade adder in the i-th general computation module PE to 0;
taking the output result of the m-th general computation module PE as the output result of the general calculation circuit.
The single-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the i-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
taking the output results of the m general computation modules PE as the output results of the general calculation circuit.
The cascaded-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the (i-1)-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
when i = 1, taking the cascade output of the cascade adder in the m-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
taking the output result of the m-th general computation module PE as the output result of the general calculation circuit.
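Under the same caveat, the four configurations differ only in where each PE's cascade input comes from. A sketch of that selection (PEs are 0-indexed here, `prev_cascade[i]` holds PE i's cascade output from the previous cycle, and the mode names are ours):

```python
def cascade_input(mode, i, m, prev_cascade):
    """Select the cascade input of PE i (0-indexed) among m PEs."""
    if mode == "single_pe_conv":
        return 0                            # cascade input tied to 0
    if mode == "cascaded_pe_conv":
        return 0 if i == 0 else prev_cascade[i - 1]
    if mode == "single_pe_fc":
        return prev_cascade[i]              # PE feeds its own output back
    if mode == "cascaded_pe_fc":
        return prev_cascade[(i - 1) % m]    # PE 0 wraps around to PE m-1
    raise ValueError(f"unknown mode: {mode}")
```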
Compared with the prior art, the beneficial technical effects of the invention are as follows:
The invention makes full use of the common structure of convolution and fully-connected computation in convolutional neural networks and configures a general calculation circuit according to the variables of the neural network accelerator. Whether the single-PE or cascaded-PE convolution configuration is used to build the computing circuit is determined by judging whether the convolution kernel size of the network being built is smaller than the number of multipliers of a general computation module PE; likewise, whether the single-PE or cascaded-PE fully-connected configuration is used is determined by judging whether the number of input feature maps of the fully-connected layer is smaller than the number of multipliers. The circuit is therefore applicable to convolution and fully-connected computation under various conditions, and the configuration method supports many combinations of network variables: when one variable of the neural network changes during design, the computation module does not need to be rewritten; only the general computation module needs to be reconfigured. This simplifies the circuit design, shortens the inference time of the neural network, and reduces both the complexity of the convolutional neural network system circuit and its hardware development time.
drawings
FIG. 1 is a block diagram of a general computing module PE hardware circuit according to the present invention;
FIG. 2 is a diagram of a convolutional neural network;
FIG. 3 is a diagram of a single PE convolution arrangement in accordance with the present invention;
FIG. 4 is a diagram of a cascaded PE convolution arrangement in accordance with the present invention;
FIG. 5 is a diagram of a single PE full connection configuration according to the present invention;
FIG. 6 is a diagram of the cascaded PE full-connection configuration according to the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a general calculation circuit of a neural network accelerator is composed of m general computation modules PE, where any general computation module PE consists of a RAM, 2^n multipliers, an adder tree, a cascade adder, a bias adder, a first-in first-out queue, and a ReLU activation function module. In this embodiment n = 2, giving 4 multipliers.
In the current cycle, the 4 multipliers fetch the stored weight data from the RAM, receive and process externally input calculation data, obtain 4 products for the current cycle, and pass them to the adder tree;
the adder tree accumulates the 4 products of the current cycle to obtain the accumulated sum, which is then stored in the first-in first-out queue;
the first-in first-out queue reads out the accumulated sum of the current cycle and passes it to the cascade adder; the first-in first-out queue serves as a data buffer;
the cascade adder receives the accumulated sum of the current cycle and combines it with the cascade input defined by the configuration in use to obtain the cascade output of the cascade adder for the current cycle;
the bias adder receives the cascade output of the cascade adder for the current cycle, adds the externally input bias data of the current cycle, and passes the addition result to the ReLU activation function module;
the ReLU activation function module processes the addition result to obtain the output result of the general computation module PE for the current cycle.
As shown in FIG. 2, a convolutional neural network is composed of convolutional layers, pooling layers, activation functions, and fully-connected layers; the different configurations of the general computation module PE suit convolution and fully-connected computation under various conditions. The different configurations are chosen according to the following steps:
Step 1: judge whether the size of the convolution kernel in the neural network exceeds the number of multipliers (4 here); if not, execute the single-PE convolution configuration, otherwise execute the cascaded-PE convolution configuration;
Step 2: judge whether the number of input feature maps of the fully-connected layer in the neural network exceeds the number of multipliers (4 here); if not, execute the single-PE fully-connected configuration, otherwise execute the cascaded-PE fully-connected configuration.
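As an illustration of steps 1 and 2, a sketch of the decision logic; the function names are ours, and `kernel_size` counts the weights in the kernel (e.g. 9 for a 3 × 3 kernel). The claims say "smaller than", but the 2 × 2 example below, which uses exactly 4 multipliers, suggests the boundary case belongs to the single-PE configuration, so `<=` is used here:

```python
def choose_conv_config(kernel_size, num_mults=4):
    # Single PE if the kernel fits within one PE's multipliers.
    return "single_pe_conv" if kernel_size <= num_mults else "cascaded_pe_conv"

def choose_fc_config(num_input_fmaps, num_mults=4):
    # Single PE if all input feature maps fit within one PE's multipliers.
    return "single_pe_fc" if num_input_fmaps <= num_mults else "cascaded_pe_fc"
```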
As shown in FIG. 3, the single-PE convolution configuration suits the case where the convolution kernel size in the neural network does not exceed the number of multipliers of the general computation module PE, for example convolution with a 2 × 2 kernel; the general calculation circuit in FIG. 3 is a single general computation module PE;
The single-PE convolution configuration is:
setting the cascade input of the cascade adder in the general computation module PE to 0;
taking the output result of the general computation module PE as the output result of the general calculation circuit;
The calculation of the single-PE convolution configuration is performed as follows:
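The numbered calculation steps are not reproduced in this text; as a stand-in, here is a worked run of the single-PE convolution configuration using the `GeneralPE` sketch above, assuming all-ones weights, a 2 × 2 kernel occupying all 4 multipliers, and the cascade input tied to 0:

```python
pe = GeneralPE(n=2, weights=[1, 1, 1, 1])
window = [1, 2, 3, 4]          # one 2x2 input window, row-major
cascade_out, result = pe.step(inputs=window, cascade_in=0, bias=0)
assert result == 10            # ReLU(1*1 + 1*2 + 1*3 + 1*4)
```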
In this embodiment, as shown in FIG. 4, the cascaded-PE convolution configuration suits the case where the convolution kernel size in the neural network exceeds the number of multipliers of the general computation module PE, for example convolution with a 3 × 3 kernel; the general calculation circuit in FIG. 4 is a cascade of 3 general computation modules PE;
The cascaded-PE convolution configuration is:
taking the cascade output of the cascade adder in the 1st general computation module PE in the previous cycle as the cascade input of the cascade adder in the 2nd general computation module PE in the current cycle, and taking the cascade output of the cascade adder in the 2nd general computation module PE in the current cycle as the cascade input of the cascade adder in the 3rd general computation module PE in the next cycle;
setting the cascade input of the cascade adder in the 1st general computation module PE to 0;
taking the output result of the 3rd general computation module PE as the output result of the general calculation circuit;
The calculation of the cascaded-PE convolution configuration is carried out according to the following steps:
Step 5: in the previous cycle, the first cascade adder receives its accumulated sum and combines it with the first cascade input; because the cascade input of the first cascade adder is set to 0, the cascade output of the first cascade adder still equals the accumulated sum, specifically (1×1 + 1×2 + 1×3 + 1×4). The cascade output of the cascade adder in the first general computation module PE in the previous cycle is used as the cascade input of the cascade adder in the second general computation module PE in the current cycle. In the current cycle, the second cascade adder receives its accumulated sum and combines it with the second cascade input; since that cascade input is the previous cycle's cascade output of the first PE, the cascade output of the second cascade adder is (1×1 + 1×2 + 1×3 + 1×4 + 1×5 + 1×6 + 1×7 + 1×8). The cascade output of the cascade adder in the second general computation module PE in the current cycle is used as the cascade input of the cascade adder in the third general computation module PE in the next cycle. In the next cycle, the third cascade adder receives its accumulated sum and combines it with the third cascade input; since that cascade input is the current cycle's cascade output of the second PE, the cascade output of the third cascade adder is (1×1 + 1×2 + 1×3 + 1×4 + 1×5 + 1×6 + 1×7 + 1×8 + 1×9). This calculation process corresponds to the full accumulation step of a convolution with a 3 × 3 kernel.
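The same walk-through can be reproduced with the `GeneralPE` sketch above: three PEs in cascade, all-ones weights, the nine inputs split 4 + 4 + 1 (the third PE's unused multiplier inputs padded with zeros), and each loop iteration standing in for one cycle:

```python
pes = [GeneralPE(n=2, weights=[1, 1, 1, 1]) for _ in range(3)]
chunks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 0, 0]]
cascade = 0
for pe, chunk in zip(pes, chunks):
    cascade, out = pe.step(inputs=chunk, cascade_in=cascade, bias=0)
assert cascade == 45           # 1*1 + 1*2 + ... + 1*9, as in the text above
```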
In this embodiment, as shown in FIG. 5, the single-PE fully-connected configuration suits the case where the number of input feature maps of the fully-connected layer in the neural network does not exceed the number of multipliers of the general computation module PE; the general calculation circuit in FIG. 5 is a single general computation module PE;
The single-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the general computation module PE in the previous cycle as the cascade input of the cascade adder in the same general computation module PE;
taking the output result of the general computation module PE as the output result of the general calculation circuit.
The calculation of the single-PE fully-connected configuration is carried out according to the following steps:
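The numbered steps are again not reproduced in this text. What distinguishes this configuration is the feedback path: the PE's own cascade output from the previous cycle returns as its cascade input, so partial sums accumulate on one PE across cycles. A sketch under the same assumptions as above (the input values are illustrative):

```python
pe = GeneralPE(n=2, weights=[1, 1, 1, 1])
cascade = 0                                  # initial cascade input
for batch in [[1, 2, 3, 4], [5, 6, 7, 8]]:  # two cycles of inputs
    cascade, out = pe.step(inputs=batch, cascade_in=cascade, bias=0)
assert cascade == 36           # eight products accumulated on one PE
```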
As shown in FIG. 6, the cascaded-PE fully-connected configuration suits the case where the number of input feature maps of the fully-connected layer in the neural network exceeds the number of multipliers of the general computation modules PE, for example a fully-connected computation with 16 input feature maps; the general calculation circuit in FIG. 6 is a cascade of four general computation modules PE;
The cascaded-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the 1st general computation module PE in the previous cycle as the cascade input of the cascade adder in the 2nd general computation module PE in the current cycle, the cascade output of the 2nd PE's cascade adder in the current cycle as the cascade input of the 3rd PE's cascade adder in the next cycle, and the cascade output of the 3rd PE's cascade adder in the next cycle as the cascade input of the 4th PE's cascade adder in the cycle after that;
taking the cascade output of the cascade adder in the 4th general computation module PE in the previous cycle as the cascade input of the cascade adder in the 1st general computation module PE;
taking the output result of the 4th general computation module PE as the output result of the general calculation circuit.
The calculation of the cascaded-PE fully-connected configuration is carried out according to the following steps:
Step 6: in the first cycle, the first cascade adder receives its accumulated sum and combines it with the first cascade input; because that cascade input is set to 0, the cascade output of the first cascade adder still equals its accumulated sum. That output is used as the cascade input of the cascade adder in the second general computation module PE in the second cycle. In the second cycle, the second cascade adder receives its accumulated sum and combines it with this cascade input, producing the sum of the first two PEs' accumulated sums; that output becomes the cascade input of the third PE's cascade adder in the third cycle. In the third cycle, the third cascade adder likewise adds its accumulated sum, and its output becomes the cascade input of the fourth PE's cascade adder in the fourth cycle. In the fourth cycle, the fourth cascade adder receives its accumulated sum and combines it with this cascade input; its cascade output is therefore the sum of the accumulated sums of all 4 adder trees, that is, the accumulated sum of 16 products.
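Reproducing this 16-input walk-through with the `GeneralPE` sketch above: four cascaded PEs, all-ones weights, inputs 1..16 split four per PE, each loop iteration standing in for one cycle; the fourth PE's cascade output is the accumulated sum of all 16 products:

```python
pes = [GeneralPE(n=2, weights=[1, 1, 1, 1]) for _ in range(4)]
cascade = 0
for k, pe in enumerate(pes):
    inputs = list(range(4 * k + 1, 4 * k + 5))   # 1..4, 5..8, 9..12, 13..16
    cascade, out = pe.step(inputs=inputs, cascade_in=cascade, bias=0)
assert cascade == sum(range(1, 17))              # == 136, sum of 16 products
```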
Claims (6)
1. A general calculation circuit of a neural network accelerator, characterized by consisting of m general computation modules PE, wherein any i-th general computation module PE consists of a RAM, 2^n multipliers, an adder tree, a cascade adder, a bias adder, a first-in first-out queue, and a ReLU activation function module;
in the current cycle, the 2^n multipliers fetch the stored weight data from the RAM, receive and process externally input calculation data to obtain 2^n products for the current cycle, and pass the products to the adder tree;
the adder tree accumulates the 2^n products of the current cycle to obtain the accumulated sum, which is stored in the first-in first-out queue;
the first-in first-out queue reads out the accumulated sum of the current cycle and passes it to the cascade adder;
the cascade adder receives the accumulated sum of the current cycle and combines it with the cascade input defined by the configuration in use to obtain the cascade output of the i-th cascade adder for the current cycle;
the bias adder receives the cascade output of the i-th cascade adder for the current cycle, adds the externally input bias data of the current cycle, and passes the addition result to the ReLU activation function module;
the ReLU activation function module processes the addition result to obtain the output result of the i-th general computation module PE for the current cycle, and hence the output result of the general calculation circuit under the different configurations.
2. The general calculation circuit of a neural network accelerator according to claim 1, wherein the different configurations are selected by:
Step 1: judging whether the size of the convolution kernel in the neural network is smaller than the number of multipliers 2^n; if so, executing the single-PE convolution configuration; otherwise, executing the cascaded-PE convolution configuration;
Step 2: judging whether the number of input feature maps of the fully-connected layer in the neural network is smaller than the number of multipliers 2^n; if so, executing the single-PE fully-connected configuration; otherwise, executing the cascaded-PE fully-connected configuration.
3. The general calculation circuit of a neural network accelerator according to claim 2, wherein the single-PE convolution configuration is:
setting the cascade input of the cascade adder in the i-th general computation module PE to 0;
taking the output results of the m general computation modules PE as the output results of the general calculation circuit.
4. The general calculation circuit of a neural network accelerator according to claim 2, wherein the cascaded-PE convolution configuration is:
taking the cascade output of the cascade adder in the (i-1)-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
when i = 1, setting the cascade input of the cascade adder in the i-th general computation module PE to 0;
taking the output result of the m-th general computation module PE as the output result of the general calculation circuit.
5. The general calculation circuit of a neural network accelerator according to claim 2, wherein the single-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the i-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
taking the output results of the m general computation modules PE as the output results of the general calculation circuit.
6. The general calculation circuit of a neural network accelerator according to claim 2, wherein the cascaded-PE fully-connected configuration is:
taking the cascade output of the cascade adder in the (i-1)-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
when i = 1, taking the cascade output of the cascade adder in the m-th general computation module PE in the previous cycle as the cascade input of the cascade adder in the i-th general computation module PE;
taking the output result of the m-th general computation module PE as the output result of the general calculation circuit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911055499.5A CN110807522B (en) | 2019-10-31 | 2019-10-31 | General calculation circuit of neural network accelerator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911055499.5A CN110807522B (en) | 2019-10-31 | 2019-10-31 | General calculation circuit of neural network accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110807522A true CN110807522A (en) | 2020-02-18 |
CN110807522B CN110807522B (en) | 2022-05-06 |
Family
ID=69489925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911055499.5A Active CN110807522B (en) | 2019-10-31 | 2019-10-31 | General calculation circuit of neural network accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110807522B (en) |
- 2019-10-31: Application CN201911055499.5A filed in China; granted as CN110807522B (status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106875012A (en) * | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA |
CN109726806A (en) * | 2017-10-30 | 2019-05-07 | 上海寒武纪信息科技有限公司 | Information processing method and terminal device |
CN108805266A (en) * | 2018-05-21 | 2018-11-13 | 南京大学 | A kind of restructural CNN high concurrents convolution accelerator |
CN109543140A (en) * | 2018-09-20 | 2019-03-29 | 中国科学院计算技术研究所 | A kind of convolutional neural networks accelerator |
CN109886400A (en) * | 2019-02-19 | 2019-06-14 | 合肥工业大学 | The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel |
Non-Patent Citations (2)
Title |
---|
余子健: "Convolutional Neural Network Accelerator Based on FPGA", China Master's Theses Full-text Database, Information Science and Technology *
朱智洋: "Design and Optimization of a CNN Accelerator Based on Approximate Computing and Data Scheduling", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021248540A1 (en) * | 2020-06-11 | 2021-12-16 | 杭州知存智能科技有限公司 | Data loading circuit and method |
CN113807506A (en) * | 2020-06-11 | 2021-12-17 | 杭州知存智能科技有限公司 | Data loading circuit and method |
US11977969B2 (en) | 2020-06-11 | 2024-05-07 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Data loading |
CN111610963A (en) * | 2020-06-24 | 2020-09-01 | 上海西井信息科技有限公司 | Chip structure and multiply-add calculation engine thereof |
CN111610963B (en) * | 2020-06-24 | 2021-08-17 | 上海西井信息科技有限公司 | Chip structure and multiply-add calculation engine thereof |
CN112580787A (en) * | 2020-12-25 | 2021-03-30 | 北京百度网讯科技有限公司 | Data processing method, device and equipment of neural network accelerator and storage medium |
CN112580787B (en) * | 2020-12-25 | 2023-11-17 | 北京百度网讯科技有限公司 | Data processing method, device and equipment of neural network accelerator and storage medium |
CN112862091A (en) * | 2021-01-26 | 2021-05-28 | 合肥工业大学 | Resource multiplexing type neural network hardware accelerating circuit based on quick convolution |
CN112965931A (en) * | 2021-02-22 | 2021-06-15 | 北京微芯智通科技合伙企业(有限合伙) | Digital integration processing method based on CNN cell neural network structure |
CN113095495A (en) * | 2021-03-29 | 2021-07-09 | 上海西井信息科技有限公司 | Control method of convolutional neural network module |
CN113095495B (en) * | 2021-03-29 | 2023-08-25 | 上海西井科技股份有限公司 | Control Method of Convolutional Neural Network Module |
Also Published As
Publication number | Publication date |
---|---|
CN110807522B (en) | 2022-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110807522B (en) | General calculation circuit of neural network accelerator | |
CN107862374B (en) | Neural network processing system and processing method based on assembly line | |
CN107844826B (en) | Neural network processing unit and processing system comprising same | |
CN110458279B (en) | FPGA-based binary neural network acceleration method and system | |
US10691996B2 (en) | Hardware accelerator for compressed LSTM | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
CN108090565A (en) | Accelerated method is trained in a kind of convolutional neural networks parallelization | |
CN109284824B (en) | Reconfigurable technology-based device for accelerating convolution and pooling operation | |
CN108629406B (en) | Arithmetic device for convolutional neural network | |
CN110163357A (en) | A kind of computing device and method | |
CN113240101B (en) | Method for realizing heterogeneous SoC (system on chip) by cooperative acceleration of software and hardware of convolutional neural network | |
CN113344179B (en) | IP core of binary convolution neural network algorithm based on FPGA | |
CN111126569B (en) | Convolutional neural network device supporting pruning sparse compression and calculation method | |
CN112734020B (en) | Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network | |
CN110163350A (en) | A kind of computing device and method | |
Hao | A general neural network hardware architecture on FPGA | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
Domingos et al. | An efficient and scalable architecture for neural networks with backpropagation learning | |
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination | |
CN112836793B (en) | Floating point separable convolution calculation accelerating device, system and image processing method | |
CN115222028A (en) | One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method | |
CN114065923A (en) | Compression method, system and accelerating device of convolutional neural network | |
CN110807479A (en) | Neural network convolution calculation acceleration method based on Kmeans algorithm | |
CN110765413A (en) | Matrix summation structure and neural network computing platform | |
Alaeddine et al. | A Pipelined Energy-efficient Hardware Accelaration for Deep Convolutional Neural Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |