CN113673690A - Underwater noise classification convolution neural network accelerator - Google Patents
- Publication number
- CN113673690A (application CN202110819180.6A)
- Authority
- CN
- China
- Prior art keywords
- vector
- convolution kernel
- cache pool
- data
- data cache
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses an underwater noise classification convolutional neural network accelerator comprising: a DMA controller, a feature map transposed vector unit, a convolution kernel transposed vector unit, a plurality of minimum computing units, and a data cache pool. The DMA controller reads an input feature map and the first layer of convolution kernels; the feature map transposed vector unit transposes the feature map into feature map vectors and writes them into the data cache pool; the convolution kernel transposed vector unit transposes the first layer of convolution kernels into convolution kernel vectors and writes them into the data cache pool; each minimum computing unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool. The invention can adapt to FPGA devices with different resource conditions, realizing an accelerator scheme that flexibly increases and decreases the hardware parallelism.
Description
Technical Field
The invention belongs to the technical field of convolutional neural networks for underwater noise classification application, and particularly relates to an accelerator of a convolutional neural network for underwater noise classification.
Background
In traditional underwater noise classification, a trained sonar operator continuously monitors the sonar and obtains a classification result from experience. This approach relies heavily on the operator's experience and working condition. The sonar signals are amplified and sampled, and then converted into one-dimensional digital signals.
In recent years, convolutional neural networks in deep learning have been widely used for classification problems. Many practical results show that convolutional neural networks achieve high recognition accuracy on such problems. However, when an underwater noise classification network is deployed in an embedded environment, the weak computing power and limited resources of the hardware platform must be faced.
A dedicated convolutional neural network built for the underwater target classification problem can gain computing capacity from a dedicated convolutional neural network accelerator, solving the real-time problem.
In the related art, a convolutional neural network accelerator usually completes all convolutions first, then performs activation, then performs the pooling operation, accelerating the computation layer by layer. The disadvantages of this approach are: each level needs a cache layer to store intermediate calculation results, so cache resource consumption is large; and the scale of each layer's computing units is fixed, so the computing units cannot be expanded or cut to match the FPGA resources.
Disclosure of Invention
The technical problem solved by the invention is as follows: the defects in the prior art are overcome by providing a convolutional neural network accelerator for classifying underwater noise that can adapt to FPGA devices with different resource conditions, realizing an accelerator scheme that flexibly increases and decreases the hardware parallelism.
The purpose of the invention is realized by the following technical scheme: an underwater noise classification convolutional neural network accelerator comprises a DMA controller, a feature map transposed vector unit, a convolution kernel transposed vector unit, a plurality of minimum computing units, and a data cache pool. The DMA controller reads an input feature map and the first layer of convolution kernels; the feature map transposed vector unit transposes the feature map into feature map vectors and writes them into the data cache pool; the convolution kernel transposed vector unit transposes the first layer of convolution kernels into convolution kernel vectors and writes them into the data cache pool; each minimum computing unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool. The DMA controller then reads the dot product results in the data cache pool as the input feature map, together with the next layer of convolution kernels, until the calculations of all convolution layers are finished.
In the above-mentioned underwater noise classification convolutional neural network accelerator, the size Amax of the data cache pool is obtained by the following formula:
Amax=AI+MAX(AVi,APki,AO)+AVKi;
the method comprises the steps of obtaining a feature map data steering vector of each layer, obtaining an AI (input data) cache pool, an AO (output data) cache pool, AVi (amplitude versus amplitude) cache pool size after the feature map data steering vector of each layer, AVki (convolution kernel steering vector) and APki (automatic Power Take in) cache pool size of each layer, wherein AI is the input data cache pool size, AO is the output data cache pool size, AVi is the cache pool size after the feature map data steering vector of each layer, AVki is the convolution kernel steering vector size, and APki is the data size of each layer after pooling.
In the above-mentioned underwater noise classification convolutional neural network accelerator, for the case A > Amax, B = 1, meaning the system is not blocked: all data can be stored in the FPGA, and the DMA controller only needs to complete data read-in and result write-back once during the calculation. For the case A < Amax, B = MAX(AVi, APki, AO)/A, meaning the input data is decomposed into B blocks; if B works out to a fraction, B = int(B) + 1.
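As a rough sketch of the sizing and blocking rules above (not the patent's actual script; this assumes the per-layer quantities AVi, APki, and AVKi are reduced with a maximum, one plausible reading of the worst-case formula, and all byte counts below are illustrative only):

```python
import math

def cache_pool_size(ai, ao, av, apk, avk):
    # Amax = AI + MAX(AVi, APki, AO) + AVKi, taking the worst case
    # over all convolution layers i (per-layer values passed as lists)
    return ai + max(max(av), max(apk), ao) + max(avk)

def block_count(a, ai, ao, av, apk, avk):
    # B = 1 when the pool of A bytes holds everything; otherwise the
    # input is split into B blocks, rounding a fractional count up
    amax = cache_pool_size(ai, ao, av, apk, avk)
    if a > amax:
        return 1  # no blocking: one DMA read-in and one write-back
    return math.ceil(max(max(av), max(apk), ao) / a)

ai, ao = 4096, 1024                                    # illustrative bytes
av, apk, avk = [8192, 4096], [2048, 1024], [512, 256]  # per-layer values
print(cache_pool_size(ai, ao, av, apk, avk))   # 12800
print(block_count(2048, ai, ao, av, apk, avk)) # 4
```

With these numbers, Amax = 4096 + 8192 + 512 = 12800 bytes; a 2048-byte pool forces B = ceil(8192/2048) = 4 blocks.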
In the underwater noise classification convolution neural network accelerator, the number V of the calculated vectors of the multiplier is obtained according to the number Kri of the convolution kernel rows and the number Kci of the convolution kernel columns.
In the above-mentioned underwater noise classification convolutional neural network accelerator, the number of vectors V calculated by the multiplier is V = MAX(Kri × Kci).
In the above accelerator, the number V of vectors calculated by the multiplier determines the throughput requirement of data input of a single accelerator calculation unit.
In the above-mentioned underwater noise classification convolutional neural network accelerator, the minimum computing unit is the smallest computing granularity and realizes the vector multiplication, activation, and pooling operations.
In the underwater noise classification convolutional neural network accelerator, when the byte count A of the data cache pool is smaller than the size Amax and the blocking parameter B is larger than 1, the DMA controller reads in the input feature map, converts it into feature vectors, and writes the feature vectors block by block into a RAM outside the FPGA.
Compared with the prior art, the invention has the following beneficial effects:
(1) the convolution calculation process is multiplexed across layers, which enhances the storage multiplexing capability inside the accelerator;
(2) the invention can be flexibly configured from small FPGAs to large FPGAs, providing a flexible hardware acceleration scheme for FPGAs with resources of different magnitudes.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a block diagram of an underwater noise classification convolutional neural network accelerator provided by an embodiment of the present invention;
FIG. 2 is a block diagram of a multiplier-adder implementation according to an embodiment of the present invention;
fig. 3 is another block diagram of a multiplier-adder implementation according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a block diagram of an underwater noise classification convolutional neural network accelerator provided in an embodiment of the present invention. As shown in fig. 1, the accelerator comprises a DMA controller, a feature map transposed vector unit, a convolution kernel transposed vector unit, a plurality of minimum computing units, and a data cache pool, wherein:
the DMA controller reads in an input feature map and the first layer of convolution kernels; the feature map transposed vector unit transposes the feature map into feature map vectors and writes them into the data cache pool; the convolution kernel transposed vector unit transposes the first layer of convolution kernels into convolution kernel vectors and writes them into the data cache pool; each minimum computing unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool. The DMA controller then reads the dot product results in the data cache pool as the input feature map, together with the next layer of convolution kernels, until the calculations of all convolution layers are finished.
By utilizing the synthesis script and the minimum computing unit provided by the invention, FPGA devices with different resource conditions can be adapted to, realizing an accelerator scheme that flexibly increases and decreases the hardware parallelism.
The invention generates the hardware accelerator through a synthesis script, which requires the user to provide the following parameters: the number of input feature maps C, the feature map height H, and the feature map width W; the number of convolution kernels Cki, the number of convolution kernel rows Kri, the number of convolution kernel columns Kci, and the convolution stride Ksi; the number of pooling unit rows Pr, the number of pooling unit columns Pc, and the pooling unit shift length Ps; the number of minimum computing units M; and the byte count A of the data cache pool.
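For concreteness only, the parameter set above might be captured as a single configuration record; the patent does not publish its script's variable layout, so all field names and numbers here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AcceleratorConfig:
    C: int    # number of input feature maps
    H: int    # feature map height
    W: int    # feature map width
    Ck: list  # per-layer convolution kernel counts (Cki)
    Kr: list  # per-layer kernel rows (Kri)
    Kc: list  # per-layer kernel columns (Kci)
    Ks: list  # per-layer convolution strides (Ksi)
    Pr: int   # pooling unit rows
    Pc: int   # pooling unit columns
    Ps: int   # pooling unit shift length
    M: int    # number of minimum computing units
    A: int    # data cache pool size in bytes

cfg = AcceleratorConfig(C=1, H=1, W=1024,
                        Ck=[8, 16], Kr=[1, 1], Kc=[7, 5], Ks=[1, 1],
                        Pr=1, Pc=4, Ps=4, M=4, A=65536)
# multiplier vector length: the largest kernel footprint over all layers
V = max(kr * kc for kr, kc in zip(cfg.Kr, cfg.Kc))
print(V)  # 7
```

The last line mirrors the patent's rule V = MAX(Kri × Kci): with 1×7 and 1×5 kernels, V = 7.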
The minimum computing unit is the smallest computing granularity in the invention and can realize the vector multiplication, activation, and pooling operations. The activation function types are configured in the synthesis script; for the case where different convolution layers of one network use different activation functions, multiple activation functions can be synthesized, and the activation function is selected through configuration options.
The convolutional neural network can be configured with from one minimum computing unit up to N computing units; the configured number is determined by the real-time requirements of the system and the amount of logic resources the FPGA can provide.
FIG. 2 is a block diagram of a multiplier-adder implementation according to an embodiment of the present invention; fig. 3 is another such block diagram. As shown in figs. 2 and 3, the invention configures the synthesis options through the script, based on the minimum computing unit structure, and adapts to FPGA chips with different resource capacities. The implementation process is as follows:
the following configuration is done in the integrated script:
1) inputting characteristic diagram parameters: C. h, W, respectively;
2) inputting convolution kernel parameters: cki, Kri, Kci, Ksi;
3) parameters of the pooling unit: pr, Pc, Ps;
4) minimum calculation unit parameters: m;
5) the byte number A of the data cache pool;
6) the number of convolution layers: i;
and the comprehensive script calculates the size AI of the input data cache pool, the size AO of the output data cache pool, the size AVi of the cache pool after the data steering quantity of each layer of feature map, the size AVki of the convolution kernel steering vector and the size APki of each layer of data after pooling according to the parameters 1) to 4). Because a pipelining relationship exists in the calculation process of each convolution layer, after one layer of convolution calculation is completed, the next layer of convolution is carried out; the maximum number of data which possibly exist at the same time in each layer is considered in the script algorithm to obtain the size of the cache pool, and the algorithm is as follows: amax is AI + MAX (AVi, APki, AO) + AVKi, where i (1, Nk), Nk is the number of convolution layers.
The script then sets the blocking parameter B according to parameter 5) and Amax. The algorithm for the B parameter is as follows:
For the case A > Amax, B = 1, meaning the system is not blocked: all data can be stored in the FPGA, and the DMA only needs to complete data read-in and result write-back once during the calculation. For the case A < Amax, B = MAX(AVi, APki, AO)/A, meaning the input data is decomposed into B blocks; if B works out to a fraction, B = int(B) + 1.
The number of vectors the multiplier calculates is V = MAX(Kri × Kci). This parameter determines the throughput requirement of the data input of a single accelerator computing unit.
After the configuration is completed, the script is run, and the synthesis tool realizes an accelerator circuit matched to the current FPGA resource amount. After synthesis is completed and the circuit is deployed to the FPGA, the calculation process is as follows:
embodiment one, for the case where A > Amax, B <1 is not blocked.
6) The DMA integrally reads an input feature map and a first layer of convolution kernel and puts the input feature map and the first layer of convolution kernel into a calculation cache pool;
7) starting a feature map transpose vector unit, completing the transpose of the calculation vector, and writing the result into a calculation cache pool;
8) starting a convolution kernel transpose vector unit to complete the first convolution kernel vector transpose, and writing the result into a calculation cache pool;
9) According to the minimum computing unit parameter M, the convolution calculation data streams are distributed; the number of vector entries each computing unit reads for one convolution calculation is determined by the pooling unit size, namely Pr × Pc.
10) After the vector enters a minimum calculation unit, finishing the point multiplication of the input feature map vector and the convolution kernel vector;
11) accumulating the calculation results in a minimum calculation unit;
12) adding the accumulated result to the offset;
13) performing activation operation on the addition result in the previous step;
14) repeating the actions of 10) to 13) until all the cells in the specified number of pooled cells are calculated;
15) starting pooling operation, outputting a calculation result, and entering a data cache pool until all vector calculations are completed;
16) using data in the data cache pool as an input characteristic diagram, reading in next layer of convolution kernel data by using DMA, entering the data cache pool, and repeating the steps from 7) to 16) to complete data calculation;
17) The accelerator stops operating once the convolution calculations of all i specified convolution layers are completely finished.
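Steps 10) to 15) above amount to a dot-multiply, accumulate, add-bias, activate, pool pipeline inside each minimum computing unit. The following is a behavioural sketch in plain Python, not the hardware itself; a ReLU activation and max pooling are assumed for illustration (the patent leaves the activation configurable):

```python
def min_compute_unit(fmap_vecs, kernel_vec, bias, pr_pc):
    """Behavioural model of one minimum computing unit: dot-multiply each
    feature-map vector with the kernel vector, add the offset, activate,
    then max-pool each group of Pr*Pc consecutive results."""
    outs = []
    for vec in fmap_vecs:
        acc = sum(a * b for a, b in zip(vec, kernel_vec))  # dot product + accumulate
        acc += bias                                        # add the offset
        outs.append(max(acc, 0.0))                         # ReLU activation (assumed)
    # pooling: one output per Pr*Pc consecutive activations
    return [max(outs[i:i + pr_pc]) for i in range(0, len(outs), pr_pc)]

vecs = [[1.0, -2.0, 3.0], [0.5, 0.5, 0.5],
        [-1.0, -1.0, -1.0], [2.0, 0.0, 1.0]]
k = [1.0, 1.0, 1.0]
print(min_compute_unit(vecs, k, bias=0.0, pr_pc=2))  # [2.0, 3.0]
```

The four dot products are 2.0, 1.5, -3.0, 3.0; after ReLU they become 2.0, 1.5, 0.0, 3.0, and max pooling over pairs yields [2.0, 3.0].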
Embodiment two: for the case A < Amax, B > 1 (blocking).
The DMA reads the whole feature map, converts it into feature vectors, and writes the feature vectors back block by block to the RAM outside the FPGA.
And reading back the first block vector to a computing unit for computing.
18) The actions of steps 6) to 14) are executed and the pooling operation starts; the input feature data of the current block is computed and enters the data cache pool until the current block's vector calculations are finished. The data cache pool result is then written back to the external RAM.
19) Starting DMA to read the second block vector into the computing unit, and completing the computing process in the step 18).
20) All calculation results of the first layer's convolution are read back block by block, and the actions of 18) and 19) are repeated until the convolution calculations of all i specified layers are finished, at which point the accelerator stops.
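Under embodiment two, the schedule reduces to a loop over B blocks: each block is DMA'd in, computed, and written back. The following is a minimal sketch that models the external RAM as a Python list and leaves the per-block computation abstract; it is an illustration of the schedule, not the DMA hardware:

```python
def run_blocked(feature_vecs, num_blocks, compute):
    """Split the transposed feature vectors into num_blocks blocks,
    stream each block through the compute function, and collect the
    per-block results as they are written back to external RAM."""
    n = len(feature_vecs)
    size = -(-n // num_blocks)   # ceiling division: vectors per block
    external_ram = []            # stands in for the off-chip RAM
    for b in range(num_blocks):
        block = feature_vecs[b * size:(b + 1) * size]  # DMA read-in
        external_ram.extend(compute(block))            # compute + write-back
    return external_ram

# toy run: 8 "vectors" in B = 3 blocks, squaring standing in for convolution
result = run_blocked(list(range(8)), num_blocks=3,
                     compute=lambda blk: [x * x for x in blk])
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because each block is processed independently and results accumulate in external RAM, the output matches the unblocked computation regardless of B, which is the property the blocking scheme relies on.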
The convolution calculation process is multi-layer multiplexing, and the storage multiplexing capability in the accelerator can be enhanced; the invention can realize the flexible configuration from a small FPGA to a large FPGA and provide a flexible hardware acceleration scheme for the FPGAs with different magnitude resources.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limiting; those skilled in the art may make variations and modifications using the methods and technical content disclosed above without departing from the spirit and scope of the invention.
Claims (8)
1. An underwater noise classification convolutional neural network accelerator, comprising: a DMA controller, a feature map transposed vector unit, a convolution kernel transposed vector unit, a plurality of minimum computing units, and a data cache pool, wherein:
a DMA controller reads in an input feature map and a first layer of convolution kernel; the characteristic diagram transposition vector unit completes calculation of vector transposition according to the characteristic diagram to obtain a characteristic diagram vector, and writes the characteristic diagram vector into a data cache pool; the convolution kernel transposition vector unit completes first convolution kernel vector transposition according to the first layer of convolution kernels to obtain convolution kernel vectors, and writes the convolution kernel vectors into the data cache pool; each minimum computing unit reads the feature map vector and the first convolution kernel vector in the data cache pool, performs point multiplication on the feature map vector and the first convolution kernel vector to obtain a point multiplication result, and stores the point multiplication result in the data cache pool.
2. The underwater noise classification convolutional neural network accelerator of claim 1, wherein: the size Amax of the data cache pool is obtained by the following formula:
Amax=AI+MAX(AVi,APki,AO)+AVKi;
the method comprises the steps of obtaining a feature map data steering vector of each layer, obtaining an AI (input data) cache pool, an AO (output data) cache pool, AVi (amplitude versus amplitude) cache pool size after the feature map data steering vector of each layer, AVki (convolution kernel steering vector) and APki (automatic Power Take in) cache pool size of each layer, wherein AI is the input data cache pool size, AO is the output data cache pool size, AVi is the cache pool size after the feature map data steering vector of each layer, AVki is the convolution kernel steering vector size, and APki is the data size of each layer after pooling.
3. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: for the case A > Amax, B = 1, the system is not blocked, all data can be stored in the FPGA, and the DMA controller only needs to complete data read-in and result write-back once during the calculation; for the case A < Amax, B = MAX(AVi, APki, AO)/A, meaning the input data is decomposed into B blocks; if B works out to a fraction, B = int(B) + 1.
4. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: the number V of vectors calculated by the multiplier is obtained from the number of convolution kernel rows Kri and the number of convolution kernel columns Kci.
5. The underwater noise classification convolutional neural network accelerator of claim 4, wherein: the number V of vectors calculated by the multiplier is V = MAX(Kri × Kci).
6. The underwater noise classification convolutional neural network accelerator of claim 5, wherein: the number of multiplier computation vectors V determines the throughput requirement of the data input of a single accelerator computation unit.
7. The underwater noise classification convolutional neural network accelerator of claim 5, wherein: the minimum computing unit is the smallest computing granularity, realizing the vector multiplication, activation, and pooling operations.
8. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: when the byte count A of the data cache pool is smaller than the size Amax and the blocking parameter B is larger than 1, the DMA controller reads in the input feature map, converts it into feature vectors, and writes the feature vectors block by block into a RAM outside the FPGA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110819180.6A CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110819180.6A CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113673690A true CN113673690A (en) | 2021-11-19 |
CN113673690B CN113673690B (en) | 2024-05-28 |
Family
ID=78539639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110819180.6A Active CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113673690B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228238A (en) * | 2016-07-27 | 2016-12-14 | 中国科学技术大学苏州研究院 | The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA |
CN108875917A (en) * | 2018-06-28 | 2018-11-23 | 中国科学院计算技术研究所 | A kind of control method and device for convolutional neural networks processor |
CN109325591A (en) * | 2018-09-26 | 2019-02-12 | 中国科学院计算技术研究所 | Neural network processor towards Winograd convolution |
US20190188237A1 (en) * | 2017-12-18 | 2019-06-20 | Nanjing Horizon Robotics Technology Co., Ltd. | Method and electronic device for convolution calculation in neutral network |
CN109934339A (en) * | 2019-03-06 | 2019-06-25 | 东南大学 | A kind of general convolutional neural networks accelerator based on a dimension systolic array |
CN110751280A (en) * | 2019-09-19 | 2020-02-04 | 华中科技大学 | Configurable convolution accelerator applied to convolutional neural network |
CN110991632A (en) * | 2019-11-29 | 2020-04-10 | 电子科技大学 | Method for designing heterogeneous neural network computing accelerator based on FPGA |
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | Sparse neural network accelerator based on structured pruning and acceleration method thereof |
WO2020186703A1 (en) * | 2019-03-20 | 2020-09-24 | Huawei Technologies Co., Ltd. | Convolutional neural network-based image processing method and image processing apparatus |
CN112950656A (en) * | 2021-03-09 | 2021-06-11 | 北京工业大学 | Block convolution method for pre-reading data according to channel based on FPGA platform |
CN113077047A (en) * | 2021-04-08 | 2021-07-06 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228238A (en) * | 2016-07-27 | 2016-12-14 | 中国科学技术大学苏州研究院 | The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA |
US20190188237A1 (en) * | 2017-12-18 | 2019-06-20 | Nanjing Horizon Robotics Technology Co., Ltd. | Method and electronic device for convolution calculation in neutral network |
CN108875917A (en) * | 2018-06-28 | 2018-11-23 | 中国科学院计算技术研究所 | A kind of control method and device for convolutional neural networks processor |
CN109325591A (en) * | 2018-09-26 | 2019-02-12 | 中国科学院计算技术研究所 | Neural network processor towards Winograd convolution |
CN109934339A (en) * | 2019-03-06 | 2019-06-25 | 东南大学 | A kind of general convolutional neural networks accelerator based on a dimension systolic array |
WO2020186703A1 (en) * | 2019-03-20 | 2020-09-24 | Huawei Technologies Co., Ltd. | Convolutional neural network-based image processing method and image processing apparatus |
CN110751280A (en) * | 2019-09-19 | 2020-02-04 | 华中科技大学 | Configurable convolution accelerator applied to convolutional neural network |
CN110991632A (en) * | 2019-11-29 | 2020-04-10 | 电子科技大学 | Method for designing heterogeneous neural network computing accelerator based on FPGA |
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | Sparse neural network accelerator based on structured pruning and acceleration method thereof |
CN112950656A (en) * | 2021-03-09 | 2021-06-11 | 北京工业大学 | FPGA-platform-based block convolution method that pre-reads data by channel |
CN113077047A (en) * | 2021-04-08 | 2021-07-06 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity |
Non-Patent Citations (5)
Title |
---|
BING LIU et al.: "An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution", Electronics, vol. 8, no. 3, 3 March 2019 (2019-03-03), pages 1 - 18 * |
YUAN Ming et al.: "Design and implementation of FPGA-based oil palm detection and hardware acceleration", Journal of Frontiers of Computer Science and Technology, vol. 15, no. 2, 14 May 2020 (2020-05-14), pages 315 - 326 * |
YUAN Ming: "Research and implementation of an FPGA-based accelerator for remote sensing object detection", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 01, 15 January 2021 (2021-01-15), pages 028 - 222 * |
XIE Shanggang: "Research and implementation of FPGA-based algorithm acceleration for convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology, no. 02, 15 February 2021 (2021-02-15), pages 135 - 774 * |
XIE Sipu et al.: "FPGA design and optimization of multi-branch convolutional neural networks", Embedded Technology, vol. 47, no. 7, 6 July 2021 (2021-07-06), pages 97 - 101 * |
Also Published As
Publication number | Publication date |
---|---|
CN113673690B (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10929746B2 (en) | Low-power hardware acceleration method and system for convolution neural network computation | |
US20200234124A1 (en) | Winograd transform convolution operations for neural networks | |
US10691996B2 (en) | Hardware accelerator for compressed LSTM | |
KR102528517B1 (en) | Exploiting input data sparsity in neural network compute units | |
CN108229645B (en) | Convolution acceleration and calculation processing method and device, electronic equipment and storage medium | |
JP2022037022A (en) | Execution of kernel stride in hardware | |
CN108629406B (en) | Arithmetic device for convolutional neural network | |
CN110807522B (en) | General calculation circuit of neural network accelerator | |
WO2019157812A1 (en) | Computing device and method | |
US20230026006A1 (en) | Convolution computation engine, artificial intelligence chip, and data processing method | |
CN107633297A | Convolutional neural network hardware accelerator based on a parallel fast FIR filter algorithm | |
US11544526B2 (en) | Computing device and method | |
CN108960414B (en) | Method for realizing single broadcast multiple operations based on deep learning accelerator | |
Kala et al. | UniWiG: Unified winograd-GEMM architecture for accelerating CNN on FPGAs | |
Shahshahani et al. | Memory optimization techniques for fpga based cnn implementations | |
Niu et al. | SPEC2: Spectral sparse CNN accelerator on FPGAs | |
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination | |
CN111222090B (en) | Convolution calculation module, neural network processor, chip and electronic equipment | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
CN113673690A (en) | Underwater noise classification convolution neural network accelerator | |
US11853868B2 (en) | Multi dimensional convolution in neural network processor | |
WO2021081854A1 (en) | Convolution operation circuit and convolution operation method | |
CN115222028A (en) | One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method | |
CN114154621A (en) | Convolutional neural network image processing method and device based on FPGA | |
CN110610227B (en) | Artificial neural network adjusting method and neural network computing platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |