CN113673690B - Underwater noise classification convolutional neural network accelerator - Google Patents
- Publication number: CN113673690B (application CN202110819180.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
Abstract
The invention discloses an underwater noise classification convolutional neural network accelerator, which comprises a DMA controller, a feature map transpose vector unit, a convolution kernel transpose vector unit, a plurality of minimum calculation units, and a data cache pool. The DMA controller reads in the input feature map and the first-layer convolution kernels. The feature map transpose vector unit transposes the feature map into computation vectors to obtain feature map vectors and writes them into the data cache pool. The convolution kernel transpose vector unit transposes the first-layer convolution kernels to obtain convolution kernel vectors and writes them into the data cache pool. Each minimum calculation unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool. The invention can adapt to FPGA devices with different resource conditions, realizing an accelerator scheme in which the hardware parallelism can be flexibly scaled up or down.
Description
Technical Field
The invention belongs to the technical field of convolutional neural networks for underwater noise classification application, and particularly relates to an underwater noise classification convolutional neural network accelerator.
Background
Traditionally, underwater noise classification has been performed by trained sonar operators who continuously monitor sonar returns and classify them from experience. This approach relies heavily on the operator's experience and working condition. After amplification and sampling, the sonar signal can be converted into a one-dimensional digital signal.
In recent years, convolutional neural networks have been widely applied to classification problems in deep learning. Numerous practical results show that convolutional neural networks achieve high recognition accuracy on classification tasks. Deploying an underwater noise classification network in an embedded environment, however, must contend with the weak computing power and limited resources of the hardware platform.
For the dedicated convolutional neural network built for the underwater target classification problem, a dedicated convolutional neural network accelerator can raise the available computing power and thereby meet the real-time requirement.
In the related art, a convolutional neural network accelerator typically completes all convolutions of a layer, then performs activation and then pooling, accelerating the computation layer by layer. This approach has two drawbacks: each stage needs a buffer layer to store intermediate calculation results, consuming substantial cache resources; and the scale of each layer's calculation units is fixed, so the design cannot be expanded or trimmed to match the available FPGA resources.
Disclosure of Invention
The technical problem solved by the invention is to overcome the defects of the prior art by providing an underwater noise classification convolutional neural network accelerator that can adapt to FPGA devices with different resource conditions, realizing an accelerator scheme in which the hardware parallelism can be flexibly scaled up or down.
The aim of the invention is realized by the following technical scheme. An underwater noise classification convolutional neural network accelerator comprises: a DMA controller, a feature map transpose vector unit, a convolution kernel transpose vector unit, a plurality of minimum calculation units, and a data cache pool. The DMA controller reads in the input feature map and the first-layer convolution kernels. The feature map transpose vector unit transposes the feature map into computation vectors to obtain feature map vectors and writes them into the data cache pool. The convolution kernel transpose vector unit transposes the first-layer convolution kernels to obtain convolution kernel vectors and writes them into the data cache pool. Each minimum calculation unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool. The DMA controller then reads the dot-product results from the data cache pool as the input feature map and convolution kernels of the next layer, until the computation of all convolution layers is complete.
In the above underwater noise classification convolutional neural network accelerator, the data cache pool size Amax is obtained by the following formula:
Amax = AI + MAX(AVi, APki, AO) + AVKi;
wherein AI is the input data buffer size, AO is the output data buffer size, AVi is the buffer size of each layer's feature map data after transposition into vectors, AVKi is the size of the transposed convolution kernels, and APki is the size of each layer's data after pooling.
In the above underwater noise classification convolutional neural network accelerator, for the case A > Amax, B = 1 indicates that the system does not block the data: all data can be held inside the FPGA, and the DMA controller only needs to perform one data read and one write-back of the calculation results during computation. For the case A < Amax, B = MAX(AVi, APki, AO) / A, indicating that the input data is decomposed into B blocks; if B is fractional, B = INT(B) + 1.
In the above underwater noise classification convolutional neural network accelerator, the multiplier vector length V is obtained from the convolution kernel row count Kri and column count Kci.
In the above underwater noise classification convolutional neural network accelerator, the multiplier vector length is calculated as V = MAX(Kri × Kci).
In the above underwater noise classification convolutional neural network accelerator, the multiplier vector length V determines the input-data throughput requirement of a single accelerator calculation unit.
In the above underwater noise classification convolutional neural network accelerator, the minimum calculation unit is the smallest granularity of calculation unit, implementing vector multiplication, activation, and pooling operations.
In the above underwater noise classification convolutional neural network accelerator, when the data cache pool byte count A is smaller than the data cache pool size Amax and the blocking parameter B is larger than 1, the DMA controller reads in the input feature map, converts it into feature vectors, and writes the feature vectors in blocks into RAM outside the FPGA.
Compared with the prior art, the invention has the following beneficial effects:
(1) The convolution calculation datapath is reused across layers, enhancing storage reuse inside the accelerator;
(2) The invention can be flexibly configured from small FPGAs to large FPGAs, providing a flexible hardware acceleration scheme for FPGAs at different resource levels.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a block diagram of an underwater noise classification convolutional neural network accelerator provided by an embodiment of the present invention;
FIG. 2 is a block diagram of an embodiment of a multiplier-adder implementation provided by the present invention;
FIG. 3 is another block diagram of a multiplier-adder implementation provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 is a block diagram of an underwater noise classification convolutional neural network accelerator provided by an embodiment of the present invention. As shown in FIG. 1, the underwater noise classification convolutional neural network accelerator comprises a DMA controller, a feature map transpose vector unit, a convolutional kernel transpose vector unit, a plurality of minimum calculation units and a data cache pool; wherein,
The DMA controller reads in the input feature map and the first-layer convolution kernels; the feature map transpose vector unit transposes the feature map into computation vectors to obtain feature map vectors and writes them into the data cache pool; the convolution kernel transpose vector unit transposes the first-layer convolution kernels to obtain convolution kernel vectors and writes them into the data cache pool; each minimum calculation unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool; the DMA controller then reads the dot-product results from the data cache pool as the input feature map and convolution kernels of the next layer, until the computation of all convolution layers is complete.
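As an illustration of the transpose-vector dataflow described above, the following NumPy sketch unrolls each convolution window of a single-channel feature map into a row vector and flattens the kernel, so that each minimum calculation unit's work reduces to one dot product per window. The function names and the single-channel setup are illustrative assumptions, not the patent's hardware implementation.

```python
import numpy as np

def feature_map_to_vectors(fmap, kr, kc, stride):
    """Unroll each kr x kc window of a single-channel feature map into a
    row vector, mirroring the feature map transpose vector unit (im2col)."""
    h, w = fmap.shape
    rows = []
    for r in range(0, h - kr + 1, stride):
        for c in range(0, w - kc + 1, stride):
            rows.append(fmap[r:r + kr, c:c + kc].ravel())
    return np.stack(rows)

def kernel_to_vector(kernel):
    """Flatten a convolution kernel, mirroring the convolution kernel
    transpose vector unit."""
    return kernel.ravel()

# Each minimum calculation unit then performs one dot product per window.
fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
kern = np.ones((3, 3), dtype=np.float32)
vecs = feature_map_to_vectors(fmap, 3, 3, 1)  # 4 windows of length 9
out = vecs @ kernel_to_vector(kern)           # one dot-product result per window
```

After vectorization, a whole convolution layer becomes a batch of independent dot products, which is what allows the number of minimum calculation units M to be scaled freely.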
The synthesis script and minimum calculation unit provided by the invention can be adapted to FPGA devices with different resource conditions, realizing an accelerator scheme in which the hardware parallelism can be flexibly scaled up or down.
The invention generates the hardware accelerator through a synthesis script, in which the user must provide the following parameters: the number of input feature maps C, the feature map height H, and the feature map width W; for each convolution layer i, the number of convolution kernels Cki, the kernel row count Kri, the kernel column count Kci, and the convolution stride Ksi; the pooling unit row count Pr, column count Pc, and shift length Ps; the number of minimum calculation units M; and the data cache pool byte count A.
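The parameter set above can be pictured as a simple configuration record. The following Python sketch is purely illustrative — the patent does not specify the synthesis script's input format — and only mirrors the patent's symbol names:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AcceleratorConfig:
    """Illustrative container for the synthesis-script parameters.
    Per-layer lists are indexed by i, as in the patent's symbols."""
    C: int            # number of input feature maps
    H: int            # feature map height
    W: int            # feature map width
    Cki: List[int]    # convolution kernels per layer
    Kri: List[int]    # kernel rows per layer
    Kci: List[int]    # kernel columns per layer
    Ksi: List[int]    # convolution stride per layer
    Pr: int           # pooling unit rows
    Pc: int           # pooling unit columns
    Ps: int           # pooling shift length
    M: int            # number of minimum calculation units
    A: int            # data cache pool byte count

cfg = AcceleratorConfig(C=1, H=64, W=64,
                        Cki=[8, 16], Kri=[5, 3], Kci=[5, 3], Ksi=[1, 1],
                        Pr=2, Pc=2, Ps=2, M=4, A=65536)
```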
The minimum calculation unit is the smallest granularity of calculation unit in the invention, implementing vector multiplication, activation, and pooling operations. The activation function types are configured in the synthesis script; when different convolution layers of one network use different activation functions, multiple activation functions can be synthesized, with the selection made through configuration options.
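The role of one minimum calculation unit can be sketched as follows. ReLU activation and max pooling are assumptions made for illustration — the patent leaves the activation type configurable and does not fix the pooling operator:

```python
import numpy as np

def minimum_calculation_unit(window_vecs, kernel_vec, bias, pr, pc):
    """Sketch of one minimum calculation unit: dot-multiply each feature
    map vector with the kernel vector, add the bias, apply the activation
    (ReLU assumed), then pool a Pr x Pc group of results (max assumed)."""
    acc = window_vecs @ kernel_vec + bias   # dot products plus bias
    activated = np.maximum(acc, 0.0)        # ReLU activation (assumed)
    return activated.reshape(pr, pc).max()  # pooling over Pr * Pc entries

wins = np.array([[1.0, -1.0], [2.0, 0.0], [-3.0, 1.0], [0.0, 0.0]])
kvec = np.array([1.0, 1.0])
pooled = minimum_calculation_unit(wins, kvec, bias=0.0, pr=2, pc=2)
```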
The convolutional neural network accelerator can be configured with from one minimum calculation unit up to N calculation units; the configured number is determined by the real-time requirement of the system and the amount of logic resources the FPGA can provide.
FIG. 2 is a block diagram of an embodiment of a multiplier-adder implementation provided by the present invention; FIG. 3 is another block diagram of a multiplier-adder implementation provided by an embodiment of the present invention. As shown in FIG. 2 and FIG. 3, based on the minimum calculation unit structure, the invention adapts to FPGA chips of different resource capacities by configuring synthesis options through the script. The implementation process is as follows:
The following configuration is completed in the synthesis script:
1) Input feature map parameters: C, H, W;
2) Convolution kernel parameters: Cki, Kri, Kci, Ksi;
3) Pooling unit parameters: Pr, Pc, Ps;
4) Minimum calculation unit parameter: M;
5) Data cache pool byte count: A;
6) Number of convolution layers: i.
From parameters 1)-4), the synthesis script calculates the input data buffer size AI, the output data buffer size AO, the buffer size AVi of each layer's feature map data after transposition into vectors, the transposed convolution kernel size AVKi, and the size APki of each layer's data after pooling. Because the convolution layers are computed in a pipelined sequence, the next layer's convolution begins only after the previous layer's convolution completes; the script algorithm therefore considers the maximum amount of data that can exist simultaneously in each layer to obtain the cache pool size, as follows: Amax = AI + MAX(AVi, APki, AO) + AVKi, where i ∈ (1, Nk) and Nk is the number of convolution layers.
The script then sets the blocking parameter B according to parameter 5) and Amax. The algorithm for B is as follows:
For the case A > Amax, B = 1 indicates that the system does not block the data: all data can be held inside the FPGA, and the DMA only needs to perform one data read and one write-back of the calculation results during computation. For the case A < Amax, B = MAX(AVi, APki, AO) / A, indicating that the input data is decomposed into B blocks; if B is fractional, B = INT(B) + 1.
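A hedged sketch of the script's sizing arithmetic follows, assuming that the MAX over layers is taken term by term for each per-layer quantity (the patent leaves the reduction over i implicit):

```python
import math

def cache_pool_size(AI, AO, AVi, APki, AVKi):
    """Amax = AI + MAX(AVi, APki, AO) + AVKi, taking the worst case over
    the Nk layers for each per-layer list (an illustrative assumption)."""
    return AI + max(max(AVi), max(APki), AO) + max(AVKi)

def blocking_factor(A, AI, AO, AVi, APki, AVKi):
    """B = 1 when the pool fits (A > Amax); otherwise the input is split
    into B = MAX(AVi, APki, AO) / A blocks, rounded up (INT(B) + 1 when
    the quotient is fractional)."""
    if A >= cache_pool_size(AI, AO, AVi, APki, AVKi):
        return 1
    return math.ceil(max(max(AVi), max(APki), AO) / A)
```

For example, with AI = 50, AO = 70, AVi = [100, 80], APki = [60, 40], and AVKi = [30, 20], Amax works out to 180 bytes, so a 200-byte pool needs no blocking while a 40-byte pool splits the input into three blocks.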
The multiplier vector length is V = MAX(Kri × Kci). This parameter determines the input-data throughput requirement of a single accelerator calculation unit.
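Under the same assumptions, V can be computed from the per-layer kernel dimensions as:

```python
def multiplier_vector_length(Kri, Kci):
    """V = MAX(Kri * Kci): the multiplier vector is sized for the largest
    kernel across all convolution layers."""
    return max(kr * kc for kr, kc in zip(Kri, Kci))

V = multiplier_vector_length([5, 3, 3], [5, 3, 3])  # largest kernel is 5x5
```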
After the configuration is completed, the script is run, and a synthesis tool realizes an accelerator circuit matched to the amount of resources on the current FPGA. After the circuit has been synthesized and deployed to the FPGA, the computation proceeds as follows.
In embodiment one, for the case A > Amax, B = 1: the data is not blocked.
6) The DMA reads in the entire input feature map and the first-layer convolution kernels and places them into the calculation buffer pool;
7) The feature map transpose vector unit is started, the computation vector transposition is completed, and the result is written into the calculation buffer pool;
8) The convolution kernel transpose vector unit is started, the transposition of the first convolution kernel vectors is completed, and the result is written into the calculation buffer pool;
9) The convolution calculation data stream is allocated according to the minimum calculation unit parameter M; the number of vector entries each calculation unit reads is determined by the pooling unit size after one convolution pass, namely Pr × Pc;
10) After a vector enters a minimum calculation unit, the dot product of the input feature map vector and the convolution kernel vector is completed;
11) The calculation results are accumulated in the minimum calculation unit;
12) The bias is added to the accumulated result;
13) The addition result of the previous step is activated;
14) Steps 10)-13) are repeated until the designated number of units for pooling have completed their calculation;
15) The pooling operation is started and the calculation result is output to the data cache pool, until all vector calculations are complete;
16) The data in the data cache pool is used as the input feature map, the DMA reads the next layer's convolution kernel data into the data cache pool, and steps 7)-16) are repeated to complete the data calculation;
17) When the convolution calculation of the specified number of layers i is complete, the accelerator stops operating.
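The steps above can be condensed into a single-layer, single-channel NumPy sketch; ReLU activation and max pooling are assumed for illustration, and the function is a software model of the dataflow, not the hardware itself:

```python
import numpy as np

def conv_layer(fmap, kernel, bias, stride, pr, pc, ps):
    """One layer of the non-blocked flow: transpose windows to vectors,
    dot-multiply with the kernel vector, add bias, activate (ReLU
    assumed), then Pr x Pc max pooling with shift Ps (assumed)."""
    h, w = fmap.shape
    kr, kc = kernel.shape
    oh, ow = (h - kr) // stride + 1, (w - kc) // stride + 1
    kvec = kernel.ravel()
    conv = np.empty((oh, ow), dtype=fmap.dtype)
    for r in range(oh):                    # dot product, bias, activation per window
        for c in range(ow):
            win = fmap[r * stride:r * stride + kr,
                       c * stride:c * stride + kc].ravel()
            conv[r, c] = max(win @ kvec + bias, 0.0)
    ph, pw = (oh - pr) // ps + 1, (ow - pc) // ps + 1
    pooled = np.empty((ph, pw), dtype=fmap.dtype)
    for r in range(ph):                    # pooling stage
        for c in range(pw):
            pooled[r, c] = conv[r * ps:r * ps + pr, c * ps:c * ps + pc].max()
    return pooled                          # result written back to the data cache pool

fmap = np.arange(16, dtype=np.float64).reshape(4, 4)
out = conv_layer(fmap, np.ones((3, 3)), bias=0.0, stride=1, pr=2, pc=2, ps=2)
```

The returned array plays the role of the data cache pool contents that the DMA would feed to the next layer.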
In embodiment two, for the case A < Amax, B > 1: the data is blocked.
The DMA reads in the entire feature map, converts it into feature vectors, and then writes the feature vectors back in blocks into RAM outside the FPGA. The first block of vectors is then read back into the calculation units in preparation for calculation.
18) The actions of steps 6)-14) are executed and the pooling operation is started, completing the calculation of the current block's input feature data, with results entering the data cache pool, until the current block's vector calculation finishes. The data cache pool results are written back to external RAM.
19) The DMA is started to read the second block of vectors into the calculation units, and the calculation process described in step 18) is completed.
20) All first-layer convolution results are read block by block, and the actions of steps 18) and 19) are repeated until the convolution calculation of the specified number of layers i is complete, whereupon the accelerator stops operating.
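The blocked flow of embodiment two can be sketched as follows, with NumPy arrays standing in for external RAM and the DMA transfers (an illustrative assumption; the block count B comes from the synthesis script):

```python
import numpy as np

def blocked_compute(vectors, kernel_vec, B):
    """Embodiment-two sketch: the feature vectors are held in external RAM
    in B blocks; each block is read in (DMA stand-in), its dot products
    are computed, and the results are written back before the next block
    is fetched."""
    external_ram = np.array_split(vectors, B)   # blocked write-back to external RAM
    results = []
    for block in external_ram:                  # per-block DMA read-in
        results.append(block @ kernel_vec)      # compute the current block
    return np.concatenate(results)              # all block results written back

vecs = np.arange(12, dtype=np.float64).reshape(4, 3)
res = blocked_compute(vecs, np.ones(3), B=2)
```

Splitting into blocks changes only the data movement, not the results: the concatenated output equals the non-blocked dot products.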
The convolution calculation datapath is reused across layers, enhancing storage reuse inside the accelerator. The invention can be flexibly configured from small FPGAs to large FPGAs, providing a flexible hardware acceleration scheme for FPGAs at different resource levels.
Although the present invention has been described in terms of the preferred embodiments, it is not intended to be limited to the embodiments, and any person skilled in the art can make any possible variations and modifications to the technical solution of the present invention by using the methods and technical matters disclosed above without departing from the spirit and scope of the present invention, so any simple modifications, equivalent variations and modifications to the embodiments described above according to the technical matters of the present invention are within the scope of the technical matters of the present invention.
Claims (8)
1. An underwater noise classification convolutional neural network accelerator, comprising: a DMA controller, a feature map transpose vector unit, a convolution kernel transpose vector unit, a plurality of minimum calculation units, and a data cache pool; wherein,
The DMA controller reads in the input feature map and the first-layer convolution kernels; the feature map transpose vector unit transposes the feature map into computation vectors to obtain feature map vectors and writes them into the data cache pool; the convolution kernel transpose vector unit transposes the first-layer convolution kernels to obtain convolution kernel vectors and writes them into the data cache pool; and each minimum calculation unit reads a feature map vector and a first convolution kernel vector from the data cache pool, computes their dot product, and stores the result in the data cache pool.
2. The underwater noise classification convolutional neural network accelerator of claim 1, wherein: the data cache pool size Amax is obtained by the following formula:
Amax=AI+MAX(AVi,APki,AO)+AVKi;
Wherein AI is the input data buffer size, AO is the output data buffer size, AVi is the buffer size of each layer's feature map data after transposition into vectors, AVKi is the size of the transposed convolution kernels, and APki is the size of each layer's data after pooling.
3. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: for the case A > Amax, B = 1 indicates that the system does not block the data: all data can be held inside the FPGA, and the DMA controller only needs to perform one data read and one write-back of the calculation results during computation; for the case A < Amax, B = MAX(AVi, APki, AO) / A, indicating that the input data is decomposed into B blocks; if B is fractional, B = INT(B) + 1.
4. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: the multiplier vector length V is obtained from the convolution kernel row count Kri and column count Kci.
5. The underwater noise classification convolutional neural network accelerator of claim 4, wherein: the multiplier vector length is V = MAX(Kri × Kci).
6. The underwater noise classification convolutional neural network accelerator of claim 5, wherein: the number of multiplier calculation vectors V determines the throughput requirements of the data inputs of the individual accelerator calculation units.
7. The underwater noise classification convolutional neural network accelerator of claim 5, wherein: the minimum calculation unit is the smallest granularity of calculation unit, implementing vector multiplication, activation, and pooling operations.
8. The underwater noise classification convolutional neural network accelerator of claim 2, wherein: when the byte number A of the data cache pool is smaller than the size Amax of the data cache pool and the blocking parameter B is larger than 1, the DMA controller reads in the input feature map, converts the input feature map into a feature vector, and then blocks and writes the feature vector into a RAM outside the FPGA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110819180.6A CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110819180.6A CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113673690A CN113673690A (en) | 2021-11-19 |
CN113673690B true CN113673690B (en) | 2024-05-28 |
Family
ID=78539639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110819180.6A Active CN113673690B (en) | 2021-07-20 | 2021-07-20 | Underwater noise classification convolutional neural network accelerator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113673690B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228238A (en) * | 2016-07-27 | 2016-12-14 | 中国科学技术大学苏州研究院 | Method and system for accelerating deep learning algorithms on a field programmable gate array platform
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | General FPGA-based fixed-point neural network convolution accelerator hardware architecture
CN108875917A (en) * | 2018-06-28 | 2018-11-23 | 中国科学院计算技术研究所 | Control method and device for a convolutional neural network processor
CN109325591A (en) * | 2018-09-26 | 2019-02-12 | 中国科学院计算技术研究所 | Neural network processor oriented to Winograd convolution
CN109934339A (en) * | 2019-03-06 | 2019-06-25 | 东南大学 | General convolutional neural network accelerator based on a one-dimensional systolic array
CN110751280A (en) * | 2019-09-19 | 2020-02-04 | 华中科技大学 | Configurable convolution accelerator applied to convolutional neural networks
CN110991632A (en) * | 2019-11-29 | 2020-04-10 | 电子科技大学 | Method for designing a heterogeneous neural network computing accelerator based on FPGA
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | Sparse neural network accelerator based on structured pruning and acceleration method thereof
WO2020186703A1 (en) * | 2019-03-20 | 2020-09-24 | Huawei Technologies Co., Ltd. | Convolutional neural network-based image processing method and image processing apparatus
CN112950656A (en) * | 2021-03-09 | 2021-06-11 | 北京工业大学 | FPGA-based block convolution method with per-channel data pre-reading
CN113077047A (en) * | 2021-04-08 | 2021-07-06 | 华南理工大学 | Convolutional neural network accelerator based on feature map sparsity
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107844828B (en) * | 2017-12-18 | 2021-07-30 | 南京地平线机器人技术有限公司 | Convolution calculation method in neural network and electronic device |
- 2021-07-20: application CN202110819180.6A granted as patent CN113673690B (en), status Active
Non-Patent Citations (5)
Title |
---|
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution;Bing Liu 等;《electronics》;20190303;第8卷(第3期);1-18 * |
Research and Implementation of FPGA Algorithm Acceleration for Convolutional Neural Networks; Xie Shanggang (谢尚港); China Master's Theses Full-text Database, Information Science and Technology; 2021-02-15 (No. 02); I135-774 *
FPGA-Based Oil Palm Detection and Hardware Acceleration Design and Implementation; Yuan Ming (袁鸣) et al.; Journal of Frontiers of Computer Science and Technology (计算机科学与探索); 2020-05-14; Vol. 15, No. 2; 315-326 *
Research and Implementation of an FPGA-Based Remote Sensing Target Detection Accelerator; Yuan Ming (袁鸣); China Master's Theses Full-text Database, Engineering Science and Technology II; 2021-01-15 (No. 01); C028-222 *
FPGA Design and Optimization of Multi-Branch Convolutional Neural Networks; Xie Sipu (谢思璞) et al.; Embedded Technology (嵌入式技术); 2021-07-06; Vol. 47, No. 7; 97-101 *
Also Published As
Publication number | Publication date |
---|---|
CN113673690A (en) | 2021-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109102065B (en) | Convolutional neural network accelerator based on PSoC | |
CN108647773B (en) | Hardware interconnection system capable of reconstructing convolutional neural network | |
CN110321997B (en) | High-parallelism computing platform, system and computing implementation method | |
CN111338695B (en) | Data processing method based on pipeline technology and related product | |
CN104899182A (en) | Matrix multiplication acceleration method for supporting variable blocks | |
KR20190107091A (en) | Calculation device and method | |
CN112219210B (en) | Signal processing device and signal processing method | |
CN111898733A (en) | Deep separable convolutional neural network accelerator architecture | |
US11544526B2 (en) | Computing device and method | |
CN110543936B (en) | Multi-parallel acceleration method for CNN full-connection layer operation | |
CN111126569B (en) | Convolutional neural network device supporting pruning sparse compression and calculation method | |
CN111160547B (en) | Device and method for artificial neural network operation | |
CN110874627B (en) | Data processing method, data processing device and computer readable medium | |
CN111353591A (en) | Computing device and related product | |
US11086574B2 (en) | Machine perception and dense algorithm integrated circuit | |
CN115310037A (en) | Matrix multiplication computing unit, acceleration unit, computing system and related method | |
CN113807998A (en) | Image processing method, target detection device, machine vision equipment and storage medium | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
Niu et al. | SPEC2: Spectral sparse CNN accelerator on FPGAs | |
CN114004351A (en) | Convolution neural network hardware acceleration platform | |
CN108960420B (en) | Processing method and acceleration device | |
CN113673690B (en) | Underwater noise classification convolutional neural network accelerator | |
CN111222090B (en) | Convolution calculation module, neural network processor, chip and electronic equipment | |
WO2021081854A1 (en) | Convolution operation circuit and convolution operation method | |
US11853868B2 (en) | Multi dimensional convolution in neural network processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||