CN111814675B - Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA - Google Patents


Publication number
CN111814675B
CN111814675B (application CN202010652929.8A)
Authority
CN
China
Prior art keywords
module
feature map
line
window
zero
Prior art date
Legal status
Active
Application number
CN202010652929.8A
Other languages
Chinese (zh)
Other versions
CN111814675A (en)
Inventor
Guo Jing (郭静)
Current Assignee
Shanghai Xuehu Technology Co ltd
Original Assignee
Shanghai Xuehu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xuehu Technology Co ltd filed Critical Shanghai Xuehu Technology Co ltd
Priority to CN202010652929.8A priority Critical patent/CN111814675B/en
Publication of CN111814675A publication Critical patent/CN111814675A/en
Application granted granted Critical
Publication of CN111814675B publication Critical patent/CN111814675B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V 10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/60 Memory management

Abstract

The invention relates to the technical field of image processing, in particular to an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution. The system comprises a feature map assembly module and a weight loading module, each connected to a main convolution calculation module, which is in turn connected to a window accumulation module. The main convolution calculation module takes feature map windows from the feature map assembly module together with weights from the weight loading module and completes the channel accumulation, after which the window accumulation module completes the whole convolution. Unlike prior schemes, whose feature map cache designs support only a single resolution, the invention automatically configures the feature map cache parameters on the FPGA according to the real-time resolution and is compatible with CNN implementations at multiple resolutions without any code modification.

Description

Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA
Technical Field
The invention relates to the technical field of image processing, in particular to an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution.
Background
A convolutional neural network (CNN) is an efficient recognition method built around convolution operations and is one of the representative algorithms of deep learning. In recent years CNNs have been widely applied in many fields, such as automatic labeling, image search, product recommendation, and search frameworks; the most classic and popular of these applications, however, is image processing. A CNN takes the raw feature image directly as input, with no elaborate image preprocessing stage, and produces the final classification result for the picture. Because CNN computation involves a large volume of data, it is usually implemented with large-scale computer programming, which brings high implementation difficulty and high cost.
Precisely because of the CNN's distinctive computation pattern, general-purpose processors implement it inefficiently and cannot meet the performance requirements. Accordingly, various accelerators based on field programmable gate arrays (FPGAs), graphics processing units (GPUs), and application specific integrated circuits (ASICs) have been proposed in recent years to improve CNN performance. Comparing the three in terms of performance, power consumption, and flexibility (see fig. 1), the FPGA combines good performance, high energy efficiency, and a short development cycle, and has therefore attracted more and more attention for CNN acceleration.
Implementing a CNN on an FPGA involves computation over large volumes of data, and hence large-volume data reads and writes. Because memory resources on an FPGA are limited, data such as the feature maps required in CNN computation typically reside in external dynamic random access memory (DRAM), which the FPGA reads and writes. Since CNN networks serve different application scenarios, the input feature images often come in a variety of resolutions, which requires an FPGA implementation of a CNN to adapt to dynamic resolution as far as possible.
Disclosure of Invention
In view of the above technical problems, the present invention provides an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution, in contrast to previous schemes whose feature map cache designs support only a single resolution.
The technical scheme adopted for solving the technical problems is as follows:
a convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA, the system comprising:
a feature map assembly module and a weight loading module, each connected to a main convolution calculation module; the main convolution calculation module is connected to a window accumulation module; the main convolution calculation module takes feature map windows from the feature map assembly module together with weights from the weight loading module and completes the channel accumulation, after which the whole convolution is completed in the window accumulation module; the window accumulation module is connected to a feature map output module.
In the technical scheme of the invention, the feature map assembly module specifically comprises:
a zero-padding module for performing the zero-padding operation on the feature map;
a line cache module, connected to the zero-padding module, for implementing feature map line caching, line switching, and line data output;
and a window assembly module, connected to the line cache module, which collects cycle by cycle all the data required by the main convolution calculation module and finally, through splitting and recombination, outputs the feature map window channel by channel.
In the technical scheme of the invention, a BRAM cache module is arranged in the line cache module.
In the technical scheme of the invention, a read-write controller is also arranged in the line cache module; it controls the read/write signals and read/write addresses of the line cache module and carries out the writes into the BRAM cache module accordingly.
The technical scheme has the following advantages or beneficial effects:
The invention provides an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution that copes with resolution changes when a CNN network is implemented and is compatible with multiple resolutions without modifying code, making it more convenient and fast; the BRAM caching scheme saves the line-switch initialization cycles, further improving feature map assembly efficiency without affecting the efficiency of the main convolution calculation module at all.
Drawings
The invention and its features, aspects and advantages will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings. Like numbers refer to like parts throughout. The drawings may not be to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 compares FPGA, GPU, and ASIC accelerators;
FIG. 2 is a schematic diagram of a convolutional neural network implementing the window accumulation scheme on an FPGA;
FIG. 3 is a schematic diagram of the overall design of the feature map assembly module;
FIG. 4 is a schematic diagram of the BRAM cache module scheme in the line cache module.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
To meet the fact that in practical CNN applications the input feature maps may come in several different resolutions, the control parameters of the feature map assembly module are made dynamically configurable when the CNN network is implemented on the FPGA, giving the implementation high compatibility with CNN feature map inputs of different resolutions.
As shown in fig. 2 and fig. 3, the FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution of the present invention includes a feature map assembly module and a weight loading module, each connected to a main convolution calculation module, which is in turn connected to a window accumulation module; the main convolution calculation module takes feature map windows from the feature map assembly module together with weights from the weight loading module and completes the channel accumulation, after which the whole convolution is completed in the window accumulation module.
In the technical scheme of the invention, the window accumulation module is connected to a feature map output module, and the feature map assembly module specifically comprises:
a zero-padding module for performing the zero-padding operation on the feature map; a line cache module, connected to the zero-padding module, for feature map line caching, line switching, and line data output; and a window assembly module, connected to the line cache module, which collects cycle by cycle all the data required by the main convolution calculation module and finally, through splitting and recombination, outputs the feature map window channel by channel. A BRAM cache module and a read-write controller are arranged in the line cache module; the read-write controller controls the read/write signals and read/write addresses of the line cache module and carries out the writes into the BRAM cache module.
The technical scheme of the invention requires the feature map assembly module to supply convolution calculation windows to the main convolution calculation module, in a highly compatible way, within the allowable range of DDR transfer efficiency. Taking a 3x3 convolution kernel as the example, i.e. 3x3 feature map windows must be supplied to the main convolution calculation module, the scheme adapts to the input feature map's height (h), width (w), channel count (c), and step size (stride), where w, h, c, and stride can all be set to arbitrary values.
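To make the assembly concrete, the following pure-Python reference model sketches what the hardware produces: it zero-pads an h x w x c feature map (stored as nested lists) and cuts out one 3x3 window per output position at an arbitrary stride. The function name and data layout are illustrative assumptions, not taken from the patent.

```python
def assemble_windows(fmap, kernel=3, stride=1, pad=1):
    """Software reference for 3x3 window assembly: zero-pad, then slide.

    fmap is an h x w x c feature map as nested lists; returns the list of
    kernel x kernel x c windows plus the output height and width.
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    zero = [0] * c  # one all-zero pixel (c channels)
    # Zero-padding step: add `pad` rows/columns of zero pixels on each side.
    padded = ([[zero] * (w + 2 * pad)] * pad
              + [[zero] * pad + row + [zero] * pad for row in fmap]
              + [[zero] * (w + 2 * pad)] * pad)
    out_h = (h + 2 * pad - kernel) // stride + 1
    out_w = (w + 2 * pad - kernel) // stride + 1
    windows = []
    for i in range(out_h):          # window row index
        for j in range(out_w):      # window column index
            r, s = i * stride, j * stride
            windows.append([row[s:s + kernel] for row in padded[r:r + kernel]])
    return windows, out_h, out_w

# h=5, w=4, c=1 toy feature map whose pixel value encodes its position.
fmap = [[[ri * 10 + ci] for ci in range(4)] for ri in range(5)]
wins, oh, ow = assemble_windows(fmap)
print(oh, ow, len(wins))  # 5 4 20
```

With stride 1 and one row/column of zero padding, every input position yields a window, matching the h x w output of a "same"-padded stride-1 3x3 convolution.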
The whole feature map assembly module comprises:
1) Zero-padding module:
In the standard convolution calculation process, the feature map usually has to be zero-padded. Under the dynamic resolution requirement, assume the input feature map has height h, width w, and c channels, the DDR input bit width is 128 bits, each feature map datum is 8 bits, and the feature map data are input in hwc order. The zero-padding positions and the number of zero-padding points can then be confirmed by counting, and the zero-padding operation performed. Zero padding of the first row and of each row head is triggered by the first-data flag and by the flag raised after a row ends; the row-end padding position and padding count are calculated as:
RowEndPadding = w * c * 8 / 128;
RowEndPaddingNum = (number of zero-padded columns at row end) * c * 8 / 128
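The two formulas can be evaluated with a small sketch (the function name and the example numbers are our own; per the text, a 128-bit DDR word carries sixteen 8-bit values streamed in hwc order, so the expressions assume w*c and pad_cols*c are multiples of 16):

```python
def row_padding_words(w, c, pad_cols, ddr_bits=128, data_bits=8):
    """RowEndPadding: the 128-bit-word offset at which a row's data end and
    end-of-row zero padding starts. RowEndPaddingNum: how many 128-bit words
    of zeros are appended, for `pad_cols` zero-padded columns at the row end."""
    row_end_padding = w * c * data_bits // ddr_bits
    row_end_padding_num = pad_cols * c * data_bits // ddr_bits
    return row_end_padding, row_end_padding_num

# Example: a 224-wide row with 16 channels and 2 padded columns at the end.
print(row_padding_words(224, 16, 2))  # (224, 2)
```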
2) Line cache module:
With the BRAM cache module as the design core, efficient data reuse is achieved through read-address control whenever the convolution stride is smaller than the convolution kernel size. In the example above, the read addresses are emitted cyclically as 0, 1, 2, 1, 2, 3, 2, 3, 4, ... To improve convolution calculation efficiency, row rotation is combined with read-before-write: only three BRAM cache modules are used, each caching one row of data, and while the required feature map data are read out of a BRAM, new data are written into it in sequence. The cross-row data update also uses BRAM: the next row's data are read from the BRAM in which they were cached and then written into the current row-data cache BRAM, as shown in fig. 4. To support multiple input resolutions, the key of the invention is that the parameter inputs can be configured automatically: from h, w, and c, all the parameters needed for module control can be calculated. Previous schemes usually implemented the line cache as a first-in first-out queue, but that wastes data and cycles at row transitions and is not easy to control or modify. The BRAM-plus-control-parameters scheme both supports dynamic resolution better and improves window assembly efficiency. In the implementation, given the resolution and the number (n) of windows actually needed, the number of window groups per row (GroupNum) and the length of each window group (GroupLength) are obtained as:
GroupNum = (w + RowEndPaddingNum - (3 - stride)) ÷ stride ÷ n
GroupLength = n * stride * c
Once the window group count and group length are known, the read/write positions are obtained exactly by counting and comparison logic, and the read/write state can still be located exactly when the resolution changes, thereby realizing line caching, line switching, and line data output.
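A minimal sketch of the grouping formulas and of the quoted read-address pattern (the function names are ours; because the published GroupNum formula is missing a closing parenthesis, the grouping below, which makes the numerator the per-row window count, is a reconstruction):

```python
def group_params(w, row_end_pad_cols, stride, n, c, kernel=3):
    """GroupNum and GroupLength as reconstructed:
    GroupNum = (w + RowEndPaddingNum - (kernel - stride)) / stride / n,
    i.e. windows per padded row divided by the group size n;
    GroupLength = n * stride * c."""
    windows_per_row = (w + row_end_pad_cols - (kernel - stride)) // stride
    return windows_per_row // n, n * stride * c

def read_addresses(num_windows, kernel=3, stride=1):
    """Per-cycle line-cache read addresses. With stride < kernel, overlapping
    windows re-read shared columns, giving the cyclic pattern quoted in the
    text: 0, 1, 2, 1, 2, 3, 2, 3, 4, ..."""
    return [j * stride + k for j in range(num_windows) for k in range(kernel)]

print(group_params(224, 2, 1, 4, 16))  # (56, 64)
print(read_addresses(3))               # [0, 1, 2, 1, 2, 3, 2, 3, 4]
```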
3) Window assembly module:
Taking the output of the line cache module, the window assembly module collects cycle by cycle all the data required by the main convolution calculation module and finally, through splitting and recombination, outputs the feature map window channel by channel. The window assembly pattern is determined by dynamically calculating the number of columns required for each output window (WindowColumnNum), thereby adapting to the multi-resolution case:
WindowColumnNum = n * stride + (3 - stride)
Combined, the three main modules convert feature map data entering the FPGA in hwc order into the window data required by the convolution calculation module, meeting the multi-resolution requirement; within the design range of the scheme, the efficiency fully satisfies the performance requirements.
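The WindowColumnNum formula can be cross-checked against the window geometry (a sketch under our own naming; `kernel` generalizes the fixed kernel size 3 used in the text):

```python
def window_column_num(n, stride, kernel=3):
    """WindowColumnNum = n * stride + (kernel - stride): the number of feature
    map columns needed to cut out n adjacent width-`kernel` windows."""
    return n * stride + (kernel - stride)

# Cross-check: the n-th window starts at column (n - 1) * stride and is
# `kernel` columns wide, so the group spans (n - 1) * stride + kernel columns.
for n, stride in [(4, 1), (4, 2), (1, 3)]:
    assert window_column_num(n, stride) == (n - 1) * stride + 3

print(window_column_num(4, 1), window_column_num(4, 2))  # 6 9
```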
By passing in the control parameters (h, w, c, stride, n) dynamically, the invention copes with resolution changes when a CNN network is implemented: it is compatible with multiple resolutions without modifying the code, which is more convenient and fast. Meanwhile, whereas other schemes cache with a first-in first-out queue (FIFO) at the special point of line switching, the BRAM caching approach saves the line-switch initialization cycles, further improving feature map assembly efficiency without affecting the efficiency of the main convolution calculation module at all.
Those skilled in the art will understand that variations may be implemented in combination with the prior art and the above embodiments; such variations do not affect the essence of the present invention and are not described here.
The preferred embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention, using the methods and technical contents disclosed above, without affecting the essential content of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (3)

1. A convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA, the system comprising:
a feature map assembly module and a weight loading module, each connected to a main convolution calculation module; the main convolution calculation module is connected to a window accumulation module; the main convolution calculation module takes feature map windows from the feature map assembly module together with weights from the weight loading module and completes the channel accumulation, after which the whole convolution is completed in the window accumulation module; the window accumulation module is connected to a feature map output module;
the feature map assembly module specifically comprises: a zero-padding module for performing the zero-padding operation on the feature map; a line cache module, connected to the zero-padding module, for implementing feature map line caching, line switching, and line data output; and a window assembly module, connected to the line cache module, which collects cycle by cycle all the data required by the main convolution calculation module and, through splitting and recombination, finally outputs the feature map window channel by channel;
in the zero-padding module, assuming the input feature map has height h, width w, and c channels, the DDR input bit width is 128 bits, each feature map datum is 8 bits, and the feature map data are input in hwc order, the zero-padding positions and the number of zero-padding points can be confirmed by counting and the zero-padding operation performed; zero padding of the first row and of each row head is triggered by the first-data flag and by the flag raised after a row ends; the calculation formulas for the row-end padding position and padding count are:
RowEndPadding = w * c * 8 / 128;
RowEndPaddingNum = (number of zero-padded columns at row end) * c * 8 / 128
in the line cache module, according to the resolution and the number n of windows actually needed, the number of window groups per row, GroupNum, and the length of each window group, GroupLength, are obtained:
GroupNum = (w + RowEndPaddingNum - (3 - stride)) ÷ stride ÷ n
GroupLength = n * stride * c
in the window assembly module, taking the output of the line cache module, all the data required by the main convolution calculation module are collected cycle by cycle and, through splitting and recombination, the feature map window is finally output channel by channel; the window assembly pattern is determined by dynamically calculating the number of columns required for each output window, the WindowColumnNum parameter, thereby adapting to the multi-resolution case, with the formula:
WindowColumnNum = n * stride + (3 - stride).
2. The convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA of claim 1, wherein a BRAM cache module is arranged in the line cache module.
3. The convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA of claim 2, wherein a read-write controller is arranged in the line cache module; the read-write controller controls the read/write signals and read/write addresses of the line cache module and carries out the writes into the BRAM cache module accordingly.
CN202010652929.8A 2020-07-08 2020-07-08 Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA Active CN111814675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652929.8A CN111814675B (en) 2020-07-08 2020-07-08 Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA


Publications (2)

Publication Number Publication Date
CN111814675A CN111814675A (en) 2020-10-23
CN111814675B true CN111814675B (en) 2023-09-29

Family

ID=72842626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652929.8A Active CN111814675B (en) 2020-07-08 2020-07-08 Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA

Country Status (1)

Country Link
CN (1) CN111814675B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359662B (en) * 2021-12-24 2023-06-13 江苏大学 Implementation method of convolutional neural network based on heterogeneous FPGA and fusion multi-resolution

Citations (8)

Publication number Priority date Publication date Assignee Title
CN104683265A (en) * 2015-02-27 2015-06-03 南京中新赛克科技有限责任公司 Accurate high-capacity packet counting method for 100G interfaces
CN107622226A (en) * 2017-08-27 2018-01-23 南京理工大学 Vehicle checking method and system based on improved deformable part model algorithm
CN109214504A (en) * 2018-08-24 2019-01-15 北京邮电大学深圳研究院 A kind of YOLO network forward inference accelerator design method based on FPGA
CN109272113A (en) * 2018-09-13 2019-01-25 深思考人工智能机器人科技(北京)有限公司 A kind of convolutional neural networks establish device and method
CN109416756A (en) * 2018-01-15 2019-03-01 深圳鲲云信息科技有限公司 Acoustic convolver and its applied artificial intelligence process device
CN109948777A (en) * 2018-11-14 2019-06-28 深圳大学 The implementation method of convolutional neural networks is realized based on the FPGA convolutional neural networks realized and based on FPGA
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row
CN110191330A (en) * 2019-06-13 2019-08-30 内蒙古大学 Depth map FPGA implementation method and system based on binocular vision green crop video flowing

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9020276B2 (en) * 2012-04-24 2015-04-28 Stmicroelectronics S.R.L. Hardware coprocessor for stripe-based interest point detection
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU


Non-Patent Citations (4)

Title
A survey of FPGA-based accelerators for convolutional neural networks; Mittal S et al.; Neural Computing and Applications; Vol. 32, No. 4; 1109-1139 *
Optimizing CNN-based object detection algorithms on embedded FPGA platforms; Zhao R et al.; Applied Reconfigurable Computing: 13th International Symposium; 255-267 *
Hardware architecture design of convolutional neural network based on FPGA dynamic reconfiguration (基于FPGA动态重构的卷积神经网络硬件架构设计); He Kaixuan et al.; Information Technology and Network Security (《信息技术与网络安全》); Vol. 38, No. 3; 77-81 *
Design and implementation of an FPGA-based image convolution IP core (基于FPGA的图像卷积IP核的设计与实现); Zhu Xueliang et al.; Microelectronics & Computer (《微电子学与计算机》); Vol. 28, No. 6; 188-192 *


Similar Documents

Publication Publication Date Title
CN106021182B (en) A kind of row transposition architecture design method based on Two-dimensional FFT processor
US11709911B2 (en) Energy-efficient memory systems and methods
CN108388527B (en) Direct memory access engine and method thereof
CN102890427B (en) Method for preparing skewed data in field programmable gate array (FPGA) of direct-writing type photoetching system
JP4846306B2 (en) Semiconductor memory device, semiconductor integrated circuit system using the same, and method for controlling semiconductor memory device
CN110825312A (en) Data processing device, artificial intelligence chip and electronic equipment
US11550586B2 (en) Method and tensor traversal engine for strided memory access during execution of neural networks
CN112005251A (en) Arithmetic processing device
CN111814675B (en) Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA
CN114092338A (en) Image zooming fast calculation method
CN104869284A (en) High-efficiency FPGA implementation method and device for bilinear interpolation amplification algorithm
CN114780910B (en) Hardware system and calculation method for sparse convolution calculation
US8581918B2 (en) Method and system for efficiently organizing data in memory
CN110490312B (en) Pooling calculation method and circuit
US11443185B2 (en) Memory chip capable of performing artificial intelligence operation and method thereof
KR102306252B1 (en) Apparatus and method for transforming matrix and data processing system
CN115995249B (en) Matrix transposition operation device based on DRAM
CN112396072A (en) Image classification acceleration method and device based on ASIC and VGG16
CN113034344B (en) Two-dimensional FFT method with low memory resource overhead
CN116150055B (en) Data access method and device based on-chip cache and transposition method and device
CN114741352B (en) FPGA-based bilinear interpolation resampling implementation method and device
US20230307036A1 (en) Storage and Accessing Methods for Parameters in Streaming AI Accelerator Chip
JPS6125192B2 (en)
CN114840470A (en) Dimension transformation device friendly to on-chip cache and neural network processor
CN117216459A (en) Convolution operation method, convolution operation device, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant