CN111814675B - Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA - Google Patents
Info
- Publication number
- CN111814675B CN111814675B CN202010652929.8A CN202010652929A CN111814675B CN 111814675 B CN111814675 B CN 111814675B CN 202010652929 A CN202010652929 A CN 202010652929A CN 111814675 B CN111814675 B CN 111814675B
- Authority
- CN
- China
- Prior art keywords
- module
- feature map
- line
- window
- zero
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/955—Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
Abstract
The invention relates to the technical field of image processing, and in particular to an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution. The system comprises a feature map assembly module and a weight loading module, each connected to a main convolution calculation module, which in turn is connected to a window accumulation module. The main convolution calculation module receives feature map windows from the feature map assembly module and weights from the weight loading module, completes channel accumulation internally, and the window accumulation module then completes the overall convolution calculation. Unlike prior schemes, which support a feature map cache design for only one resolution, the invention automatically configures the feature map cache parameters on the FPGA according to the real-time resolution and is compatible with CNN network implementations at multiple resolutions without any code modification.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA.
Background
A convolutional neural network (CNN) is an efficient recognition method built around convolution operations, and is one of the representative algorithms of deep learning. In recent years CNNs have been widely applied in many fields, such as automatic labeling algorithms, image search, product recommendation, and search frameworks; the classic and most popular of these applications, however, is image processing. A CNN can take the raw feature image directly as input, skipping a complicated image preprocessing stage, and produce the final classification result for the picture. Because CNN computation involves a large volume of data, it is usually implemented on large-scale computing systems, which brings high implementation difficulty and high cost.
Precisely because of the CNN's distinctive computing pattern, general-purpose processors implement it inefficiently and cannot meet its performance requirements. Accordingly, various accelerators based on field programmable gate arrays (FPGAs), graphics processing units (GPUs), and application-specific integrated circuits (ASICs) have been proposed in recent years to improve CNN performance. A comparison of the three approaches in terms of performance, power consumption, and flexibility is shown in fig. 1. Combining good performance, high energy efficiency, a short development cycle, and other advantages, the FPGA has attracted increasing attention for CNN acceleration.
Implementing a CNN on an FPGA requires computing, reading, and writing large volumes of data. Because on-chip memory resources on an FPGA are limited, data such as the feature maps needed in CNN computation are typically held in external dynamic random access memory (DRAM), which the FPGA reads from and writes to. Since CNN networks serve different application scenarios, the input feature images often come in a variety of resolutions, which requires a CNN implemented on an FPGA to adapt to dynamic resolution as far as possible.
Disclosure of Invention
In view of the above technical problems, the present invention provides an FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution, which differs from previous schemes that support a feature map cache design for only one resolution.
The technical scheme adopted for solving the technical problems is as follows:
a convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA, the system comprising:
the system comprises a feature map assembly module and a weight loading module, each connected to a main convolution calculation module; the main convolution calculation module is connected to a window accumulation module; the main convolution calculation module receives feature map windows from the feature map assembly module and weights from the weight loading module, completes channel accumulation internally, and the window accumulation module then completes the overall convolution calculation; the window accumulation module is connected to a feature map output module.
In the technical scheme of the invention, the convolutional neural network feature map assembly system supporting dynamic resolution based on the FPGA is characterized in that the feature map assembly module specifically comprises:
a zero-padding module for performing a zero-padding operation on the feature map;
a line cache module, connected to the zero-padding module, for implementing feature map line caching, line switching, and line data output;
and a window assembly module, connected to the line cache module, which, after obtaining over several cycles all the data required by the main convolution calculation module, disassembles and recombines the data and finally outputs a feature map window per channel.
In the technical scheme of the invention, the convolutional neural network feature map assembly system supporting dynamic resolution based on the FPGA is characterized in that a BRAM cache module is arranged in the line cache module.
In the technical scheme of the invention, the convolutional neural network feature map assembly system supporting dynamic resolution based on the FPGA is characterized in that a read-write controller is arranged in the line cache module and used for controlling read-write signals and read-write addresses of the line cache module and writing the read-write signals and the read-write addresses into the BRAM cache module.
The technical scheme has the following advantages or beneficial effects:
the invention provides a convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA, which can cope with the situation of resolution change when a CNN network is realized, is compatible with various resolutions without modifying codes, is more convenient and quick, saves a line switching initialization period in a BRAM cache mode, further improves feature map assembly efficiency, and completely does not influence the efficiency of a main convolutional calculation module.
Drawings
The invention and its features, aspects and advantages will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings. Like numbers refer to like parts throughout. The drawings may not be to scale, emphasis instead being placed upon illustrating the principles of the invention.
Fig. 1 is a comparison of FPGA, GPU, and ASIC;
FIG. 2 is a schematic diagram of a convolutional neural network implementing a window accumulation scheme on an FPGA;
FIG. 3 is a schematic diagram of the overall design of a feature map assembly module;
fig. 4 is a schematic diagram of a BRAM cache module scheme in a line cache module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To accommodate the fact that, in practical CNN applications, input feature images may come in several different resolutions, the present invention makes the control parameters of the feature map assembly module dynamically configurable when the CNN network is implemented on the FPGA, so that the implementation is highly compatible with feature map inputs of different resolutions.
As shown in fig. 2 and fig. 3, the FPGA-based convolutional neural network feature map assembly system supporting dynamic resolution of the present invention comprises a feature map assembly module and a weight loading module, each connected to a main convolution calculation module; the main convolution calculation module is connected to a window accumulation module; the main convolution calculation module receives feature map windows from the feature map assembly module and weights from the weight loading module, completes channel accumulation internally, and the window accumulation module then completes the overall convolution calculation.
In the technical scheme of the invention, the window accumulation module is connected to a feature map output module, and the feature map assembly module specifically comprises:
a zero-padding module for performing a zero-padding operation on the feature map; a line cache module, connected to the zero-padding module, for implementing feature map line caching, line switching, and line data output; and a window assembly module, connected to the line cache module, which, after obtaining over several cycles all the data required by the main convolution calculation module, disassembles and recombines the data and finally outputs a feature map window per channel. In the technical scheme of the invention, a BRAM cache module and a read-write controller are arranged in the line cache module; the read-write controller controls the read-write signals and read-write addresses of the line cache module and applies them to the BRAM cache module.
The technical scheme of the invention requires the feature map assembly module to provide convolution calculation windows to the main convolution calculation module, within the limits of DDR transmission efficiency, in a highly compatible manner. Taking a 3x3 convolution kernel as an example, i.e. a 3x3 feature map window must be provided to the main convolution calculation module, the scheme adapts to the input feature map's height (h), width (w), and channel number (c) as well as the step size (stride), where w, h, c, and stride are all freely settable values.
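As a point of reference for the hardware behaviour, the window supply that the feature map assembly module must deliver can be modelled in a few lines of software. This is a pure-Python sketch only; the function name, the nested-list layout, and the example values are illustrative assumptions, not taken from the patent, which realizes this in FPGA logic:

```python
def extract_windows(fmap, k=3, stride=1):
    """Functional model of k x k window extraction from an h x w x c feature
    map stored as nested lists indexed [row][col][channel]. Illustrative
    only -- it shows which windows the hardware must hand to the main
    convolution calculation module for arbitrary h, w, c, stride."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    windows = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # one k x k x c window, top-left corner at (i*stride, j*stride)
            row.append([r[j * stride:j * stride + k]
                        for r in fmap[i * stride:i * stride + k]])
        windows.append(row)
    return windows

# a 5 x 5 single-channel map; stride 1 yields a 3 x 3 grid of 3x3 windows
fmap = [[[r * 5 + c] for c in range(5)] for r in range(5)]
win = extract_windows(fmap, k=3, stride=1)
print(len(win), len(win[0]))  # 3 3
print(win[0][0])              # top-left 3x3 window
```

Note how, for stride < k, adjacent windows overlap by k - stride columns; this overlap is what the line cache module's read-address scheme exploits.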
The whole feature map assembly module comprises:
1) Zero filling module:
in the standard convolution calculation process, the feature map is often required to be subjected to zero padding operation. Under the condition of dynamic resolution requirement, the input feature map size is assumed to be high (h) and wide (w) and the number of channels is assumed to be high (c), the DDR input bit width is 128 bits, the feature map data is 8 bits, and the input feature map data is input according to the sequence of hwc. The zero-filling position and the number of the zero-filling points can be confirmed in a counting mode, and zero-filling operation is performed. The first line and the line head are subjected to zero filling through data input of a first signal and a first signal mark after the line is ended, and the line end zero filling position and the zero filling number are calculated as follows:
RowEndPadding = w * c * 8 / 128;
RowEndPaddingNum = (number of zero-padded columns at the row end) * c * 8 / 128
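Under the stated assumptions (128-bit DDR words, 8-bit feature data, and sizes that divide the bus width evenly), the two formulas above can be checked with a small script. The function and variable names are illustrative, not from the patent:

```python
def row_end_padding_params(w, c, pad_cols, ddr_bits=128, data_bits=8):
    """Compute, per the formulas above, where row-end zero padding starts
    (counted in DDR words from the start of the row) and how many zero
    words to insert. Assumes w*c*data_bits and pad_cols*c*data_bits are
    exact multiples of ddr_bits, as the patent's example implies."""
    row_end_padding = w * c * data_bits // ddr_bits            # padding start position
    row_end_padding_num = pad_cols * c * data_bits // ddr_bits # zero words to append
    return row_end_padding, row_end_padding_num

# e.g. a 224-wide, 16-channel map padded by one column on each side (2 total)
print(row_end_padding_params(224, 16, 2))  # (224, 2)
```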
2) Line cache module:
and when the convolution step length is smaller than the convolution kernel size, the efficient multiplexing of the data can be realized through read address control by taking the BRAM buffer module as a design main body. In the above example, the output of the read address according to the cycle is 0,1,2,1,2,3,2,3,4 … …, so as to improve the convolution calculation efficiency, a combination of row pushing and read-before-write is adopted. Namely, when only three BRAM buffer modules are used for respectively buffering three lines of data, new data are sequentially written while the required characteristic diagram data are read from the BRAM, and new data are sequentially written while the required characteristic diagram data are read from the BRAM. The cross-line data update utilizes BRAM, and next line data is read out from BRAM of the cached next line data and then written into the line data cache BRAM, as shown in fig. 4. To support multiple resolution input scenarios, the key to the present invention is that the parameter input scheme can be automatically configured. Through h, w and c, parameters required by module control can be calculated. The line cache is usually output by adopting a first-in first-out queue in the prior proposal, but the condition of wasting data and cycle at the line conversion is encountered, and the control and the modification are not easy. The BRAM plus control parameter scheme enables the support degree to the dynamic resolution to be higher on the one hand, and improves the window assembly efficiency on the other hand. In the implementation process of the scheme, according to different resolutions, the number (n) of windows actually needed is added, so that the number (GroupNum) of window groups of each row and the length (groupwength) of window groups of each row can be obtained:
GroupNum = (w + RowEndPaddingNum - (3 - stride)) ÷ stride ÷ n
GroupLength=n*stride*c
Once the window group number and window group length are known, the read and write positions can be obtained exactly through counting and comparison logic, and the read-write state remains correctly located when the resolution changes, thereby realizing line caching, line switching, and line data output.
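The group parameters and the overlapping read-address pattern described above can be modelled in software. This is a sketch under assumptions: integer division is assumed where the patent writes ÷, k denotes the kernel size (3 in the patent's example), and all names and example values are illustrative:

```python
def window_group_params(w, row_end_padding_num, stride, n, c, k=3):
    """GroupNum and GroupLength per the formulas above (integer division
    assumed; n is the number of windows actually needed per group)."""
    group_num = (w + row_end_padding_num - (k - stride)) // stride // n
    group_length = n * stride * c
    return group_num, group_length

def read_address_cycle(num_windows, stride, k=3):
    """Overlapping per-window read addresses when stride < k, reproducing
    the 0,1,2,1,2,3,2,3,4,... pattern from the text: adjacent windows
    reuse the k - stride columns already held in BRAM."""
    addrs = []
    for i in range(num_windows):
        addrs.extend(range(i * stride, i * stride + k))
    return addrs

print(window_group_params(w=224, row_end_padding_num=2, stride=1, n=2, c=16))  # (112, 32)
print(read_address_cycle(3, stride=1))  # [0, 1, 2, 1, 2, 3, 2, 3, 4]
```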
3) Window assembly module:
and (3) outputting by a line connection buffer module, periodically obtaining all data required in the main convolution calculation module, and finally outputting a characteristic diagram window according to the channel through number disassembly and recombination. The window assembly mode can be determined by dynamically calculating the number of columns (WindowColumnNum) parameters required for each output window, thereby adapting to the multi-resolution situation.
WindowColumnNum = n * stride + (3 - stride)
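A minimal model of this parameter, with k = 3 generalizing the patent's example kernel size (names are illustrative, not from the patent):

```python
def window_column_num(n, stride, k=3):
    """WindowColumnNum = n*stride + (k - stride): the number of feature map
    columns the assembly module must gather to emit n windows of width k
    at the given stride, counting the shared overlap columns only once."""
    return n * stride + (k - stride)

# n=2 windows at stride 1 share two columns each: 2*1 + 2 = 4 columns suffice
print(window_column_num(n=2, stride=1))  # 4
print(window_column_num(n=2, stride=2))  # 5
```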
Combining the three main modules above converts feature map data input to the FPGA in h-w-c order into the window data required by the convolution calculation module, meeting the multi-resolution requirement, with efficiency that fully satisfies the performance requirement within the design range of the scheme.
By passing in the control parameters (h, w, c, stride, n) dynamically, the invention can cope with resolution changes when a CNN network is implemented: multiple resolutions are supported without modifying the code, which is more convenient and fast. Meanwhile, in contrast to other schemes, which use a first-in first-out (FIFO) queue for caching and are awkward precisely at line switching, the BRAM caching approach saves the line-switch initialization cycles, further improving feature map assembly efficiency without affecting the efficiency of the main convolution calculation module at all.
Those skilled in the art will understand that variations may be implemented in combination with the prior art and the above embodiments; such modifications do not affect the essence of the present invention and are not described herein.
The preferred embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention, using the methods and technical contents disclosed above, without affecting the essential content of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.
Claims (3)
1. A convolutional neural network feature map assembly system supporting dynamic resolution based on an FPGA, the system comprising:
the device comprises a feature map assembling module and a weight loading module, wherein the feature map assembling module and the weight loading module are respectively connected with a main convolution computing module, the main convolution computing module is connected with a window accumulating module, the main convolution computing module inputs a feature map window of the feature map assembling module through the weight loading module, channel accumulation is completed in the main convolution computing module, then the whole convolution computation is completed in the window accumulating module, and the window accumulating module is connected with a feature map output module;
the feature map assembling module specifically comprises: the zero-filling module is used for performing zero-filling operation on the feature map; the line cache module is connected with the zero padding module and is used for realizing characteristic diagram line cache, line switching and line data output; the window assembly module is connected with the line connection cache module, and after all data required in the main convolution calculation module are obtained in a periodical manner, the window assembly module outputs a characteristic diagram window according to the channel through disassembly and recombination;
in the zero-padding module, assuming the input feature map has height h, width w, and channel number c, the DDR input bit width is 128 bits, the feature map data is 8-bit, and the feature map data is input in h-w-c order, the zero-padding positions and the number of zeros to pad are determined by counting and the zero-padding operation is performed; zero padding of the first row and of each row head is driven by a first-data signal and an end-of-row flag, and the row-end zero-padding position and padding count are calculated as:
RowEndPadding = w * c * 8 / 128;
RowEndPaddingNum = (number of zero-padded columns at the row end) * c * 8 / 128
In the line buffer module, according to different resolutions, the number n of windows actually needed is added, so that the number GroupNum of window groups in each line and the length groupwength of window groups in each line can be obtained:
GroupNum = (w + RowEndPaddingNum - (3 - stride)) ÷ stride ÷ n
GroupLength=n*stride*c
in the window assembly module, fed by the output of the line cache module, all the data required by the main convolution calculation module are obtained over several cycles, then disassembled and recombined so that a feature map window is finally output per channel; the window assembly pattern is determined by dynamically calculating the number of columns required per output window, the WindowColumnNum parameter, thereby adapting to the multi-resolution situation, with the formula:
WindowColumnNum = n * stride + (3 - stride).
2. the convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA of claim 1, wherein a BRAM cache module is provided in the line cache module.
3. The convolutional neural network feature map assembly system supporting dynamic resolution based on the FPGA of claim 2, wherein a read-write controller is disposed in the line buffer module, and the read-write controller is configured to control read-write signals and read-write addresses of the line buffer module and write the read-write signals and the read-write addresses into the BRAM buffer module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010652929.8A CN111814675B (en) | 2020-07-08 | 2020-07-08 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010652929.8A CN111814675B (en) | 2020-07-08 | 2020-07-08 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814675A CN111814675A (en) | 2020-10-23 |
CN111814675B true CN111814675B (en) | 2023-09-29 |
Family
ID=72842626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010652929.8A Active CN111814675B (en) | 2020-07-08 | 2020-07-08 | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814675B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359662B * | 2021-12-24 | 2023-06-13 | Jiangsu University | Implementation method of convolutional neural network based on heterogeneous FPGA and fusion multi-resolution |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104683265A (en) * | 2015-02-27 | 2015-06-03 | 南京中新赛克科技有限责任公司 | Accurate high-capacity packet counting method for 100G interfaces |
CN107622226A (en) * | 2017-08-27 | 2018-01-23 | 南京理工大学 | Vehicle checking method and system based on improved deformable part model algorithm |
CN109214504A (en) * | 2018-08-24 | 2019-01-15 | 北京邮电大学深圳研究院 | A kind of YOLO network forward inference accelerator design method based on FPGA |
CN109272113A (en) * | 2018-09-13 | 2019-01-25 | 深思考人工智能机器人科技(北京)有限公司 | A kind of convolutional neural networks establish device and method |
CN109416756A (en) * | 2018-01-15 | 2019-03-01 | 深圳鲲云信息科技有限公司 | Acoustic convolver and its applied artificial intelligence process device |
CN109948777A (en) * | 2018-11-14 | 2019-06-28 | 深圳大学 | The implementation method of convolutional neural networks is realized based on the FPGA convolutional neural networks realized and based on FPGA |
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row |
CN110191330A (en) * | 2019-06-13 | 2019-08-30 | 内蒙古大学 | Depth map FPGA implementation method and system based on binocular vision green crop video flowing |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9020276B2 (en) * | 2012-04-24 | 2015-04-28 | Stmicroelectronics S.R.L. | Hardware coprocessor for stripe-based interest point detection |
CN110058883B (en) * | 2019-03-14 | 2023-06-16 | 梁磊 | CNN acceleration method and system based on OPU |
-
2020
- 2020-07-08 CN CN202010652929.8A patent/CN111814675B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104683265A (en) * | 2015-02-27 | 2015-06-03 | 南京中新赛克科技有限责任公司 | Accurate high-capacity packet counting method for 100G interfaces |
CN107622226A (en) * | 2017-08-27 | 2018-01-23 | 南京理工大学 | Vehicle checking method and system based on improved deformable part model algorithm |
CN109416756A (en) * | 2018-01-15 | 2019-03-01 | 深圳鲲云信息科技有限公司 | Acoustic convolver and its applied artificial intelligence process device |
CN109214504A (en) * | 2018-08-24 | 2019-01-15 | 北京邮电大学深圳研究院 | A kind of YOLO network forward inference accelerator design method based on FPGA |
CN109272113A (en) * | 2018-09-13 | 2019-01-25 | 深思考人工智能机器人科技(北京)有限公司 | A kind of convolutional neural networks establish device and method |
CN109948777A (en) * | 2018-11-14 | 2019-06-28 | 深圳大学 | The implementation method of convolutional neural networks is realized based on the FPGA convolutional neural networks realized and based on FPGA |
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row |
CN110191330A (en) * | 2019-06-13 | 2019-08-30 | 内蒙古大学 | Depth map FPGA implementation method and system based on binocular vision green crop video flowing |
Non-Patent Citations (4)
Title |
---|
A survey of FPGA-based accelerators for convolutional neural networks; Mittal S et al.; Neural Computing and Applications; vol. 32, no. 4; 1109-1139 *
Optimizing CNN-based object detection algorithms on embedded FPGA platforms; Zhao R et al.; Applied Reconfigurable Computing: 13th International Symposium; 255-267 *
Hardware architecture design of convolutional neural networks based on FPGA dynamic reconfiguration; He Kaixuan et al.; Information Technology and Network Security; vol. 38, no. 3; 77-81 *
Design and implementation of an FPGA-based image convolution IP core; Zhu Xueliang et al.; Microelectronics & Computer; vol. 28, no. 6; 188-192 *
Also Published As
Publication number | Publication date |
---|---|
CN111814675A (en) | 2020-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106021182B (en) | A kind of row transposition architecture design method based on Two-dimensional FFT processor | |
US11709911B2 (en) | Energy-efficient memory systems and methods | |
CN108388527B (en) | Direct memory access engine and method thereof | |
CN102890427B (en) | Method for preparing skewed data in field programmable gate array (FPGA) of direct-writing type photoetching system | |
JP4846306B2 (en) | Semiconductor memory device, semiconductor integrated circuit system using the same, and method for controlling semiconductor memory device | |
CN110825312A (en) | Data processing device, artificial intelligence chip and electronic equipment | |
US11550586B2 (en) | Method and tensor traversal engine for strided memory access during execution of neural networks | |
CN112005251A (en) | Arithmetic processing device | |
CN111814675B (en) | Convolutional neural network feature map assembly system supporting dynamic resolution based on FPGA | |
CN114092338A (en) | Image zooming fast calculation method | |
CN104869284A (en) | High-efficiency FPGA implementation method and device for bilinear interpolation amplification algorithm | |
CN114780910B (en) | Hardware system and calculation method for sparse convolution calculation | |
US8581918B2 (en) | Method and system for efficiently organizing data in memory | |
CN110490312B (en) | Pooling calculation method and circuit | |
US11443185B2 (en) | Memory chip capable of performing artificial intelligence operation and method thereof | |
KR102306252B1 (en) | Apparatus and method for transforming matrix and data processing system | |
CN115995249B (en) | Matrix transposition operation device based on DRAM | |
CN112396072A (en) | Image classification acceleration method and device based on ASIC and VGG16 | |
CN113034344B (en) | Two-dimensional FFT method with low memory resource overhead | |
CN116150055B (en) | Data access method and device based on-chip cache and transposition method and device | |
CN114741352B (en) | FPGA-based bilinear interpolation resampling implementation method and device | |
US20230307036A1 (en) | Storage and Accessing Methods for Parameters in Streaming AI Accelerator Chip | |
JPS6125192B2 (en) | ||
CN114840470A (en) | Dimension transformation device friendly to on-chip cache and neural network processor | |
CN117216459A (en) | Convolution operation method, convolution operation device, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |