CN115080503A - Systolic array reconfigurable processor aiming at FFT (fast Fourier transform) base module mapping - Google Patents
- Publication number
- CN115080503A
- Authority
- CN
- China
- Prior art keywords
- processing unit
- reconfigurable
- fft
- reconfigurable processing
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06F15/7871—Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
Abstract
The invention relates to a systolic array reconfigurable processor for FFT (fast Fourier transform) base module mapping, comprising a reconfigurable processing unit array, a shared memory, a main controller, and an on-chip memory. The reconfigurable processing unit array performs the FFT operation and comprises m × n reconfigurable processing units. The main controller parses the configuration packet and writes configuration information into the configuration memory in each reconfigurable processing unit; the reconfigurable processing units execute the corresponding operations under the dual drive of the data flow and the configuration flow. Each reconfigurable processing unit, and the interconnection among the units, can be configured independently, so the array can be dynamically divided into subarrays for algorithm-level parallel processing to achieve acceleration. The shared memory consists of multiple groups of memories and has two main functions: exchanging data with the on-chip memory, and storing the intermediate data generated by each stage of the FFT operation. The on-chip memory stores programs, configuration information, and data.
Description
Technical Field
The invention relates to the field of computer systems, and in particular to a systolic array reconfigurable processor with a multi-level storage structure for FFT (fast Fourier transform) base module mapping.
Background
With the rapid development of information technology, demand for signal processing capability keeps growing in computation-intensive fields such as computing, communications, and consumer electronics. The fast Fourier transform (FFT) is widely used as an important tool for analyzing and processing digital signals. However, the FFT is computationally expensive and time-consuming. In fields such as scientific computing and image processing in particular, fixed-point data cannot meet the precision requirements and a floating-point format is needed, so the large number of floating-point complex multiplications imposes a heavy computational burden. In the era of the Internet of Everything, computational efficiency is one of the key measures of system performance, and insufficient efficiency forces system designs to compromise on precision, real-time performance, and other aspects. Meanwhile, new application scenarios and requirements keep emerging, each calling for a different number of FFT points, which places higher demands on system flexibility. An FFT accelerator that is both computationally efficient and highly flexible is therefore of great significance.
The existing FFT acceleration methods are mainly divided into two categories:
(1) Methods based on software optimization
Software-optimization methods are generally implemented on general-purpose platforms such as CPUs and GPUs and rely on a deep understanding of the target platform's pipeline mechanism and memory architecture. Although such methods are highly optimized for their target platforms, they are limited by the platforms' inherent memory access patterns, and their computational efficiency remains low.
(2) Methods based on dedicated hardware design
Hardware-based methods are typically implemented on FPGAs or as ASICs. Because the storage architecture can be purpose-designed, they can achieve higher performance. Thanks to its parallelism, the FPGA was once considered the most promising solution, but its high energy consumption makes it unsuitable for power-sensitive applications. ASIC-based schemes offer high area and energy efficiency, but because the circuit function is fixed, they support only a single application, lack flexibility, are costly to design, and cannot keep pace with the iteration speed of emerging applications.
In summary, none of the above solutions simultaneously satisfies the requirements for computational efficiency, area and energy efficiency, real-time performance, and flexibility.
Disclosure of Invention
To solve this problem, the invention provides a systolic array reconfigurable processor with a multi-level storage structure for FFT base module mapping, and uses a dynamically reconfigurable processor architecture, the coarse-grained reconfigurable architecture (CGRA), to accelerate the FFT. The CGRA tool chain uses a high-level language (such as C or C++), which shortens the development cycle. The reconfigurable units give the CGRA multiple levels of flexibility and parallelism. In addition, a CGRA is superior to a fine-grained FPGA in both energy efficiency and area efficiency.
The technical scheme of the invention is as follows: a systolic array reconfigurable processor for FFT base module mapping, comprising:
a reconfigurable processing unit array, a main controller, a shared memory, and an on-chip memory;
the reconfigurable processing unit array performs the FFT operation and comprises m × n reconfigurable processing units, where m is the number of rows and n is the number of columns;
the main controller parses the configuration packet and writes configuration information into the configuration memory in each reconfigurable processing unit; the reconfigurable processing units execute the corresponding operations under the dual drive of the data flow and the configuration flow; each reconfigurable processing unit, and the interconnection among the units, can be configured independently; and the reconfigurable processing unit array can be dynamically divided into subarrays for algorithm-level parallel processing to achieve acceleration;
the shared memory comprises multiple groups of memories and is used to exchange data with the on-chip memory and to store the intermediate data generated by each stage of the FFT operation;
the on-chip memory includes global and local registers and is used to store programs, configuration information, and data.
In another aspect, the method by which the systolic array processor for FFT base module mapping performs the operation comprises the following steps:
first, the main controller moves the original data from the on-chip memory to the shared memory; once the data are ready, the main controller parses the configuration words and writes the configuration information of each reconfigurable unit into the corresponding local register; after all data and configuration information are prepared, a timer is initialized and the reconfigurable processing unit array is started;
second, the reconfigurable processing unit array reads the configuration information and determines the number of iterations; some of the reconfigurable processing units read the original data from the shared memory, and each reconfigurable processing unit reads its configuration information and executes the specified operation; one iteration of the array ends when all reconfigurable processing units have completed their operations; execution continues until all iterations are complete, whereupon the timer is stopped and the number of clock cycles is recorded; during the FFT operation, the intermediate data generated by each stage are stored in the shared memory;
third, some of the reconfigurable processing units write the FFT result to the shared memory, from which it is then written to the on-chip memory.
Beneficial effects:
the invention provides a systolic array reconfigurable processor with a multi-level storage structure for FFT (fast Fourier transform) base module mapping, which effectively improves the computational efficiency of floating-point FFT operations and, in particular, meets the needs of applications requiring high precision and strong real-time performance; by simply increasing the capacity of the shared memory, FFT operations with more points can be processed without changing any other hardware module, so the design is highly scalable.
Drawings
FIG. 1 is a block diagram of a reconfigurable processor architecture according to the present invention;
FIG. 2 shows the mapping module of the radix-4 computation kernel;
FIG. 3 is a diagram of sub-array-based multi-point FFT mapping.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
According to an embodiment of the invention, a systolic array reconfigurable processor with a multi-level storage structure for FFT base module mapping is provided. The invention is optimized for the characteristics of the FFT algorithm and for memory access bandwidth, and as a whole provides an FFT hardware acceleration scheme with high computational efficiency and strong scalability. As shown in FIG. 1, the main modules of the reconfigurable processor include:
a reconfigurable processing unit array, a shared memory, a main controller, and an on-chip memory.
The reconfigurable processing unit array performs the FFT operation and comprises m × n reconfigurable processing units.
The main controller parses the configuration packet and writes the configuration information into the configuration memory in each reconfigurable processing unit, and the reconfigurable processing units execute the corresponding operations under the dual drive of the data flow and the configuration flow. Each reconfigurable processing unit, and the interconnection among the units, can be configured independently, so the array can be dynamically divided into sub-arrays for algorithm-level parallel processing to achieve acceleration.
According to an embodiment of the invention, during system architecture design the processing architecture of the reconfigurable processor is dynamically reorganized in real time to meet the requirements of large-point FFT operations. Independent processing units are configured into a systolic array through configuration information, and software defines how the reconfigurable processing unit arrays are partitioned and spliced, producing systolic array architectures suited to different algorithms. Here, "large-point" refers to FFTs of, for example, 128K or 256K points.
1. For example, when a 256K-point FFT is folded into two dimensions, a matrix transposition occurs during the FFT calculation; to maximize FFT computational efficiency, ping-pong buffering is applied to memory access and to the 1-dimensional FFT calculation.
2. Because the butterfly computations of the FFT access memory non-contiguously, the shared memory serving the PE array is specifically optimized, including the number of banks and the bank bit width.
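The 2-dimensional folding mentioned in item 1 can be illustrated with a minimal sketch. It assumes the folding follows the standard Cooley-Tukey decomposition N = N1 × N2 (column transforms, twiddle multiplication, transposition, row transforms); the patent text does not spell out its exact scheme, and the function names `dft` and `fft_2d_folded` are hypothetical. A naive DFT stands in for the 1-dimensional FFT kernels to keep the example short.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, used here as a stand-in for a 1-D FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_2d_folded(x, n1, n2):
    """N-point DFT computed by folding x into an n1 x n2 matrix (assumed scheme)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: length-n1 transform down each column c of the row-major matrix.
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]
    # Step 2: multiply element (k1, c) by the twiddle factor W_N^(c * k1).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Step 3: matrix transposition, so each k1 becomes a contiguous row.
    rows = [[cols[c][k1] for c in range(n2)] for k1 in range(n1)]
    # Step 4: length-n2 transform along each row; output index is k1 + n1 * k2.
    out = [0j] * n
    for k1 in range(n1):
        y = dft(rows[k1])
        for k2 in range(n2):
            out[k1 + n1 * k2] = y[k2]
    return out
```

In this sketch the transposition in step 3 is the point where memory access becomes strided, which is presumably why the text pairs the folding with ping-pong buffering and bank-level tuning of the shared memory.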
The shared memory consists of multiple groups of memories and has two main functions: it exchanges data with the on-chip memory, and it stores the intermediate data generated by each stage of the FFT operation. Increasing the capacity of the shared memory lets the accelerator process FFTs with more points, which facilitates later expansion.
According to an embodiment of the invention, a hierarchical data storage system is designed to improve data access efficiency. The architecture has three levels: the system, the reconfigurable processing unit array, and the reconfigurable processing unit. The physical units providing data access at these levels are the shared memory, the global registers, and the local registers, respectively. The global registers mainly store data and parameters intended for multiple reconfigurable processing units. The local registers mainly store intermediate data inside a reconfigurable processing unit and can be accessed only by that unit.
The invention thus derives a three-level storage structure from an analysis of the FFT's characteristics; it stores input and output data, intermediate data, and so on in separate levels and, together with the hardware architecture, allows the FFT algorithm to be executed quickly.
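As a rough illustration of the three storage levels, the following sketch models them as nested Python objects. All class and function names (`ProcessingUnit`, `ProcessingArray`, `Processor`, `read_operand`) are hypothetical, and the lookup order in `read_operand` is an assumption made for the example, not something stated in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingUnit:
    # Unit level: local registers, visible only to this unit.
    local_regs: dict = field(default_factory=dict)

@dataclass
class ProcessingArray:
    rows: int
    cols: int
    # Array level: global registers, visible to every unit in the array.
    global_regs: dict = field(default_factory=dict)
    units: list = field(default_factory=list)

    def __post_init__(self):
        self.units = [[ProcessingUnit() for _ in range(self.cols)]
                      for _ in range(self.rows)]

@dataclass
class Processor:
    array: ProcessingArray
    # System level: shared memory holding input, output and intermediate data.
    shared_memory: list = field(default_factory=list)

def read_operand(proc, row, col, name):
    """Assumed lookup order: the unit's own local registers, then global registers."""
    unit = proc.array.units[row][col]
    if name in unit.local_regs:
        return unit.local_regs[name]
    if name in proc.array.global_regs:
        return proc.array.global_regs[name]
    raise KeyError(name)
```

For example, a twiddle factor placed in `global_regs` is readable from any unit, whereas a temporary stored in one unit's `local_regs` raises `KeyError` when requested from another unit.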
The main controller controls the operation of the whole system, including the configuration and data of the reconfigurable processing unit array and data movement between the shared memory and the on-chip memory. The on-chip memory stores programs, configuration information, and data.
The invention provides an FFT mapping mechanism. Given the characteristics of the FFT algorithm, radix-2 or radix-4 is usually used. Based on the characteristics of the architecture, the invention adopts a modular FFT mapping scheme in which the processing unit array is divided into several sub-arrays to achieve algorithm-level parallel processing. The FFT is divided into a number of radix-2 or radix-4 sub-modules according to its number of points; the implementation of the radix-4 sub-module is shown in FIG. 2. FFT mappings for different numbers of points are then obtained by splicing these base modules together. FIG. 3 illustrates the FFT mapping for multiple point counts. The mapping results are as follows:
TABLE 1 Performance of the proposed FFT architecture for different number of points
The simulation data in Table 1 show that the proposed architecture flexibly supports FFT operations with large numbers of points and is highly scalable: FFTs of 1K to 256K points can be implemented. According to one embodiment of the invention, the typical application requirements of the target field are first analyzed to determine the range of FFT sizes; the capacity of the shared memory is then determined by weighing factors such as area and power consumption. With the hardware architecture fixed, the processing steps are illustrated using an N-point FFT as an example, where N is an integer power of 4 and the radix-4 FFT algorithm is used, requiring a total of $\log_4 N$ stages of FFT operations.
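To make the stage count and the radix-4 butterfly concrete, here is a minimal software sketch. It uses a textbook recursive decimation-in-time radix-4 FFT, which is an assumption made for illustration; the patent's actual hardware mapping is the one shown in FIG. 2 and is not reproduced here. The names `num_stages` and `fft_radix4` are hypothetical.

```python
import cmath

def num_stages(n):
    """Number of radix-4 stages, log4(n); n must be an integer power of 4."""
    stages = 0
    while n > 1:
        if n % 4:
            raise ValueError("N must be an integer power of 4")
        n //= 4
        stages += 1
    return stages

def fft_radix4(x):
    """Recursive decimation-in-time radix-4 FFT (illustrative, not the patent's mapping)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("N must be an integer power of 4")
    m = n // 4
    # Four interleaved sub-sequences x[4j + r], r = 0..3, each transformed recursively.
    f0, f1, f2, f3 = (fft_radix4(x[r::4]) for r in range(4))
    out = [0j] * n
    for k in range(m):
        w = cmath.exp(-2j * cmath.pi * k / n)
        t0, t1, t2, t3 = f0[k], w * f1[k], w * w * f2[k], w ** 3 * f3[k]
        # Radix-4 butterfly: the inner factors are powers of -j, so no general multiplies.
        out[k] = t0 + t1 + t2 + t3
        out[k + m] = t0 - 1j * t1 - t2 + 1j * t3
        out[k + 2 * m] = t0 - t1 + t2 - t3
        out[k + 3 * m] = t0 + 1j * t1 - t2 - 1j * t3
    return out

print(num_stages(1024))    # 5, since 1024 = 4**5
print(num_stages(262144))  # 9, since 256K = 4**9
```

A size such as 128K = 2^17 is not a power of 4, so this pure radix-4 sketch rejects it; handling such sizes would need a radix-2 sub-module alongside the radix-4 ones.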
First, the main controller moves the original data from the on-chip memory to the shared memory. Once the data are ready, the main controller parses the configuration words and writes the configuration information of each reconfigurable unit into the corresponding configuration memory. After all data and configuration information are prepared, a timer is initialized and the reconfigurable processing unit array is started.
Second, the reconfigurable processing unit array reads the configuration information and determines the number of iterations. Some of the reconfigurable processing units read the original data from the shared memory, and each reconfigurable processing unit reads its configuration information and executes the specified operation. One iteration of the array ends when all reconfigurable processing units have completed their operations. Execution continues until all iterations are complete, whereupon the timer is stopped and the number of clock cycles is recorded. During the FFT operation, the intermediate data generated by each stage are stored in the shared memory.
The second step in this embodiment provides the following advantages:
1. The reconfigurable processing units are mapped onto and fused with the basic butterfly units of the FFT. The configuration information of the reconfigurable processing units is merged through multiple levels of iteration, so that large amounts of similar configuration information are compressed and the storage needed for configuration information is reduced. Configuration information is executed iteratively from the top down: first the configuration information of the whole architecture is iterated, then that of the individual arrays, and finally that of each processing unit.
2. During execution, the configuration information for each iteration is preloaded according to how frequently each operator of the FFT algorithm occurs, which speeds up the operation of the whole hardware structure.
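The two ideas above, compressing repeated configuration information and preloading by operator frequency, can be sketched as follows. The patent does not give the encoding, so this example assumes a simple run-length scheme and a most-frequent-first preload order; the function names and the operator labels (`"bfly"`, `"twiddle"`) are hypothetical.

```python
from collections import Counter

def compress_config(words):
    """Collapse consecutive identical configuration words into [word, repeat] pairs."""
    packed = []
    for word in words:
        if packed and packed[-1][0] == word:
            packed[-1][1] += 1          # same word again: bump the iteration count
        else:
            packed.append([word, 1])    # new word: start a new run
    return packed

def expand_config(packed):
    """Inverse of compress_config: replay each word as many times as recorded."""
    return [word for word, count in packed for _ in range(count)]

def preload_order(operators):
    """Order distinct operators from most to least frequent (assumed preload policy)."""
    return [op for op, _ in Counter(operators).most_common()]

schedule = ["bfly", "bfly", "bfly", "twiddle", "bfly"]
print(compress_config(schedule))  # [['bfly', 3], ['twiddle', 1], ['bfly', 1]]
print(preload_order(schedule))    # ['bfly', 'twiddle']
```

Because an FFT repeats the same butterfly operation many times per stage, even this simple encoding shrinks the stored schedule considerably; the hierarchical, top-down iteration described in the text would nest such repeat counts at the architecture, array, and unit levels.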
Third, some of the reconfigurable processing units write the FFT result to the shared memory, from which it is then written to the on-chip memory.
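The three steps can be summarized in a small software model of the control flow. This is only a sketch under stated assumptions: `run_fft_job` and `stage_fn` are hypothetical names, each configuration word is assumed to drive one iteration of the array, and the returned counter tallies iterations as a stand-in for the clock-cycle timer.

```python
def run_fft_job(on_chip_data, config_words, stage_fn):
    """Model of the controller flow: load, iterate over configurations, write back."""
    # Step 1: move raw data to shared memory and load the configuration memory.
    shared_memory = list(on_chip_data)
    config_memory = list(config_words)
    ticks = 0  # timer initialized before the array starts

    # Step 2: one iteration per configuration word; intermediate data stay
    # in shared memory between iterations.
    for config in config_memory:
        shared_memory = stage_fn(shared_memory, config)
        ticks += 1  # assumed: one tick per iteration, not a real cycle count

    # Step 3: results go from shared memory back to on-chip memory.
    result_on_chip = list(shared_memory)
    return result_on_chip, ticks

# Usage with a toy stage function that scales every sample by the config value.
result, ticks = run_fft_job([1, 2], [2, 2, 2], lambda data, c: [v * c for v in data])
print(result, ticks)  # [8, 16] 3
```

In a real run, `stage_fn` would perform one FFT stage across the processing unit array, and the number of configuration words would match the stage count of the chosen FFT size.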
The dynamically reconfigurable processing platform was simulated and compared with conventional processor structures; the results show that the platform designed by the invention requires significantly fewer cycles than the DSP and the FPGA.
TABLE 2 Performance comparison
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these specific embodiments. To those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined in the appended claims, and all inventions and creations that make use of the inventive concept are protected.
Claims (4)
1. A systolic array reconfigurable processor for FFT base module mapping, comprising:
a reconfigurable processing unit array, a main controller, a shared memory, and an on-chip memory;
the reconfigurable processing unit array performs the FFT operation and comprises m × n reconfigurable processing units, where m is the number of rows and n is the number of columns;
the main controller parses the configuration packet and writes configuration information into the configuration memory in each reconfigurable processing unit; the reconfigurable processing units execute the corresponding operations under the dual drive of the data flow and the configuration flow; each reconfigurable processing unit, and the interconnection among the units, can be configured independently; and the reconfigurable processing unit array can be dynamically divided into subarrays for algorithm-level parallel processing to achieve acceleration;
the shared memory comprises multiple groups of memories and is used to exchange data with the on-chip memory and to store the intermediate data generated by each stage of the FFT operation;
the on-chip memory includes global and local registers and is used to store programs, configuration information, and data.
2. The systolic array reconfigurable processor for FFT base module mapping according to claim 1, further comprising:
a hierarchical data storage scheme is adopted; the architecture has three levels, namely the reconfigurable processor, the reconfigurable processing unit array, and the reconfigurable processing unit, and the physical units providing data access at these levels are the shared memory, the global registers, and the local registers, respectively;
the global registers mainly store data and parameters intended for multiple reconfigurable processing units, and can be accessed by all reconfigurable processing units in the reconfigurable processing unit array;
the local registers mainly store intermediate data inside a reconfigurable processing unit and can be accessed only by that unit;
the main controller controls the configuration and data of the reconfigurable processing unit array and the data movement between the shared memory and the on-chip memory.
3. The systolic array reconfigurable processor for FFT base module mapping according to claim 1, further comprising:
to meet the requirements of the FFT operation, the processing architecture of the reconfigurable processor is dynamically reorganized in real time; independent reconfigurable processing units are configured into a systolic array through configuration information, and software defines how the reconfigurable processing unit arrays are partitioned and spliced, producing systolic array architectures suited to different algorithms.
4. The systolic array reconfigurable processor for FFT base module mapping according to claim 1, further comprising:
a modular FFT mapping scheme is adopted, in which the reconfigurable processing unit array is divided into several sub-arrays to achieve algorithm-level parallel processing; the FFT is divided into a number of radix-2 or radix-4 sub-modules according to its number of points, and FFT mappings for different numbers of points are obtained by splicing these base modules together.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210894357.3A CN115080503A (en) | 2022-07-28 | 2022-07-28 | Systolic array reconfigurable processor aiming at FFT (fast Fourier transform) base module mapping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115080503A true CN115080503A (en) | 2022-09-20 |
Family
ID=83241965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210894357.3A Pending CN115080503A (en) | 2022-07-28 | 2022-07-28 | Systolic array reconfigurable processor aiming at FFT (fast Fourier transform) base module mapping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115080503A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19722365A1 (en) * | 1996-05-28 | 1997-12-04 | Nat Semiconductor Corp | Reconfigurable computer component with adaptive logic processor |
US20080155003A1 (en) * | 2006-12-21 | 2008-06-26 | National Chiao Tung University | Pipeline-based reconfigurable mixed-radix FFT processor |
CN101694648A (en) * | 2009-08-28 | 2010-04-14 | 曙光信息产业(北京)有限公司 | Fourier transform processing method and device |
CN102043761A (en) * | 2011-01-04 | 2011-05-04 | 东南大学 | Fourier transform implementation method based on reconfigurable technology |
CN202217276U (en) * | 2011-06-17 | 2012-05-09 | 江苏中科芯核电子科技有限公司 | FFT device based on parallel processing |
CN102831099A (en) * | 2012-07-27 | 2012-12-19 | 西安空间无线电技术研究所 | Implementation method of 3072-point FFT (Fast Fourier Transform) operation |
CN103678255A (en) * | 2013-12-16 | 2014-03-26 | 合肥优软信息技术有限公司 | FFT efficient parallel achieving optimizing method based on Loongson number three processor |
CN104679670A (en) * | 2015-03-10 | 2015-06-03 | 东南大学 | Shared data caching structure and management method for FFT (fast Fourier transform) and FIR (finite impulse response) algorithms |
WO2017125023A1 (en) * | 2016-01-19 | 2017-07-27 | 清华大学 | Pipeline reconfigurable single-precision floating-point fft/ifft coprocessor |
CN109977347A (en) * | 2019-03-29 | 2019-07-05 | 南京大学 | A kind of restructural fft processor for supporting multi-mode to configure |
CN110765709A (en) * | 2019-10-15 | 2020-02-07 | 天津大学 | FPGA-based 2-2 fast Fourier transform hardware design method |
CN114201725A (en) * | 2021-12-14 | 2022-03-18 | 电子科技大学 | Narrowband communication signal processing method based on multimode reconfigurable FFT |
Non-Patent Citations (1)
Title |
---|
Leng Jinlin et al.: "Visual FoxPro Programming" (《Visual FoxPro程序设计》), 31 January 2012, Shanghai Jiao Tong University Press * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220920 |