CN113918875B - Fast processing method of two-dimensional FFT - Google Patents
Fast processing method of two-dimensional FFT
- Publication number
- CN113918875B CN113918875B CN202111114501.9A CN202111114501A CN113918875B CN 113918875 B CN113918875 B CN 113918875B CN 202111114501 A CN202111114501 A CN 202111114501A CN 113918875 B CN113918875 B CN 113918875B
- Authority
- CN
- China
- Prior art keywords
- unit
- data
- cache
- fft calculation
- ram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Discrete Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention relates to a fast processing method of a two-dimensional FFT. A DMA unit reads the original data in a RAM unit sequentially into cache units A1, A2, A3 and A4; FFT calculation units F1, F2, F3 and F4 then compute the data in turn to obtain the one-dimensional FFT results, which are stored in cache units B1, B2, B3 and B4 and written back to the RAM unit by the DMA unit, completing the first-dimension FFT calculation. The second-dimension FFT calculation is performed in the same manner as the first. Because the invention executes the DMA unit, the hardware accelerator FFTA and the CPU in stages, RAM read-write operations run in parallel with FFT calculation, improving operational efficiency and giving the system good real-time performance.
Description
Technical Field
The invention relates to the field of data processing, and in particular to a fast processing method of a two-dimensional FFT.
Background
Existing radar data (or other data) is processed by a two-dimensional FFT as follows: first, a DMA read operation, FFT calculation and DMA write operation are performed in sequence on the raw AD data in RAM, and the one-dimensional result is stored back in RAM; then a DMA read operation, FFT calculation and DMA write operation are performed in sequence on that one-dimensional FFT result, and the two-dimensional result is stored in RAM. This processing method has the following problems:
1. The DMA read-write operations and the FFT processing execute serially, so operational efficiency is low and real-time requirements cannot be met;
2. Although the DMA can support dual channels, performing read and write operations simultaneously causes random read-write access to the RAM (a problem especially pronounced for DRAM), resulting in low execution efficiency and failure to meet real-time requirements;
3. During FFT processing, the thread occupies the CPU for a long time, so other threads cannot be scheduled normally;
4. Hardware computing resources are not fully utilized, so the CPU may require a higher clock frequency to meet application requirements, making cost reduction difficult.
In view of the above problems, the present inventors propose a fast two-dimensional FFT processing method.
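The cost of the serial scheme can be illustrated with a toy timing model. The stage times below are arbitrary placeholders (the patent gives no concrete figures); the point is only that a serialized read/compute/write loop pays the sum of all three stages per block, while an idealized pipeline pays roughly the slowest stage per block:

```python
# Hypothetical timing model for serial vs. pipelined DMA + FFT processing.
# All times are arbitrary units; none of these numbers come from the patent.

def serial_time(n_blocks, t_read, t_fft, t_write):
    # Serial scheme: each block is read, transformed, and written back
    # before the next block starts.
    return n_blocks * (t_read + t_fft + t_write)

def pipelined_time(n_blocks, t_read, t_fft, t_write):
    # Idealized pipeline: after the fill phase (one pass through all three
    # stages), every additional block costs only the slowest stage.
    t_max = max(t_read, t_fft, t_write)
    return (t_read + t_fft + t_write) + (n_blocks - 1) * t_max

if __name__ == "__main__":
    n, tr, tf, tw = 64, 3, 5, 3
    print(serial_time(n, tr, tf, tw))     # 64 * 11 blocks of serial work
    print(pipelined_time(n, tr, tf, tw))  # fill cost + 63 * slowest stage
```

As the block count grows, the pipelined total approaches `n_blocks * max(stage)`, which is the overlap the invention aims for.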
Disclosure of Invention
The invention aims to provide a fast processing method of a two-dimensional FFT (fast Fourier transform) so as to improve the operational efficiency of the FFT.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
The fast processing method of the two-dimensional FFT is implemented on a processing system comprising an FFT calculation unit, a cache unit, a RAM unit and a DMA unit.
The FFT calculation unit performs FFT calculation through a CPU and a hardware accelerator FFTA, and comprises FFT calculation units F1, F2, F3 and F4.
The cache unit comprises cache units A1, A2, A3, A4, B1, B2, B3 and B4.
The RAM unit stores the data awaiting FFT calculation and the data for which FFT calculation is complete. The DMA unit reads data from the RAM into cache units A1, A2, A3 and A4, or writes the data in cache units B1, B2, B3 and B4 back to the RAM unit.
The processing method comprises three phases: initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing.
The initial reading and calculation is as follows:
(1) Start the DMA unit, read the data to be processed from the RAM unit, and place it into cache units A1 and A2;
(2) FFT calculation unit F1 performs the FFT on the data in cache unit A1 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B1.
The cyclic reading, writing and calculation is as follows:
(1) FFT calculation unit F2 performs the FFT on the data in cache unit A2 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B2; at the same time, the DMA unit is started to read the next data from the RAM unit into cache units A3 and A4;
(2) FFT calculation unit F3 performs the FFT on the data in cache unit A3 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B3; at the same time, the DMA unit is started to write the data in cache units B1 and B2 to the RAM;
(3) FFT calculation unit F4 performs the FFT on the data in cache unit A4 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B4; at the same time, the DMA unit is started to read the next data from the RAM unit into cache units A1 and A2;
(4) FFT calculation unit F1 performs the FFT on the data in cache unit A1 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B1; at the same time, the DMA unit is started to write the data in cache units B3 and B4 to the RAM;
(5) Steps (1) to (4) repeat until all data in the RAM requiring FFT calculation has been read.
The ending calculation and writing is as follows:
(1) FFT calculation unit F4 performs the FFT on the data in cache unit A4 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B4;
(2) The DMA unit is started, and the data in cache units B3 and B4 are written to the RAM.
The CPU maintains a BUSY flag, which is set to 1 while the FFT calculation unit is computing.
Adopting the above scheme, the invention has the following beneficial effects:
1. The DMA unit, the hardware accelerator FFTA and the CPU execute in stages, so RAM read-write operations run in parallel with FFT calculation, improving operational efficiency and giving the system good real-time performance.
2. Pipeline and ping-pong operation separate the DMA unit's read and write accesses to the RAM, avoiding the inefficiency caused by random read-write of the RAM unit, and of DRAM in particular. The CPU can also exit and release its resources promptly during data processing, improving the flexibility of CPU usage.
Drawings
FIG. 1 is a schematic block diagram of the present invention;
Fig. 2 is a flow chart of the method of the present invention.
Detailed Description
As shown in fig. 1, the present invention discloses a fast processing system for a two-dimensional FFT, which includes an FFT calculation unit, a cache unit, a RAM unit, and a DMA unit (not shown in the figure).
The FFT calculation unit performs FFT calculation through the CPU and the hardware accelerator FFTA, and comprises FFT calculation units F1, F2, F3 and F4. The cache unit comprises cache units A1, A2, A3, A4, B1, B2, B3 and B4. The RAM unit stores the data awaiting FFT calculation and the data for which FFT calculation is complete. The DMA unit reads data from the RAM into cache units A1, A2, A3 and A4, or writes the data in cache units B1, B2, B3 and B4 back to the RAM unit.
With continued reference to fig. 1, the invention also discloses a fast processing method of the two-dimensional FFT based on this system. The DMA unit reads the original data in the RAM unit sequentially into cache units A1, A2, A3 and A4; FFT calculation units F1, F2, F3 and F4 compute the data in turn to obtain the one-dimensional FFT results, which are stored in cache units B1, B2, B3 and B4 and then written back to the RAM unit by the DMA unit, completing the first-dimension FFT calculation. The second-dimension FFT proceeds in the same way: the DMA unit reads the first-dimension FFT results from the RAM unit sequentially into cache units A1, A2, A3 and A4; FFT calculation units F1, F2, F3 and F4 compute the data in turn to obtain the two-dimensional FFT results, which are stored in cache units B1, B2, B3 and B4 and then written back to the RAM unit by the DMA unit.
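The two-pass structure described above relies on the separability of the two-dimensional DFT: taking a 1-D FFT of every row and then a 1-D FFT of every column of the intermediate result yields the full 2-D transform. A minimal pure-Python sketch of this property (illustrative only, not the patented FFTA implementation):

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def fft2d(matrix):
    # First dimension: FFT of every row; second dimension: FFT of every
    # column of the row-transformed result (the two passes of the patent).
    rows = [fft(row) for row in matrix]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dft2d(matrix):
    # Direct O(N^4) 2-D DFT, used only to cross-check fft2d.
    n, m = len(matrix), len(matrix[0])
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            out[u][v] = sum(
                matrix[x][y] * cmath.exp(-2j * cmath.pi * (u * x / n + v * y / m))
                for x in range(n) for y in range(m))
    return out
```

For any power-of-two matrix the two results agree to floating-point precision, which is why the method only ever needs a 1-D FFT engine plus two passes over the data.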
Specifically, the fast processing method of the present invention comprises three phases, namely initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing, as shown in fig. 2.
1. Initial reading and calculation:
(1) Start the DMA unit, read the data to be processed from the RAM unit, and place it into cache units A1 and A2;
(2) FFT calculation unit F1 performs the FFT on the data in cache unit A1 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B1.
2. Cyclic reading, writing and calculation:
(1) FFT calculation unit F2 performs the FFT on the data in cache unit A2 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B2; at the same time, the DMA unit is started to read the next data from the RAM unit into cache units A3 and A4;
(2) FFT calculation unit F3 performs the FFT on the data in cache unit A3 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B3; at the same time, the DMA unit is started to write the data in cache units B1 and B2 to the RAM;
(3) FFT calculation unit F4 performs the FFT on the data in cache unit A4 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B4; at the same time, the DMA unit is started to read the next data from the RAM unit into cache units A1 and A2;
(4) FFT calculation unit F1 performs the FFT on the data in cache unit A1 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B1; at the same time, the DMA unit is started to write the data in cache units B3 and B4 to the RAM;
(5) Steps (1) to (4) repeat until all data in the RAM requiring FFT calculation has been read.
3. Ending calculation and writing:
(1) FFT calculation unit F4 performs the FFT on the data in cache unit A4 using the CPU and the hardware accelerator FFTA, and the result is stored in cache unit B4;
(2) The DMA unit is started, and the data in cache units B3 and B4 are written to the RAM.
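The loop above can be sketched as an event schedule in which each time slot pairs one FFT computation with one DMA transfer on disjoint buffers. Buffer and unit names (A1-A4, B1-B4, F1-F4) follow the patent; the scheduling code itself is a hypothetical reconstruction, and the drain phase follows the patent's "ending calculation and writing" steps verbatim:

```python
# Hypothetical event-level sketch of the ping-pong schedule. Entries that
# share a slot ("fft" and "dma") run in parallel; note that each slot's
# FFT source buffer is always disjoint from the DMA's target buffers.

def schedule(n_rounds):
    slots = []
    # Initial reading and calculation: fill A1, A2, then compute A1.
    slots.append({"dma": "read RAM -> A1,A2"})
    slots.append({"fft": "F1: A1 -> B1"})
    # Cyclic reading, writing and calculation: steps (1)-(4) of the loop.
    for _ in range(n_rounds):
        slots.append({"fft": "F2: A2 -> B2", "dma": "read RAM -> A3,A4"})
        slots.append({"fft": "F3: A3 -> B3", "dma": "write B1,B2 -> RAM"})
        slots.append({"fft": "F4: A4 -> B4", "dma": "read RAM -> A1,A2"})
        slots.append({"fft": "F1: A1 -> B1", "dma": "write B3,B4 -> RAM"})
    # Ending calculation and writing, as listed in the patent.
    slots.append({"fft": "F4: A4 -> B4"})
    slots.append({"dma": "write B3,B4 -> RAM"})
    return slots

if __name__ == "__main__":
    for i, slot in enumerate(schedule(1)):
        print(i, slot)
```

Every slot inside the loop carries both an "fft" and a "dma" entry, which is exactly the overlap the serial baseline lacks.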
The key points of the invention are as follows. First, the DMA unit, the hardware accelerator FFTA and the CPU execute in stages, so the read-write operations on the RAM run in parallel with FFT calculation, improving operational efficiency and giving the system good real-time performance. Second, pipeline and ping-pong operation separate the DMA unit's read and write accesses to the RAM, avoiding the inefficiency caused by random read-write of the RAM unit, and of DRAM in particular. Third, the thread occupied by the data processing flow can temporarily exit during the loop to respond to task requests from other threads.
On this basis, the CPU maintains a BUSY flag, which is set to 1 while the FFT calculation unit is computing. Different threads can request CPU resources concurrently, giving the system good flexibility and maximizing hardware resource utilization, which can further reduce cost, improve performance and enhance product competitiveness.
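The BUSY-flag behaviour can be sketched as follows. Names such as `FftWorker`, `process_block` and `yields` are illustrative inventions for this sketch, not from the patent; the flag is raised only while a block is being computed and cleared between blocks so that other threads get a chance to be scheduled:

```python
# Hypothetical sketch of the BUSY flag around block-wise FFT processing.
# process_block stands in for handing a block to the hardware accelerator;
# here it just doubles each value so the sketch is self-checking.

class FftWorker:
    def __init__(self, blocks):
        self.busy = 0        # BUSY flag: 1 while the FFT unit is computing
        self.blocks = blocks
        self.results = []
        self.yields = 0      # times the CPU was released between blocks

    def process_block(self, block):
        # Placeholder for the CPU + FFTA computation on one cache unit.
        return [2 * v for v in block]

    def run(self):
        for block in self.blocks:
            self.busy = 1                    # other threads see the unit as busy
            self.results.append(self.process_block(block))
            self.busy = 0                    # release between blocks
            self.yields += 1                 # a scheduler could run other threads here

if __name__ == "__main__":
    w = FftWorker([[1, 2], [3, 4]])
    w.run()
    print(w.results, w.busy, w.yields)
```

The design choice mirrored here is cooperative: because the flag drops between blocks rather than at the end of the whole transform, other threads never wait longer than one block's computation time.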
The foregoing embodiments do not limit the technical scope of the present invention; any minor modifications, equivalent variations and adaptations made to the above embodiments according to the technical principles of the present invention still fall within the scope of the technical solution of the present invention.
Claims (2)
1. A fast processing method of a two-dimensional FFT, characterized in that: the processing method is implemented on a processing system comprising an FFT calculation unit, a cache unit, a RAM unit and a DMA unit;
the FFT calculation unit performs FFT calculation through a CPU and a hardware accelerator FFTA, and comprises an FFT calculation unit F1, an FFT calculation unit F2, an FFT calculation unit F3 and an FFT calculation unit F4;
the cache unit comprises a cache unit A1, a cache unit A2, a cache unit A3, a cache unit A4, a cache unit B1, a cache unit B2, a cache unit B3 and a cache unit B4;
the RAM unit is used for storing data awaiting FFT calculation and data for which FFT calculation is complete; the DMA unit is used for reading data from the RAM into the cache units A1, A2, A3 and A4, or writing the data in the cache units B1, B2, B3 and B4 into the RAM unit;
the processing method comprises initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing;
the initial reading and calculation is as follows:
starting the DMA unit, reading the data to be processed from the RAM unit, and placing it into the cache units A1 and A2;
the FFT calculation unit F1 performing the FFT on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B1;
the cyclic reading, writing and calculation is as follows:
the FFT calculation unit F2 performing the FFT on the data in the cache unit A2 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B2; at the same time, starting the DMA unit to read the next data from the RAM unit into the cache units A3 and A4;
the FFT calculation unit F3 performing the FFT on the data in the cache unit A3 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B3; at the same time, starting the DMA unit to write the data in the cache units B1 and B2 to the RAM;
the FFT calculation unit F4 performing the FFT on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B4; at the same time, starting the DMA unit to read the next data from the RAM unit into the cache units A1 and A2;
the FFT calculation unit F1 performing the FFT on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B1; at the same time, starting the DMA unit to write the data in the cache units B3 and B4 to the RAM;
the above steps repeating until all data in the RAM requiring FFT calculation has been read;
the ending calculation and writing is as follows:
the FFT calculation unit F4 performing the FFT on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, the result being stored in the cache unit B4;
starting the DMA unit, and writing the data in the cache units B3 and B4 to the RAM.
2. The fast processing method of a two-dimensional FFT of claim 1, characterized in that: the CPU maintains a BUSY flag, which is set to 1 while the FFT calculation unit is computing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111114501.9A CN113918875B (en) | 2021-09-23 | 2021-09-23 | Fast processing method of two-dimensional FFT |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113918875A CN113918875A (en) | 2022-01-11 |
CN113918875B true CN113918875B (en) | 2024-05-03 |
Family
ID=79235854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111114501.9A Active CN113918875B (en) | 2021-09-23 | 2021-09-23 | Fast processing method of two-dimensional FFT |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113918875B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1992018940A1 (en) * | 1991-04-18 | 1992-10-29 | Sharp Kabushiki Kaisha | Quasi radix-16 processor and method |
WO2003034269A1 (en) * | 2001-10-12 | 2003-04-24 | Pts Corporation | Method of performing a fft transform on parallel processors |
WO2010045808A1 (en) * | 2008-10-24 | 2010-04-29 | 中兴通讯股份有限公司 | Hardware apparatus and method for implementing fast fourier transform and inverse fast fourier transform |
CN102340796A (en) * | 2011-05-16 | 2012-02-01 | 中兴通讯股份有限公司 | Secondary synchronization channel detection method and device |
EP2778948A2 (en) * | 2013-03-15 | 2014-09-17 | Analog Devices, Inc. | FFT Accelerator |
WO2018129930A1 (en) * | 2017-01-12 | 2018-07-19 | 深圳市中兴微电子技术有限公司 | Fast fourier transform processing method and device, and computer storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7856465B2 (en) * | 2006-12-21 | 2010-12-21 | Intel Corporation | Combined fast fourier transforms and matrix operations |
- 2021-09-23: application CN202111114501.9A filed in China; granted as patent CN113918875B (status: Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1992018940A1 (en) * | 1991-04-18 | 1992-10-29 | Sharp Kabushiki Kaisha | Quasi radix-16 processor and method |
WO2003034269A1 (en) * | 2001-10-12 | 2003-04-24 | Pts Corporation | Method of performing a fft transform on parallel processors |
WO2010045808A1 (en) * | 2008-10-24 | 2010-04-29 | 中兴通讯股份有限公司 | Hardware apparatus and method for implementing fast fourier transform and inverse fast fourier transform |
CN102340796A (en) * | 2011-05-16 | 2012-02-01 | 中兴通讯股份有限公司 | Secondary synchronization channel detection method and device |
EP2778948A2 (en) * | 2013-03-15 | 2014-09-17 | Analog Devices, Inc. | FFT Accelerator |
WO2018129930A1 (en) * | 2017-01-12 | 2018-07-19 | 深圳市中兴微电子技术有限公司 | Fast fourier transform processing method and device, and computer storage medium |
Non-Patent Citations (1)
Title |
---|
An application-customized instruction-set reconfigurable architecture and FFT algorithm mapping optimization (一种应用定制指令集可重构结构及FFT算法映射优化); Liu Lei, Yang Ziyu, Shen Jianliang, Li Sikun; Journal of National University of Defense Technology; 2012-12-28 (06); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113918875A (en) | 2022-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106991011B (en) | CPU multithreading and GPU (graphics processing unit) multi-granularity parallel and cooperative optimization based method | |
CN108733415B (en) | Method and device for supporting vector random access | |
US6865631B2 (en) | Reduction of interrupts in remote procedure calls | |
CN106681660B (en) | IO scheduling method and IO scheduling device | |
CN116501249A (en) | Method for reducing repeated data read-write of GPU memory and related equipment | |
CN113918875B (en) | Fast processing method of two-dimensional FFT | |
CN106469119A (en) | A kind of data write buffer method based on NVDIMM and its device | |
CN102810133A (en) | Ray query method for network game, and scene server | |
CN112463037B (en) | Metadata storage method, device, equipment and product | |
CN113569189B (en) | Fast Fourier transform calculation method and device | |
CN102495710B (en) | Method for processing data read-only accessing request | |
CN114063923A (en) | Data reading method and device, processor and electronic equipment | |
US20230070827A1 (en) | Accelerating computations in a processor | |
CN113220608A (en) | NVMe command processor and processing method thereof | |
CN111368250B (en) | Data processing system, method and equipment based on Fourier transformation/inverse transformation | |
CN116820333B (en) | SSDRAID-5 continuous writing method based on multithreading | |
CN106951311B (en) | Data processing method and server cluster | |
US8677028B2 (en) | Interrupt-based command processing | |
CN112837205B (en) | Delay correction-based batch matrix inversion method on graphics processor | |
US11829768B2 (en) | Method for scheduling out-of-order queue and electronic device items | |
US20240004653A1 (en) | Approach for managing near-memory processing commands from multiple processor threads to prevent interference at near-memory processing elements | |
CN111367625A (en) | Thread awakening method and device, storage medium and electronic equipment | |
Lee et al. | Parallel srp-phat for GPUs | |
US11809282B2 (en) | Optimized pipeline to boost de-dup system performance | |
CN114281554B (en) | 3D-CNN acceleration method and device for 3D image processing and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |