CN113918875B - Fast processing method of two-dimensional FFT - Google Patents

Publication number
CN113918875B
CN113918875B CN202111114501.9A CN202111114501A CN113918875B CN 113918875 B CN113918875 B CN 113918875B CN 202111114501 A CN202111114501 A CN 202111114501A CN 113918875 B CN113918875 B CN 113918875B
Authority
CN
China
Prior art keywords
unit
data
cache
fft calculation
ram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111114501.9A
Other languages
Chinese (zh)
Other versions
CN113918875A (en)
Inventor
林炳章
苏亮
吴江淼
叶炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tung Thih Electron Xiamen Co Ltd
Original Assignee
Tung Thih Electron Xiamen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tung Thih Electron Xiamen Co Ltd
Priority to CN202111114501.9A
Publication of CN113918875A
Application granted
Publication of CN113918875B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a fast processing method for a two-dimensional FFT. A DMA unit reads the original data from a RAM unit in sequence and stores it in cache units A1, A2, A3 and A4; FFT calculation units F1, F2, F3 and F4 process the data in turn to obtain the one-dimensional FFT results, which are stored in cache units B1, B2, B3 and B4 and then written back to the RAM unit by the DMA unit, completing the first-dimension FFT calculation. The second-dimension FFT calculation is performed in the same way. Because the DMA unit, the hardware accelerator FFTA and the CPU each execute their own stage of the work, RAM reads and writes can proceed in parallel with the FFT calculation, which improves operating efficiency and gives the system good real-time performance.

Description

Fast processing method of two-dimensional FFT
Technical Field
The invention relates to the field of data processing, and in particular to a fast processing method for a two-dimensional FFT.
Background
In existing approaches, radar data or other data is processed by a two-dimensional FFT as follows: first, a DMA read, an FFT calculation and a DMA write are performed in sequence on the raw A/D data in the RAM, and the first-dimension result is stored in the RAM; then a DMA read, an FFT calculation and a DMA write are performed in sequence on the first-dimension FFT result in the RAM, and the two-dimensional result is stored in the RAM. This processing method has the following problems:
1. The DMA read/write operations and the FFT processing are executed serially, so operating efficiency is low and the real-time requirement cannot be met;
2. Although the DMA can support dual channels, performing reads and writes simultaneously causes random read/write access to the RAM; the problem is especially pronounced with DRAM and leads to low execution efficiency, so the real-time requirement cannot be met;
3. During FFT processing, the thread occupies the CPU for a long time, so other threads cannot be scheduled normally;
4. Because hardware computing resources are not fully utilized, the CPU may need a higher clock frequency to meet application requirements, which makes cost reduction difficult.
In view of the above problems, the present inventors propose a fast FFT processing method.
Disclosure of Invention
The invention aims to provide a fast processing method for a two-dimensional FFT (fast Fourier transform) that improves the operating efficiency of the FFT.
To achieve the above purpose, the invention adopts the following technical scheme:
The fast processing method of the two-dimensional FFT is implemented on a processing system that comprises an FFT calculation unit, a cache unit, a RAM unit and a DMA unit;
The FFT calculation unit performs FFT calculation by means of a CPU and a hardware accelerator FFTA, and comprises an FFT calculation unit F1, an FFT calculation unit F2, an FFT calculation unit F3 and an FFT calculation unit F4;
The cache unit comprises a cache unit A1, a cache unit A2, a cache unit A3, a cache unit A4, a cache unit B1, a cache unit B2, a cache unit B3 and a cache unit B4;
The RAM unit is used for storing the data on which FFT calculation is to be performed and the data on which FFT calculation has been completed; the DMA unit is used for reading data from the RAM unit and storing it into the cache unit A1, the cache unit A2, the cache unit A3 and the cache unit A4, or for writing the data in the cache unit B1, the cache unit B2, the cache unit B3 and the cache unit B4 into the RAM unit;
The processing method comprises three stages: initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing;
The initial reading and calculation is as follows:
(1) The DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
(2) The FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1;
The cyclic reading, writing and calculation is as follows:
(1) The FFT calculation unit F2 performs FFT calculation on the data in the cache unit A2 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B2; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A3 and the cache unit A4;
(2) The FFT calculation unit F3 performs FFT calculation on the data in the cache unit A3 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B3; at the same time, the DMA unit is started to write the data in the cache unit B1 and the cache unit B2 into the RAM unit;
(3) The FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
(4) The FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1; at the same time, the DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit;
(5) The above steps are repeated until all the data in the RAM unit that requires FFT calculation has been read;
The ending calculation and writing is as follows:
(1) The FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4;
(2) The DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit.
The CPU is provided with a BUSY flag, and the BUSY flag is 1 while the FFT calculation unit is calculating.
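The step sequence above can be written out as a table of time slots, each pairing one FFT calculation with the DMA transfer that runs alongside it. The following is a minimal sketch, not the patented implementation: the function name `pipeline_schedule`, the 0-based block index `k`, and the generalization to any even number of blocks are assumptions made for illustration.

```python
def pipeline_schedule(num_blocks):
    """List (compute, dma) pairs per time slot for an even number of data blocks.

    Hypothetical helper: block k uses cache A/B and FFT unit number k % 4 + 1.
    """
    if num_blocks < 2 or num_blocks % 2:
        raise ValueError("num_blocks must be an even number >= 2")

    def pair(prefix, first):
        # Names of the two caches that hold blocks first and first + 1.
        return f"{prefix}{first % 4 + 1}+{prefix}{(first + 1) % 4 + 1}"

    # Initial read: the DMA fills A1 and A2 before any calculation starts.
    steps = [("idle", "read -> " + pair("A", 0))]
    for k in range(num_blocks):
        unit = k % 4 + 1
        compute = f"F{unit}: A{unit} -> B{unit}"
        if k % 2 == 1 and k + 1 < num_blocks:
            dma = "read -> " + pair("A", k + 1)    # prefetch the next two blocks
        elif k % 2 == 0 and k >= 2:
            dma = "write <- " + pair("B", k - 2)   # flush the two oldest results
        else:
            dma = "idle"
        steps.append((compute, dma))
    # Ending write: the last two results go back to RAM.
    steps.append(("idle", "write <- " + pair("B", num_blocks - 2)))
    return steps


for compute, dma in pipeline_schedule(8):
    print(f"{compute:<14} | {dma}")
```

In every slot the DMA touches only caches that the active FFT calculation unit is not using, and reads and writes never share a slot, which is the ping-pong property the method relies on.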
With the above scheme, the invention has the following beneficial effects:
1. Because the DMA unit, the hardware accelerator FFTA and the CPU each execute their own stage of the work, RAM reads and writes can proceed in parallel with the FFT calculation, which improves operating efficiency and gives the system good real-time performance.
2. By adopting pipelining and ping-pong operation, the invention separates the DMA unit's read operations on the RAM from its write operations, avoiding the inefficiency caused by random reads and writes of the RAM unit, particularly DRAM. In addition, the invention can exit and release CPU resources promptly during data processing, which improves the flexibility with which the CPU can be used.
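To make the efficiency claim concrete, the serial flow from the Background can be compared with the pipelined flow using a simple timing model. This is a rough sketch under stated assumptions: the per-block times `t_read`, `t_fft` and `t_write` are hypothetical inputs in arbitrary units, DMA transfers are assumed to overlap fully with FFT calculation, and bus contention is ignored. The source gives no timing figures.

```python
def serial_time(num_blocks, t_read, t_fft, t_write):
    # Baseline: every block is read, transformed and written back in sequence.
    return num_blocks * (t_read + t_fft + t_write)


def pipelined_time(num_blocks, t_read, t_fft, t_write):
    # Pipelined flow: each slot lasts as long as the slower of the FFT
    # calculation and the two-block DMA transfer that overlaps with it.
    assert num_blocks >= 2 and num_blocks % 2 == 0
    total = 2 * t_read                      # initial read of two blocks
    for k in range(num_blocks):
        if k % 2 == 1 and k + 1 < num_blocks:
            dma = 2 * t_read                # prefetch the next two blocks
        elif k % 2 == 0 and k >= 2:
            dma = 2 * t_write               # flush the two oldest results
        else:
            dma = 0
        total += max(t_fft, dma)
    return total + 2 * t_write              # ending write of two blocks


print(serial_time(8, 1, 3, 1))     # 40
print(pipelined_time(8, 1, 3, 1))  # 28
```

With these illustrative numbers the FFT calculation dominates each slot, so only the initial read and the ending write remain exposed. When the DMA transfer is slower than the calculation, the transfers set the slot length and the gain shrinks.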
Drawings
FIG. 1 is a schematic block diagram of the present invention;
Fig. 2 is a flow chart of the method of the present invention.
Detailed Description
As shown in fig. 1, the present invention discloses a fast processing system for a two-dimensional FFT, which includes an FFT calculation unit, a cache unit, a RAM unit, and a DMA unit (not shown in the figure).
The FFT calculation unit performs FFT calculation by means of the CPU and the hardware accelerator FFTA, and comprises an FFT calculation unit F1, an FFT calculation unit F2, an FFT calculation unit F3 and an FFT calculation unit F4. The cache unit comprises a cache unit A1, a cache unit A2, a cache unit A3, a cache unit A4, a cache unit B1, a cache unit B2, a cache unit B3 and a cache unit B4. The RAM unit is used to store the data on which FFT calculation is to be performed and the data on which FFT calculation has been completed. The DMA unit is configured to read data from the RAM unit into the cache unit A1, the cache unit A2, the cache unit A3, and the cache unit A4, or to write the data in the cache unit B1, the cache unit B2, the cache unit B3, and the cache unit B4 into the RAM unit.
With continued reference to fig. 1, the present invention also discloses a fast processing method of the two-dimensional FFT based on the above fast processing system. The DMA unit reads the original data from the RAM unit in sequence and stores it in the cache unit A1, the cache unit A2, the cache unit A3, and the cache unit A4; the FFT calculation unit F1, the FFT calculation unit F2, the FFT calculation unit F3, and the FFT calculation unit F4 process the data in turn to obtain the one-dimensional FFT calculation results, which are stored in the cache unit B1, the cache unit B2, the cache unit B3, and the cache unit B4 and then written into the RAM unit by the DMA unit, completing the first-dimension FFT calculation. The second-dimension FFT calculation proceeds in the same way: the DMA unit reads the first-dimension FFT result data from the RAM unit in sequence and stores it in the cache unit A1, the cache unit A2, the cache unit A3, and the cache unit A4; the FFT calculation unit F1, the FFT calculation unit F2, the FFT calculation unit F3, and the FFT calculation unit F4 process the data in turn to obtain the two-dimensional FFT result data, which is stored in the cache unit B1, the cache unit B2, the cache unit B3, and the cache unit B4 and then written into the RAM unit by the DMA unit.
Specifically, as shown in fig. 2, the fast processing method of the present invention includes three stages: initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing.
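The two passes described above correspond to the standard row-column decomposition of a two-dimensional transform: a one-dimensional transform along the first dimension, followed by a one-dimensional transform of those results along the second. The sketch below checks that equivalence on a small matrix. It uses a naive DFT as a stand-in for the FFTA hardware, and the names `dft`, `fft2_two_pass` and `dft2_direct` are assumptions made for illustration; the source does not specify which axis is transformed first.

```python
import cmath


def dft(x):
    # Naive one-dimensional DFT, standing in for one hardware FFT call.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft2_two_pass(m):
    # First-dimension pass: transform every row and keep the results.
    first = [dft(row) for row in m]
    # Second-dimension pass: transform every column of the first-pass results.
    cols = [dft(col) for col in zip(*first)]
    # Transpose back so the output has the same layout as the input.
    return [list(row) for row in zip(*cols)]


def dft2_direct(m):
    # Direct evaluation of the two-dimensional DFT definition, for comparison.
    rows, cols = len(m), len(m[0])
    return [[sum(m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


data = [[1, 2, 3, 4], [5, 6, 7, 8]]
two_pass = fft2_two_pass(data)
direct = dft2_direct(data)
print(all(abs(two_pass[u][v] - direct[u][v]) < 1e-9
          for u in range(2) for v in range(4)))  # True
```

Because the second pass consumes the complete output of the first, the first-dimension results have to be written back to RAM before the second pass can start, which is why the method runs the same DMA and cache pipeline twice.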
1. Initial reading and calculation:
(1) The DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
(2) The FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1;
2. Cyclic reading, writing and calculation:
(1) The FFT calculation unit F2 performs FFT calculation on the data in the cache unit A2 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B2; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A3 and the cache unit A4;
(2) The FFT calculation unit F3 performs FFT calculation on the data in the cache unit A3 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B3; at the same time, the DMA unit is started to write the data in the cache unit B1 and the cache unit B2 into the RAM unit;
(3) The FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
(4) The FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1; at the same time, the DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit;
(5) The above steps are repeated until all the data in the RAM unit that requires FFT calculation has been read;
3. Ending calculation and writing:
(1) The FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4;
(2) The DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit.
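The three stages above can be exercised in software by letting a worker thread play the role of the DMA unit while the main thread plays the FFT calculation unit. This is an illustrative simulation only: `run_pipeline`, the list-based caches and the caller-supplied `transform` are hypothetical stand-ins for the real DMA and FFTA hardware, and Python threads only approximate true hardware parallelism.

```python
import threading


def run_pipeline(ram_in, transform):
    """Process ram_in block by block with four input and four output caches."""
    n = len(ram_in)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even number of blocks"
    ram_out = [None] * n
    cache_a = [None] * 4   # stands in for cache units A1..A4
    cache_b = [None] * 4   # stands in for cache units B1..B4

    def dma_read(first):
        # Copy two consecutive blocks from RAM into their input caches.
        for k in (first, first + 1):
            cache_a[k % 4] = list(ram_in[k])

    def dma_write(first):
        # Copy two consecutive results from the output caches back to RAM.
        for k in (first, first + 1):
            ram_out[k] = cache_b[k % 4]

    dma_read(0)                                   # initial reading
    for k in range(n):
        dma = None
        if k % 2 == 1 and k + 1 < n:
            dma = threading.Thread(target=dma_read, args=(k + 1,))
        elif k % 2 == 0 and k >= 2:
            dma = threading.Thread(target=dma_write, args=(k - 2,))
        if dma:
            dma.start()                           # transfer runs alongside the calculation
        cache_b[k % 4] = transform(cache_a[k % 4])
        if dma:
            dma.join()                            # the slot ends when both are done
    dma_write(n - 2)                              # ending write
    return ram_out


blocks = [[i, i + 1] for i in range(8)]
print(run_pipeline(blocks, lambda b: [2 * v for v in b]))
```

The transfer in each slot never touches the cache pair that the calculation is using, so the sketch needs no locking beyond the `join` at the end of the slot.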
The key points of the invention are as follows. First, because the DMA unit, the hardware accelerator FFTA and the CPU each execute their own stage of the work, RAM reads and writes can proceed in parallel with the FFT calculation, which improves operating efficiency and gives the system good real-time performance. Second, by adopting pipelining and ping-pong operation, the invention separates the DMA unit's read operations on the RAM from its write operations, avoiding the inefficiency caused by random reads and writes of the RAM unit, particularly DRAM. Third, the thread occupied by the data processing flow can exit temporarily during the loop in order to respond to task requests from other threads.
In addition, the CPU is provided with a BUSY flag, which is 1 while the FFT calculation unit is calculating. Different threads can request CPU resources at the same time, which gives the system good flexibility, maximizes the use of hardware resources, and can further reduce cost, improve performance and improve product competitiveness.
The foregoing embodiments are not intended to limit the technical scope of the present invention; therefore, any minor modifications, equivalent variations and modifications made to the above embodiments in accordance with the technical principles of the present invention still fall within the scope of the technical scheme of the present invention.
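One way to read the BUSY flag is as a guard that tells other threads whether the FFT calculation unit is currently in use, so that they can defer their request instead of blocking. The sketch below illustrates that reading only. The class `FftUnit`, the method `try_compute` and the lock around the flag are assumptions; the source states only that the flag is 1 while the FFT calculation unit is calculating.

```python
import threading


class FftUnit:
    """Hypothetical wrapper that exposes a BUSY flag around a calculation."""

    def __init__(self):
        self._lock = threading.Lock()
        self.busy = 0                      # 1 while a calculation is in progress

    def try_compute(self, block, transform):
        # Check and set the flag atomically so two threads cannot both claim the unit.
        with self._lock:
            if self.busy:
                return None                # unit is occupied; the caller may retry later
            self.busy = 1
        try:
            return transform(block)
        finally:
            self.busy = 0                  # release the unit even if transform fails


unit = FftUnit()
print(unit.try_compute([1, 2, 3], lambda b: [2 * v for v in b]))  # [2, 4, 6]
print(unit.busy)                                                  # 0
```

A thread that receives `None` can do other work and come back, which matches the stated goal of letting the processing thread exit temporarily and serve other requests.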

Claims (2)

1. A fast processing method of a two-dimensional FFT, characterized in that: the processing method is implemented on a processing system, and the processing system comprises an FFT calculation unit, a cache unit, a RAM unit and a DMA unit;
the FFT calculation unit performs FFT calculation by means of a CPU and a hardware accelerator FFTA, and comprises an FFT calculation unit F1, an FFT calculation unit F2, an FFT calculation unit F3 and an FFT calculation unit F4;
the cache unit comprises a cache unit A1, a cache unit A2, a cache unit A3, a cache unit A4, a cache unit B1, a cache unit B2, a cache unit B3 and a cache unit B4;
the RAM unit is used for storing the data on which FFT calculation is to be performed and the data on which FFT calculation has been completed; the DMA unit is used for reading data from the RAM unit and storing it into the cache unit A1, the cache unit A2, the cache unit A3 and the cache unit A4, or for writing the data in the cache unit B1, the cache unit B2, the cache unit B3 and the cache unit B4 into the RAM unit;
the processing method comprises initial reading and calculation; cyclic reading, writing and calculation; and ending calculation and writing;
the initial reading and calculation is as follows:
the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
the FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1;
the cyclic reading, writing and calculation is as follows:
the FFT calculation unit F2 performs FFT calculation on the data in the cache unit A2 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B2; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A3 and the cache unit A4;
the FFT calculation unit F3 performs FFT calculation on the data in the cache unit A3 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B3; at the same time, the DMA unit is started to write the data in the cache unit B1 and the cache unit B2 into the RAM unit;
the FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4; at the same time, the DMA unit is started to read the data to be processed from the RAM unit and place it in the cache unit A1 and the cache unit A2;
the FFT calculation unit F1 performs FFT calculation on the data in the cache unit A1 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B1; at the same time, the DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit;
the above steps are repeated until all the data in the RAM unit that requires FFT calculation has been read;
the ending calculation and writing is as follows:
the FFT calculation unit F4 performs FFT calculation on the data in the cache unit A4 using the CPU and the hardware accelerator FFTA, and stores the result in the cache unit B4;
and the DMA unit is started to write the data in the cache unit B3 and the cache unit B4 into the RAM unit.
2. The fast processing method of a two-dimensional FFT according to claim 1, characterized in that: the CPU is provided with a BUSY flag, and the BUSY flag is 1 while the FFT calculation unit is calculating.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111114501.9A CN113918875B (en) 2021-09-23 2021-09-23 Fast processing method of two-dimensional FFT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111114501.9A CN113918875B (en) 2021-09-23 2021-09-23 Fast processing method of two-dimensional FFT

Publications (2)

Publication Number Publication Date
CN113918875A (en) 2022-01-11
CN113918875B (en) 2024-05-03

Family

ID=79235854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111114501.9A Active CN113918875B (en) 2021-09-23 2021-09-23 Fast processing method of two-dimensional FFT

Country Status (1)

Country Link
CN (1) CN113918875B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992018940A1 (en) * 1991-04-18 1992-10-29 Sharp Kabushiki Kaisha Quasi radix-16 processor and method
WO2003034269A1 (en) * 2001-10-12 2003-04-24 Pts Corporation Method of performing a fft transform on parallel processors
WO2010045808A1 (en) * 2008-10-24 2010-04-29 中兴通讯股份有限公司 Hardware apparatus and method for implementing fast fourier transform and inverse fast fourier transform
CN102340796A (en) * 2011-05-16 2012-02-01 中兴通讯股份有限公司 Secondary synchronization channel detection method and device
EP2778948A2 (en) * 2013-03-15 2014-09-17 Analog Devices, Inc. FFT Accelerator
WO2018129930A1 (en) * 2017-01-12 2018-07-19 深圳市中兴微电子技术有限公司 Fast fourier transform processing method and device, and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7856465B2 (en) * 2006-12-21 2010-12-21 Intel Corporation Combined fast fourier transforms and matrix operations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
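An application-customized instruction-set reconfigurable architecture and FFT algorithm mapping optimization; 刘磊 (Liu Lei); 杨子煜 (Yang Ziyu); 沈剑良 (Shen Jianliang); 李思昆 (Li Sikun); 国防科技大学学报 (Journal of National University of Defense Technology); 2012-12-28 (06); full text *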

Also Published As

Publication number Publication date
CN113918875A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN106991011B (en) CPU multithreading and GPU (graphics processing unit) multi-granularity parallel and cooperative optimization based method
CN108733415B (en) Method and device for supporting vector random access
US6865631B2 (en) Reduction of interrupts in remote procedure calls
CN106681660B (en) IO scheduling method and IO scheduling device
CN116501249A (en) Method for reducing repeated data read-write of GPU memory and related equipment
CN113918875B (en) Fast processing method of two-dimensional FFT
CN106469119A (en) A kind of data write buffer method based on NVDIMM and its device
CN102810133A (en) Ray query method for network game, and scene server
CN112463037B (en) Metadata storage method, device, equipment and product
CN113569189B (en) Fast Fourier transform calculation method and device
CN102495710B (en) Method for processing data read-only accessing request
CN114063923A (en) Data reading method and device, processor and electronic equipment
US20230070827A1 (en) Accelerating computations in a processor
CN113220608A (en) NVMe command processor and processing method thereof
CN111368250B (en) Data processing system, method and equipment based on Fourier transformation/inverse transformation
CN116820333B (en) SSDRAID-5 continuous writing method based on multithreading
CN106951311B (en) Data processing method and server cluster
US8677028B2 (en) Interrupt-based command processing
CN112837205B (en) Delay correction-based batch matrix inversion method on graphics processor
US11829768B2 (en) Method for scheduling out-of-order queue and electronic device items
US20240004653A1 (en) Approach for managing near-memory processing commands from multiple processor threads to prevent interference at near-memory processing elements
CN111367625A (en) Thread awakening method and device, storage medium and electronic equipment
Lee et al. Parallel srp-phat for GPUs
US11809282B2 (en) Optimized pipeline to boost de-dup system performance
CN114281554B (en) 3D-CNN acceleration method and device for 3D image processing and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant