CN103810144A - FFT (fast fourier transform)/IFFT (inverse fast fourier transform) method and device for prime length - Google Patents
- Publication number
- CN103810144A (application CN201210444181.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- order
- input
- cyclic convolution
- fft
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention provides a prime-length FFT (fast Fourier transform)/IFFT (inverse fast Fourier transform) method and device. The device comprises an input permutation module, a real-coefficient cyclic convolution module, a complex-coefficient cyclic convolution module and an output permutation module. The systolic array structure makes the device fast. The real-coefficient cyclic convolution structure avoids complex multiplication, and multiplication is realized through an adder network, which effectively reduces the gate count. The architecture is suitable for implementation on FPGAs (field-programmable gate arrays) or ASICs (application-specific integrated circuits) of different types and from different vendors.
Description
Technical field
The present invention relates to digital multimedia, and in particular to an FFT (Fast Fourier Transform)/IFFT (Inverse Fast Fourier Transform) processor architecture for prime point lengths, for use in digital multimedia and communication systems. It belongs to the field of digital signal processing.
Background art
In digital signal processing, many operations such as correlation, filtering and convolution can be reduced to a DFT (Discrete Fourier Transform). However, the cost of computing the DFT directly is proportional to the square of the transform length N. Decomposing an N-point DFT into several shorter DFTs greatly reduces the number of multiplications, and this gave rise to the Fast Fourier Transform (FFT). The FFT is a fast algorithm for the DFT. Its introduction greatly reduced the computational cost and fundamentally established the importance of the Fourier transform. It has become one of the core technologies of digital signal processing and is widely used in radar, observation, high-speed image processing, secure radio communication and digital communication.
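To make the quadratic cost concrete, the following minimal sketch evaluates the DFT definition directly and counts the complex multiplications it performs. The function name `direct_dft` is illustrative and does not come from the patent.

```python
import cmath

def direct_dft(x):
    """Evaluate the N-point DFT from its definition.

    Returns the spectrum and the number of complex multiplications used.
    """
    n = len(x)
    multiplications = 0
    spectrum = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            # One twiddle-factor multiplication per (k, t) pair.
            acc += x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            multiplications += 1
        spectrum.append(acc)
    return spectrum, multiplications

spectrum, multiplications = direct_dft([1, 0, 0, 0, 0])
print(multiplications)  # 25, i.e. N * N for N = 5
```

For N = 13 the same loop performs 169 multiplications, which is the growth that FFT decompositions are meant to avoid.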
Modern digital multimedia and communication systems need FFT/IFFTs of prime point lengths. For example, most linear filters in image processing have prime sizes, such as 7*7, 11*11 and 13*13. The 480-point FFT/IFFT used in hologram image processing contains 3-point and 5-point FFT/IFFTs. The OFDM (Orthogonal Frequency Division Multiplexing) modulation, demodulation and channel estimation of some systems require FFT/IFFTs of 12 to 1200 points, and these transforms also contain 3-point and 5-point FFT/IFFTs. How to implement a prime-length FFT/IFFT in an FPGA (Field-Programmable Gate Array) or ASIC (Application-Specific Integrated Circuit) is therefore one of the keys to implementing the whole system. Implementing this function faces the following difficulties:
1. No ready-made FFT IP core is available; existing IP cores only support FFT/IFFT lengths that are powers of 2, such as 512, 1024 and 2048 points;
2. The speed requirement is very high; for example, some OFDM systems must complete FFT/IFFTs with a total length of 1200 points within 41.66 microseconds;
3. Traditional multiplier-based FFT/IFFT algorithms consume a large chip area;
4. Applications require that the FFT/IFFT can be implemented on different FPGAs, so the FFT/IFFT processor architecture must be highly portable.
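The two figures quoted above can be checked with a few lines of arithmetic. This sketch factorizes the lengths 480 and 1200 to show where the 3-point and 5-point transforms come from, and converts the 1200 points in 41.66 microseconds requirement into a sustained rate. The helper name `prime_factors` is illustrative only.

```python
def prime_factors(n):
    """Return the prime factorization of n as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factors(480))   # {2: 5, 3: 1, 5: 1}
print(prime_factors(1200))  # {2: 4, 3: 1, 5: 2}

# 1200 points every 41.66 microseconds, expressed in points per second.
points_per_second = 1200 / 41.66e-6
print(round(points_per_second / 1e6, 1))  # about 28.8 million points per second
```

Both lengths contain the odd prime factors 3 and 5, so a power-of-2 FFT core alone cannot cover them.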
Summary of the invention
The object of the present invention is to provide a method and device that use dual-port memory modules and an adder network to implement a systolic array structure, so that data are processed continuously without interruption and the required FFT/IFFT operation can be completed at a lower clock frequency while the processor's chip area is reduced.
The technical solution to the above problems is a prime-length FFT/IFFT device comprising:
An input permutation module, comprising a dual-port random access memory, which rearranges the order of the N input data;
A real-coefficient cyclic convolution module, comprising an adder network, a cyclic accumulator, a multiplexed delay unit and an adder. The data reordered by the input permutation module are fed into the adder network and multiplied by a vector; the products are sent to the cyclic accumulator and the adder for accumulation; the accumulation result is sent to the multiplexed delay unit, which adjusts the data sequence and outputs the data in order. The module completes the convolution of N-1 input data with real numbers not greater than 1 and the accumulation of the input data;
A complex-coefficient cyclic convolution module, comprising an adder network, a cyclic accumulator, a multiplexed delay unit and an adder. The data reordered by the input permutation module are fed into the adder network and multiplied by a vector; the products are sent to the cyclic accumulator and the adder for accumulation; the accumulation result is sent to the multiplexed delay unit, which adjusts the data sequence and outputs the data in order. The module completes the convolution of N-1 input data with ±1, ±i and ±1±i and the accumulation of the input data;
An output permutation module, comprising a dual-port random access memory, which reorders the results computed by the real-coefficient and complex-coefficient cyclic convolution modules and outputs the N data in sequence.
The adder network comprises right-shift units and addition units.
The cyclic accumulator comprises an adder and a delay register.
The multiplexed delay unit comprises a delay register and a multiplexer.
A prime-length FFT/IFFT method using the device described in claim 1 comprises the following steps:
Step 1: in the input permutation module, one port of the dual-port random access memory is the write port, which receives the data in order and writes them to memory; the other port is the read port, which reorders the data into the sequence required by the cyclic convolution as they are output;
Step 2: the real-coefficient cyclic convolution module receives the data in order; while the products of the input data and a real coefficient are accumulated, the data other than the first input datum are cyclically convolved with the real coefficient vector, delayed, and output in order;
Step 3: the complex-coefficient cyclic convolution module receives the data in order; while the input data are accumulated, the data other than the first input datum are cyclically convolved with a complex vector composed of ±1, ±i and ±1±i, delayed, and output in order;
Step 4: in the output permutation module, one port of the dual-port random access memory is the write port, which writes the results computed by the complex-coefficient cyclic convolution module to memory; the other port is the read port, which reorders the data into natural order for output.
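The index orders given in the embodiments described later (for example [x0, x1, x3, x4, x2] for N = 5) are consistent with Rader's reformulation of a prime-length DFT as a cyclic convolution. The following is a minimal software sketch of the four steps under that assumption. It uses a single complex cyclic convolution rather than the split into a real-coefficient stage and a complex-coefficient stage described above, and the names `primitive_root` and `rader_dft` are illustrative, not taken from the patent.

```python
import cmath

def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")

def rader_dft(x, g=None):
    """Prime-length DFT computed as permute, cyclically convolve, un-permute."""
    p = len(x)
    g = primitive_root(p) if g is None else g
    g_inv = pow(g, p - 2, p)  # inverse of g modulo p (Fermat's little theorem)

    # Step 1: read the inputs x1..x(p-1) in the order g^0, g^1, ... (mod p).
    in_idx = [pow(g, q, p) for q in range(p - 1)]
    a = [x[i] for i in in_idx]

    # Convolution kernel: twiddle factors indexed by powers of g^-1.
    out_idx = [pow(g_inv, m, p) for m in range(p - 1)]
    b = [cmath.exp(-2j * cmath.pi * k / p) for k in out_idx]

    y = [0j] * p
    y[0] = complex(sum(x))  # the DC term is a plain accumulation
    for m in range(p - 1):
        # Steps 2 and 3 (merged here): length-(p-1) cyclic convolution.
        conv = sum(a[q] * b[(m - q) % (p - 1)] for q in range(p - 1))
        # Step 4: store each result at its natural-order position.
        y[out_idx[m]] = x[0] + conv
    return y

print(rader_dft([1, 2, 3, 4, 5], g=3))
```

With `g=3` the read order for N = 5 is x1, x3, x4, x2 and the results arrive in the order y1, y2, y4, y3, matching the 5-point embodiment.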
The advantages of the invention are as follows: the systolic array structure is fast; the real cyclic convolution structure avoids complex multiplication, and multiplication is realized with an adder network, which effectively reduces the gate count; the architecture is suitable for implementation on FPGAs or ASICs of different types and from different vendors.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the prime-length FFT/IFFT device of the present invention;
Fig. 2 is a schematic structural diagram of the length-5 FFT device;
Fig. 3 is a schematic diagram of the adder structure in the real cyclic convolution module of the length-5 FFT;
Fig. 4 is a schematic diagram of the adder structure in the complex-coefficient cyclic convolution module of the length-5 FFT;
Fig. 5 is a schematic structural diagram of the length-13 FFT device;
Fig. 6 is a schematic diagram of the adder structure in the real cyclic convolution module of the length-13 FFT;
Fig. 7 is a schematic diagram of the adder structure in the complex-coefficient cyclic convolution module of the length-13 FFT.
Detailed description of the embodiments
To deepen understanding of the present invention, it is further described below with reference to specific embodiments. These embodiments serve only to explain the invention and do not limit its scope.
As shown in Figs. 1 to 4,
The 5-point FFT is computed as: F5 = PL*G5*H5*PR
where PL and PR are the output and input permutation matrices, respectively,
G5 is the complex cyclic convolution and accumulation matrix, and
H5 is the real cyclic convolution and accumulation matrix.
Table 1: CSD coding of the H5 coefficients
Step 1: the input RAM-I is a dual-port random access memory, with port A as the write port and port B as the read port. RAM-I receives the data [x0, x1, ..., x4] in order and writes them to memory in order at port A. After the data have been written, port B reads them out in permuted order and outputs [x0, x1, x3, x4, x2].
Step 2: the data [x0, x1, x3, x4, x2] output by RAM-I are received by the real cyclic convolution module. Except for x0, the data are sent in order into the adder network, which is built from right-shift units and addition units, to realize multiplication by the vector V1 = [0.8726, 0.0196, 0.1031, -0.1620] and multiplication by 0.1667. These coefficients are all encoded as 9-bit canonical signed digit (CSD) numbers. The vector products are sent to cyclic accumulator B1, which consists of an adder and a delay register, for cyclic accumulation. Multiplexed delay unit B2, which consists of a delay register and a multiplexer, then delays the cyclic convolution result and outputs it; the convolution result [z1, z2, z3, z4] is passed to the complex cyclic convolution module. The product with 0.1667 is accumulated with x0, and the result z0 is output to the complex cyclic convolution module.
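The adder network in step 2 replaces each coefficient multiplication with shifts and additions selected by the coefficient's CSD digits. The sketch below encodes the first V1 coefficient, 0.8726, and multiplies by it using only shifts, additions and subtractions. It assumes that "9-bit" means nine fractional bits, which the text does not state, and the names `to_csd` and `shift_add_multiply` are illustrative. Hardware would right-shift the data; left shifts followed by one final scaling are used here to keep the integer arithmetic exact.

```python
def to_csd(value):
    """Canonical signed-digit form of a non-negative integer.

    Digits are in {-1, 0, +1}, least significant first, and no two
    adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def shift_add_multiply(x, digits):
    """Multiply integer x by the CSD-coded constant using shifts and adds only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc

FRACTION_BITS = 9  # assumption: nine fractional bits
quantized = round(0.8726 * (1 << FRACTION_BITS))
digits = to_csd(quantized)
print(quantized)  # 447
print(digits)     # [-1, 0, 0, 0, 0, 0, -1, 0, 0, 1], i.e. 512 - 64 - 1

product = shift_add_multiply(1000, digits)
print(product / (1 << FRACTION_BITS))  # 873.046875, close to 0.8726 * 1000
```

Under this assumption the coefficient has three non-zero digits, so one multiplication costs two adder or subtractor stages instead of a full multiplier.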
Step 3: the complex cyclic convolution module sends the received data, except z0, in order into the adder network to realize multiplication by the vector V2 = [-i, -1+i, i, -1-i]. The vector products are sent to cyclic accumulator B3, which consists of an adder and a delay register, for cyclic accumulation. Multiplexed delay unit B4, which consists of a delay register and a multiplexer, then delays the cyclic convolution result and outputs it. z0 is accumulated directly with the other data output by the real cyclic convolution module to produce y0, and z0 is accumulated with each complex cyclic convolution result to produce the other output data. The outputs [y0, y1, y2, y4, y3] are passed to the output permutation module.
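Multiplying by an entry of V2 needs no multiplier, because every entry has real and imaginary parts in {-1, 0, +1}. The sketch below shows this with complex values held as integer (real, imaginary) pairs. The helper names `select` and `mul_trivial` are illustrative and not part of the patent.

```python
def select(value, sign):
    """Return +value, -value or 0 for sign in {+1, -1, 0}: a selection, not a multiply."""
    if sign == 1:
        return value
    if sign == -1:
        return -value
    return 0

def mul_trivial(z, c):
    """Multiply z = (a, b) by c = (cr, ci), with cr and ci each in {-1, 0, +1}.

    (a + bi)(cr + ci*i) = (cr*a - ci*b) + (cr*b + ci*a)i, built from
    sign selections and at most one addition or subtraction per part.
    """
    a, b = z
    cr, ci = c
    return (select(a, cr) - select(b, ci), select(b, cr) + select(a, ci))

# V2 of the 5-point embodiment as (real, imaginary) pairs: -i, -1+i, i, -1-i.
V2 = [(0, -1), (-1, 1), (0, 1), (-1, -1)]
for c in V2:
    print(c, mul_trivial((3, 4), c))
```

For the sample input 3+4i, the entry -i gives (4, -3) and the entry -1+i gives (-7, -1).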
Step 4: the output RAM-II is a dual-port random access memory, with port A as the write port and port B as the read port. RAM-II receives the data in order and writes [y0, y1, y2, y4, y3] to the corresponding memory locations at port A. After the data have been written, port B reads the memory and outputs the reordered data in natural order [y0, y1, y2, y3, y4].
This completes the computation.
The systolic array structure is fast; the real cyclic convolution structure avoids complex multiplication, and multiplication is realized with an adder network, which effectively reduces the gate count.
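The two reordering memories of the 5-point embodiment (RAM-I in step 1 and RAM-II in step 4) can be modelled as a memory with independent write and read addresses. This is a behavioural toy model only; the class name `DualPortRam` is an assumption, and real dual-port memories also involve clocking and port timing that are not modelled here.

```python
class DualPortRam:
    """Toy model of a dual-port RAM: port A writes, port B reads."""

    def __init__(self, depth):
        self.cells = [None] * depth

    def write(self, address, value):  # port A
        self.cells[address] = value

    def read(self, address):  # port B
        return self.cells[address]

# RAM-I (step 1): write in arrival order, read in permuted order.
ram_i = DualPortRam(5)
for address, sample in enumerate(["x0", "x1", "x2", "x3", "x4"]):
    ram_i.write(address, sample)
to_convolver = [ram_i.read(address) for address in [0, 1, 3, 4, 2]]

# RAM-II (step 4): write each result at its own index, read sequentially.
ram_ii = DualPortRam(5)
for address, result in zip([0, 1, 2, 4, 3], ["y0", "y1", "y2", "y4", "y3"]):
    ram_ii.write(address, result)
natural = [ram_ii.read(address) for address in range(5)]

print(to_convolver)  # ['x0', 'x1', 'x3', 'x4', 'x2']
print(natural)       # ['y0', 'y1', 'y2', 'y3', 'y4']
```

RAM-I permutes on the read side and RAM-II permutes on the write side, so the datapath between them always sees a sequential stream.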
As shown in Figs. 1 and 5 to 7,
The 13-point FFT is computed as: F13 = PL*G13*H13*PR
where PL and PR are the output and input permutation matrices, respectively,
G13 is the complex cyclic convolution and accumulation matrix, and
H13 is the real cyclic convolution and accumulation matrix.
Table 2: CSD coding of the H13 coefficients
Step 1: the input RAM-I is a dual-port random access memory, with port A as the write port and port B as the read port. RAM-I receives the data [x0, x1, ..., x12] in order and writes them to memory in order at port A. After the data have been written, port B reads them out in permuted order and outputs [x0, x1, x2, x4, x8, x3, x6, x12, x11, x9, x5, x10, x7].
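The text does not say how the read order is derived. One observation, offered here as an assumption, is that the indices after x0 are the successive powers of 2 modulo 13, as in Rader's index mapping. The sketch below checks this for the 13-point order and for the 5-point order of the earlier embodiment; the function name `generator_of` is illustrative.

```python
def generator_of(order, n):
    """Return g if order is [0, g^0, g^1, ..., g^(n-2)] modulo n, else None."""
    tail = order[1:]
    if order[0] != 0 or sorted(tail) != list(range(1, n)):
        return None  # not a permutation of 0..n-1 that starts with 0
    g = tail[1]
    if all(tail[q] == pow(g, q, n) for q in range(n - 1)):
        return g
    return None

read_order_13 = [0, 1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7]
read_order_5 = [0, 1, 3, 4, 2]

print(generator_of(read_order_13, 13))  # 2
print(generator_of(read_order_5, 5))    # 3
```

If this reading is right, the read-address table for RAM-I can be generated from a single primitive root instead of being stored by hand.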
Step 2: the data [x0, x1, x2, x4, x8, x3, x6, x12, x11, x9, x5, x10, x7] output by RAM-I are received by the real cyclic convolution module. Except for x0, the data are sent in order into the adder network, which is built from right-shift units and addition units, to realize multiplication by the vector V1 = [0.9066, -0.0153, 0.0368, 0.1682, -0.0346, -0.0440, -0.0165, -0.1210, 0.0555, -0.0258, 0.0445, 0.1289] and multiplication by -0.0833. These coefficients are all encoded as 9-bit canonical signed digit (CSD) numbers. The vector products are sent to cyclic accumulator B1, which consists of an adder and a delay register, for cyclic accumulation. Multiplexed delay unit B2, which consists of a delay register and a multiplexer, then delays the cyclic convolution result and outputs it; the convolution result [z1, z2, z3, ..., z12] is passed to the complex cyclic convolution module. The input data are multiplied by -0.0833, the product is accumulated with x0, and the result z0 is output to the complex cyclic convolution module.
Step 3: the complex cyclic convolution module sends the received data, except z0, in order into the adder network to realize multiplication by the vector V2 = [1, 1-i, -i, -1+i, -i, -1, 1, 1+i, i, -1-i, i, -1]. The vector products are sent to cyclic accumulator B3, which consists of an adder and a delay register, for cyclic accumulation. Multiplexed delay unit B4, which consists of a delay register and a multiplexer, then delays the cyclic convolution result and outputs it. z0 is accumulated directly with the other data output by the real cyclic convolution module to produce y0, and z0 is accumulated with each complex cyclic convolution result to produce the other output data [y1, y7, y10, y5, y9, y11, y12, y6, y3, y8, y4, y2].
Step 4: the output RAM-II is a dual-port random access memory, with port A as the write port and port B as the read port. RAM-II receives the data in order and writes [y0, y1, y7, y10, y5, y9, y11, y12, y6, y3, y8, y4, y2] to the corresponding memory locations at port A. After the data have been written, port B reads the memory and outputs the reordered data in natural order [y0, y1, y2, ..., y11, y12].
This completes the computation.
The systolic array structure speeds up the whole computation; the real cyclic convolution structure avoids complex multiplication, and multiplication is realized with an adder network, which effectively reduces the gate count.
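The order in which the 13-point results arrive at RAM-II, [y0, y1, y7, y10, ...], matches the successive powers of 7 modulo 13, and 7 is the modular inverse of 2, the apparent generator of the input order. The sketch below reproduces the write-address sequence under that assumption; the function name `output_write_order` is illustrative.

```python
def output_write_order(n, g):
    """Write addresses for the output memory: 0, then powers of g^-1 modulo prime n."""
    g_inv = pow(g, n - 2, n)  # inverse of g modulo n by Fermat's little theorem
    return [0] + [pow(g_inv, m, n) for m in range(n - 1)]

print(output_write_order(13, 2))  # [0, 1, 7, 10, 5, 9, 11, 12, 6, 3, 8, 4, 2]
print(output_write_order(5, 3))   # [0, 1, 2, 4, 3]
```

Both sequences agree with the result orders listed in step 4 of the 13-point and 5-point embodiments.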
The IFFT uses the same hardware as the FFT; the only difference is that the twiddle factors are conjugated during the computation.
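The relation between the two transforms can be illustrated in software with a direct DFT whose twiddle-factor sign is switchable. This is a sketch, not the patent's datapath. The 1/N scaling applied in `idft` is an assumption made here so that the round trip returns the original data; the text does not mention scaling.

```python
import cmath

def dft(x, inverse=False):
    """Direct DFT; inverse=True conjugates every twiddle factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse transform with 1/N scaling (the scaling is an assumption)."""
    n = len(spectrum)
    return [value / n for value in dft(spectrum, inverse=True)]

data = [1, 2, 3, 4, 5]
recovered = idft(dft(data))
print([round(v.real, 6) for v in recovered])
```

Because only the sign of the twiddle exponent changes, the same multiply-accumulate structure serves both directions.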
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
1. A prime-length FFT/IFFT device, characterized by comprising:
An input permutation module, comprising a dual-port random access memory, which rearranges the order of the N input data;
A real-coefficient cyclic convolution module, comprising an adder network, a cyclic accumulator, a multiplexed delay unit and an adder, wherein the data reordered by the input permutation module are fed into the adder network and multiplied by a vector, the products are sent to the cyclic accumulator and the adder for accumulation, and the cyclic accumulation result is sent to the multiplexed delay unit, which adjusts the data sequence and outputs the data in order; the module completes the convolution of N-1 input data with real numbers not greater than 1 and the accumulation of the input data;
A complex-coefficient cyclic convolution module, comprising an adder network, a cyclic accumulator, a multiplexed delay unit and an adder, wherein the data reordered by the input permutation module are fed into the adder network and multiplied by a vector, the products are sent to the cyclic accumulator and the adder for accumulation, and the cyclic accumulation result is sent to the multiplexed delay unit, which adjusts the data sequence and outputs the data in order; the module completes the convolution of N-1 input data with ±1, ±i and ±1±i and the accumulation of the input data;
An output permutation module, comprising a dual-port random access memory, which reorders the results computed by the real-coefficient and complex-coefficient cyclic convolution modules and outputs the N data in sequence.
2. The prime-length FFT/IFFT device according to claim 1, characterized in that the adder network comprises right-shift units and addition units.
3. The prime-length FFT/IFFT device according to claim 1, characterized in that the cyclic accumulator comprises an adder and a delay register.
4. The prime-length FFT/IFFT device according to claim 1, characterized in that the multiplexed delay unit comprises a delay register and a multiplexer.
5. A prime-length FFT/IFFT method using the device of claim 1, characterized by comprising the following steps:
Step 1: in the input permutation module, one port of the dual-port random access memory is the write port, which receives the data in order and writes them to memory; the other port is the read port, which reorders the data into the sequence required by the cyclic convolution as they are output;
Step 2: the real-coefficient cyclic convolution module receives the data in order; while the products of the input data and a real coefficient are accumulated, the data other than the first input datum are cyclically convolved with the real coefficient vector, delayed, and output in order;
Step 3: the complex-coefficient cyclic convolution module receives the data in order; while the input data are accumulated, the data other than the first input datum are cyclically convolved with a vector composed of ±1, ±i and ±1±i, delayed, and output in order;
Step 4: in the output permutation module, one port of the dual-port random access memory is the write port, which writes the results computed by the complex-coefficient cyclic convolution module to memory; the other port is the read port, which reorders the data into natural order for output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210444181.8A CN103810144B (en) | 2012-11-08 | 2012-11-08 | A kind of prime length FFT/IFFT method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210444181.8A CN103810144B (en) | 2012-11-08 | 2012-11-08 | A kind of prime length FFT/IFFT method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103810144A (en) | 2014-05-21 |
CN103810144B (en) | 2018-12-07 |
Family
ID=50706933
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210444181.8A (Expired - Fee Related), CN103810144B (en) | 2012-11-08 | 2012-11-08 | A kind of prime length FFT/IFFT method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103810144B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893327A (en) * | 2016-03-31 | 2016-08-24 | 重庆大学 | Method for quickly computing elasticity deformation of deep groove ball bearing and angular contact ball bearing based on FFT (fast fourier transform) |
CN109871951A (en) * | 2019-03-06 | 2019-06-11 | 苏州浪潮智能科技有限公司 | A kind of deep learning processor and electronic equipment |
CN111626412A (en) * | 2020-05-12 | 2020-09-04 | 浙江大学 | One-dimensional convolution acceleration device and method for complex neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101136891A (en) * | 2007-08-09 | 2008-03-05 | 复旦大学 | 3780-point quick Fourier transformation processor of pipelining structure |
US7398165B1 (en) * | 2007-04-17 | 2008-07-08 | Jiun-Jih Miau | Intelligent signal processor for vortex flowmeter |
CN101667984A (en) * | 2008-09-04 | 2010-03-10 | 上海明波通信技术有限公司 | 3780-point fast Fourier transform processor and computing control method thereof |
CN101763337A (en) * | 2008-12-25 | 2010-06-30 | 上海明波通信技术有限公司 | N-point FFT/IFFT/IFFT/IFFT method and device |
CN102214159A (en) * | 2010-11-11 | 2011-10-12 | 福州大学 | Method for realizing 3780-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and processor thereof |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7398165B1 (en) * | 2007-04-17 | 2008-07-08 | Jiun-Jih Miau | Intelligent signal processor for vortex flowmeter |
CN101136891A (en) * | 2007-08-09 | 2008-03-05 | 复旦大学 | 3780-point quick Fourier transformation processor of pipelining structure |
CN101667984A (en) * | 2008-09-04 | 2010-03-10 | 上海明波通信技术有限公司 | 3780-point fast Fourier transform processor and computing control method thereof |
CN101763337A (en) * | 2008-12-25 | 2010-06-30 | 上海明波通信技术有限公司 | N-point FFT/IFFT/IFFT/IFFT method and device |
CN102214159A (en) * | 2010-11-11 | 2011-10-12 | 福州大学 | Method for realizing 3780-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and processor thereof |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893327A (en) * | 2016-03-31 | 2016-08-24 | 重庆大学 | Method for quickly computing elasticity deformation of deep groove ball bearing and angular contact ball bearing based on FFT (fast fourier transform) |
CN105893327B (en) * | 2016-03-31 | 2018-06-05 | 重庆大学 | Deep groove ball bearing and angular contact ball bearing flexible deformation quick calculation method based on FFT |
CN109871951A (en) * | 2019-03-06 | 2019-06-11 | 苏州浪潮智能科技有限公司 | A kind of deep learning processor and electronic equipment |
CN111626412A (en) * | 2020-05-12 | 2020-09-04 | 浙江大学 | One-dimensional convolution acceleration device and method for complex neural network |
CN111626412B (en) * | 2020-05-12 | 2023-10-31 | 浙江大学 | One-dimensional convolution acceleration device and method for complex neural network |
Also Published As
Publication number | Publication date |
---|---|
CN103810144B (en) | 2018-12-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Blahut | Fast algorithms for signal processing | |
CN101330489A (en) | Processor for FFT / IFFT as well as processing method thereof | |
CN105045766A (en) | Data processing method and processor based on 3072-point fast Fourier transformation | |
CN109525256B (en) | Channelized transmitting structure of narrow-transition-band filter bank based on FPGA | |
CN103810144A (en) | FFT (fast fourier transform)/IFFT (inverse fast fourier transform) method and device for prime length | |
CN104932992B (en) | A kind of flexible retransmission method of the variable Digital Microwave of bandwidth granularity | |
CN111737638A (en) | Data processing method based on Fourier transform and related device | |
CN102624357B (en) | Implementation structure of fractional delay digital filter | |
US20100070551A1 (en) | Fourier transform processing and twiddle factor generation | |
KR102376492B1 (en) | Fast Fourier transform device and method using real valued as input | |
Kurniawan et al. | Multidimensional Householder based high-speed QR decomposition architecture for MIMO receivers | |
Xiaojun et al. | RS encoder design based on FPGA | |
Patil et al. | An area efficient and low power implementation of 2048 point FFT/IFFT processor for mobile WiMAX | |
Bhagat et al. | High‐throughput and compact FFT architectures using the Good–Thomas and Winograd algorithms | |
Jiang et al. | A novel overall in-place in-order prime factor FFT algorithm | |
CN115033840A (en) | Modulation signal processing device and electronic equipment | |
CN103179398A (en) | FPGA (field programmable gate array) implement method for lifting wavelet transform | |
CN111597498B (en) | Frequency spectrum acquisition method based on large-point FFT circuit | |
CN103488611A (en) | FFT (Fast Fourier Transformation) processor based on IEEE802.11.ad protocol | |
CN103152059A (en) | Device and method of generating of constant coefficient matrix of radio sonde (RS) of consultative committee for space data system (CCSDS) | |
KR20120109214A (en) | Fft processor and fft method for ofdm system | |
Pan et al. | Subquadratic space complexity Gaussian normal basis multipliers over GF (2m) based on Dickson–Karatsuba decomposition | |
CN113591022A (en) | Read-write scheduling processing method and device capable of decomposing data | |
CN103023512B (en) | Device and method for generating constant coefficient matrix in ATSC system RS coding | |
Yang et al. | A novel 3780-point FFT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20181207 Termination date: 20201108 |