CN205486097U - FFT device based on FPGA

Info

Publication number: CN205486097U
Application number: CN201620035015.6U
Authority: CN
Inventors: 王纪宁
Original assignee: Putian Information Technology Co Ltd
Current assignee: Putian Information Technology Co Ltd
Priority date: 2016-01-14
Filing date: 2016-01-14
Publication date: 2016-08-17
Anticipated expiration: 2026-01-14

Abstract

The utility model relates to an FPGA-based FFT device comprising a cache module, a control module and a radix-4 butterfly operator. The control module is connected to the cache module and to the radix-4 butterfly operator respectively. It controls the input and output of data, controls the buffering of data into the cache module in a ping-pong manner, and controls the data so that the FFT operation is completed in the radix-4 butterfly operator using cyclic addressing. The cache module receives the first 3/4 of the initially input data, outputs the last 3/4 of the operation results, and stores intermediate data. The radix-4 butterfly operator receives the last 1/4 of the initially input data and outputs the first 1/4 of the operation results. The utility model increases operation speed and, with a number of DSP multipliers comparable to that of an FFT device using radix-2 butterfly units, reduces the total depth of the RAM used to store data.

Description

FFT device based on FPGA

Technical field

The utility model relates to the field of digital signals, and in particular to an FPGA-based FFT device.

Background technology

In wireless communication systems, the fast Fourier transform (FFT) is commonly used to transform and analyse an input time-domain signal and observe its frequency-domain waveform, in order to obtain the frequency-domain characteristics of the signal. OFDM uses the inverse discrete Fourier transform and the discrete Fourier transform (IDFT/DFT) in place of multi-carrier modulation and demodulation: at the transmitter, the data to be modulated undergo an IFFT to perform modulation, and at the receiver the received data undergo an FFT to perform demodulation, which greatly reduces the complexity of the system implementation.
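The modulation/demodulation pairing described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the QPSK symbol values and the names `dft`, `symbols`, `tx_time` and `rx_freq` are hypothetical, and a naive O(N^2) transform stands in for a real FFT so that the example needs nothing beyond the standard library.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT/IDFT, adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

# Hypothetical QPSK symbols, one per subcarrier (assumed for illustration).
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]

tx_time = dft(symbols, inverse=True)  # transmitter: the inverse transform modulates
rx_freq = dft(tx_time)                # receiver: the forward transform demodulates

# Round away floating-point noise to compare with the original symbols.
recovered = [complex(round(v.real), round(v.imag)) for v in rx_freq]
```

Over an ideal channel the receiver recovers the transmitted symbols exactly, which is why a single transform block can replace a bank of per-carrier modulators and demodulators.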

An FPGA handles parallelism and speed requirements well, is flexibly configurable and easy to upgrade, and is a common means of implementing the FFT. For example, the Xilinx Virtex-6 family provides, inside the FPGA, not only numerous computing units called DSP slices but also readable and writable LUT units and dual-port RAM units.

The FFT soft IP core for the Xilinx Virtex-6 family currently offers four modes: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; and Radix-2 Lite, Burst I/O. By structure these fall into two kinds, pipelined and burst. The implementation of the two structures is briefly described below:

(1) Pipelined, Streaming I/O.

The Pipelined, Streaming I/O structure processes data continuously through a pipeline of radix-2 butterfly processing engines. Each processing engine has its own memory block for storing input data and intermediate data.

(2) Radix-4, Burst I/O.

In the Radix-4, Burst I/O structure, the FFT IP core is implemented with a radix-4 butterfly processing engine.

In the Pipelined, Streaming I/O structure, while the IP core computes the transform of the current frame it can load the input data of the next frame and output the transform results of the previous frame, so continuously input data yield a continuous stream of results after a certain computation latency. The input data are in natural order; the output data can be in bit-reversed or natural order. An FFT device with a radix-2 butterfly pipeline is described below, using 8 points as an example.

Radix-2 DIF performs butterfly operations on 2 points at a time. Before the operation the data are first buffered so that the first half of the input data can be paired with the second half. The basic structure is as follows:

Suppose one datum is buffered per clock cycle: the first clock buffers datum 0, the second clock buffers datum 1, and so on. The input data buffer RAM holds 4 entries, so when the 5th datum, "4", arrives, the buffered datum "0" and datum "4" directly undergo a butterfly operation and the arriving datum does not need to be stored. The final frequency-domain output is in bit-reversed order. The input/output order of the 8-point FFT with a radix-2 butterfly pipeline is shown in Table 1:

Table 1: Input/output order of the FFT with a radix-2 butterfly pipeline

Input (natural order)	Decimal	Output (bit-reversed order)	Decimal
000	0	000	0
001	1	100	4
010	2	010	2
011	3	110	6
100	4	001	1
101	5	101	5
110	6	011	3
111	7	111	7
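The output column of Table 1 is the 3-bit reversal of the input index. A minimal sketch that regenerates it is given below; the function name `bit_reverse` is chosen here for illustration and does not come from the source.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the current low bit
        index >>= 1
    return result

# Output position for each of the 8 natural-order inputs of Table 1.
output_order = [bit_reverse(i, 3) for i in range(8)]
```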

According to the butterfly diagram, the 8-point radix-2 FFT is divided into 3 stages. The buffer needed before the operation is 4 entries, and the intermediate data need 4 and 2 entries respectively. For the final in-order output, the data must first be buffered and then output by address, which needs a buffer of 8 entries. The total RAM space used is 18 entries. Each stage uses one butterfly unit, for 3 butterfly units in total; assuming one butterfly unit uses 3 DSP multipliers, 9 DSP multipliers are used in total.
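The tally for the 8-point case can be written out as a short calculation. The variable names are illustrative, the figures are exactly those stated in the text, and no attempt is made to generalise them to other transform sizes.

```python
# Buffer depths stated in the text for the 8-point radix-2 pipeline.
input_buffer = 4
intermediate_buffers = [4, 2]
output_reorder_buffer = 8

total_ram_depth = input_buffer + sum(intermediate_buffers) + output_reorder_buffer

# One butterfly unit per stage, 3 DSP multipliers assumed per butterfly unit.
stages = 3
dsp_per_butterfly = 3
total_dsp = stages * dsp_per_butterfly
```

Note that the output reorder buffer accounts for 8 of the 18 entries.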

The radix-2 Pipelined, Streaming I/O structure places a butterfly unit and intermediate-data storage at every stage so that data can undergo the fixed-point FFT continuously. As the number of FFT points increases, the resources occupied grow with it. Moreover, because every stage uses only radix-2 butterfly units, the order in which the results are computed is fixed, so extra RAM is required when the last stage must output in natural order. Table 2 lists, for an FFT device using radix-2 butterfly units in scaled mode, the total depth of the RAM occupied by stored data and the number of DSP multipliers occupied by the computation.

Table 2: Resources occupied by the Pipelined, Streaming I/O structure with radix-2 butterfly units

Summary of the utility model

The technical problem to be solved by the utility model is that existing FFT devices make poor use of RAM when outputting data in natural order and therefore need more FPGA resources.

To solve the above technical problem, the utility model proposes an FPGA-based pipelined FFT device. The FPGA-based pipelined FFT device comprises:

A cache module, a control module and a radix-4 butterfly operator；

The control module is connected to the cache module and to the radix-4 butterfly operator respectively. It controls the input and output of data, controls the buffering of data into the cache module in a ping-pong manner, and controls the data so that the FFT operation is completed in the radix-4 butterfly operator using cyclic addressing；

The cache module receives the first 3/4 of the initially input data, outputs the last 3/4 of the operation results, and stores intermediate data；

The radix-4 butterfly operator receives the last 1/4 of the initially input data and outputs the first 1/4 of the operation results.

Optionally, the cache module consists of multiple dual-port RAMs or multiple single-port RAMs.

Optionally, the number of dual-port RAMs is 7 or 8, determined by the number of FFT points.

Optionally, the total depth of the RAMs is less than or equal to twice the number of FFT points.

Optionally, the number of radix-4 butterfly operators is 1 or 2, determined by the number of FFT points.

The FPGA-based FFT device proposed by the utility model uses radix-4 butterfly operators, which increases operation speed. With cyclic addressing, no extra RAM is needed to store intermediate data or to output data in natural order. With a number of DSP multipliers comparable to that of an FFT device using radix-2 butterfly units, the total depth of the RAM storing data is reduced, which improves RAM utilization and saves FPGA resources.
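To make the role of the radix-4 butterfly concrete, here is a software sketch of a radix-4 decimation-in-frequency (DIF) FFT. It is not the device's implementation: the recursion and the natural-order reassembly are software conveniences, whereas the device described here works in place with cyclic addressing. The names `radix4_butterfly` and `fft_radix4_dif` are hypothetical. The sketch does show why the first stage combines samples spaced N/4 apart, which is what allows the first 3/4 of a frame to be buffered while the last 1/4 is consumed on arrival.

```python
import cmath

def radix4_butterfly(a, b, c, d):
    """4-point forward DFT of (a, b, c, d)."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)

def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    sub = [[], [], [], []]
    for i in range(q):
        # One butterfly takes the samples at i, i+N/4, i+N/2 and i+3N/4.
        outs = radix4_butterfly(x[i], x[i + q], x[i + 2 * q], x[i + 3 * q])
        for k in range(4):
            twiddle = cmath.exp(-2j * cmath.pi * k * i / n)
            sub[k].append(outs[k] * twiddle)
    result = [0j] * n
    for k in range(4):
        # Sub-transform k yields the output bins k, k+4, k+8, ...
        result[k::4] = fft_radix4_dif(sub[k])
    return result
```

For a 16-point input the function performs two stages of four butterflies each, and its output agrees with a direct evaluation of the DFT sum.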

Brief description of the drawings

The features and advantages of the utility model can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the utility model in any way. In the drawings:

Fig. 1 is a structural diagram of an FFT device using radix-2 butterfly units；

Fig. 2 is a structural diagram of the FPGA-based FFT device of one embodiment of the utility model；

Fig. 3 is a schematic diagram of the FPGA-based FFT device of one embodiment of the utility model；

Fig. 4 is a schematic diagram of the FPGA-based FFT method of one embodiment of the utility model.

Detailed description of the invention

Embodiments of the utility model are described in detail below with reference to the accompanying drawings.

Fig. 2 shows a structural diagram of the FPGA-based FFT device of one embodiment of the utility model.

As shown in Fig. 2, the FPGA-based FFT device of this embodiment comprises:

A cache module 1, a control module 2 and a radix-4 butterfly operator 3；

The control module 2 is connected to the cache module 1 and to the radix-4 butterfly operator 3 respectively. It controls the input and output of data, controls the buffering of data into the cache module 1 in a ping-pong manner, and controls the data so that the FFT operation is completed in the radix-4 butterfly operator 3 using cyclic addressing；

The cache module 1 receives the first 3/4 of the initially input data, outputs the last 3/4 of the operation results, and stores intermediate results；

The radix-4 butterfly operator 3 receives the last 1/4 of the initially input data and outputs the first 1/4 of the operation results.

The FPGA-based FFT device of this embodiment uses radix-4 butterfly operators, which increases operation speed. With cyclic addressing, no extra RAM is needed to store intermediate data, and no extra RAM is needed to output data in natural order. With a number of DSP multipliers comparable to that of an FFT device using radix-2 butterfly units, the total depth of the RAM storing data is reduced, which improves RAM utilization and saves FPGA resources.

In an optional embodiment, the cache module consists of multiple dual-port RAMs or multiple single-port RAMs. In the FPGA-based FFT device, using dual-port RAMs for the cache module reduces the number of RAMs needed.

The number of dual-port RAMs is 7 or 8, determined by the number of FFT points.

The total depth of the RAMs is less than or equal to twice the number of FFT points.

The number of radix-4 butterfly operators is 1 or 2, determined by the number of FFT points.

Fig. 3 is a schematic diagram of the FPGA-based FFT device of one embodiment of the utility model. As shown in Fig. 3, the FFT device comprises several dual-port RAMs, butterfly operators and selectors. The total RAM depth is at most 2 times the number of FFT points, and the RAM width equals the data width. At most 2 radix-4 butterfly operators are provided; every 8 RAM blocks can output 8 data in parallel within one cycle, which keeps both radix-4 butterfly units fully used and increases operation speed.

Fig. 4 is a schematic diagram of the FPGA-based FFT method of one embodiment of the utility model. As shown in Fig. 4, the FFT method using the FPGA-based FFT device described above comprises:

S41: input the first frame of data in order; after the stage-1 butterfly operation of the first frame is complete, input the second frame of data in order using ping-pong buffering, and complete the stage-M butterfly operation of the first frame；

S42: output the butterfly results of the first frame in order while buffering the second frame and performing its butterfly operations；

S43: complete the stage-M butterfly operation of the second frame while buffering the third frame using ping-pong buffering, and begin the stage-1 butterfly operation of the third frame；

S44: repeat the buffering, butterfly operation and result output continuously to complete the butterfly operations of multiple frames；

where M is the number of butterfly stages and N is the number of FFT points, with N=4^M; data are read and stored using cyclic addressing.
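Since N=4^M, the number of butterfly stages follows directly from the transform size. A small helper, with the hypothetical name `radix4_stages`, makes the relation explicit and rejects sizes that are not powers of 4.

```python
def radix4_stages(n_points):
    """Return M such that n_points == 4**M; raise if N is not a power of 4."""
    if n_points < 1:
        raise ValueError("N must be a positive power of 4")
    m = 0
    n = n_points
    while n > 1:
        if n % 4:
            raise ValueError("N must be a power of 4")
        n //= 4
        m += 1
    return m
```

For example, `radix4_stages(4096)` gives 6, matching the six stages in the 4096-point example of this document.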

Further, the steps of inputting the first frame of data in order, inputting the second frame in order using ping-pong buffering after the stage-1 butterfly operation of the first frame is complete, completing the stage-M butterfly operation of the first frame, and outputting the butterfly results of the first frame in order while buffering the second frame and performing its butterfly operations, comprise:

Input the first 3/4 of the first frame in order into the first part of the cache module. When the last 1/4 of the first frame arrives at the radix-4 butterfly operator, perform the butterfly operation directly with the data in the cache module according to the butterfly diagram, and save the stage-1 butterfly results to the first part of the cache module；

Complete the stage-M butterfly operation of the first frame; the radix-4 butterfly operator outputs the first 1/4 of the butterfly results of the first frame in order, and the last 3/4 of the results are saved to the first part of the cache module. Using ping-pong buffering, input the first 3/4 of the second frame in order into the second part of the cache module; when the last 1/4 of the second frame arrives at the radix-4 butterfly operator, perform the butterfly operation directly with the data in the cache module according to the butterfly diagram；

The first part of the cache module outputs the last 3/4 of the butterfly results of the first frame in order；

Correspondingly, data are read from and stored to the cache module using cyclic addressing.
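The ping-pong idea behind these steps, filling one part of the cache while the other is being processed, can be sketched in a few lines. This is a deliberately simplified model: it uses two equal banks and swaps them per whole frame, whereas the device buffers only 3/4 of each frame and rotates through several RAMs. The class name `PingPongBuffer` and its methods are hypothetical.

```python
class PingPongBuffer:
    """Two banks: one receives the incoming frame while the other is processed."""

    def __init__(self, frame_len):
        self.banks = [[None] * frame_len, [None] * frame_len]
        self.write_bank = 0

    def load_frame(self, frame):
        """Write a frame into the current write bank, then swap the roles."""
        self.banks[self.write_bank][:] = frame
        filled = self.write_bank
        self.write_bank ^= 1  # the next frame goes to the other bank
        return filled

buf = PingPongBuffer(4)
first = buf.load_frame([0, 1, 2, 3])     # lands in bank 0
second = buf.load_frame([4, 5, 6, 7])    # lands in bank 1 while bank 0 is processed
third = buf.load_frame([8, 9, 10, 11])   # reuses bank 0
```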

The ping-pong buffering process of this FPGA-based FFT method is illustrated below with a specific example.

Suppose a frame of serial data undergoes a 4096-point FFT using radix-4 DIF, and the RAMs used are RAM1-14 in Fig. 3. (Note that RAM1-14 in Fig. 3 are single-port RAMs, and the ping-pong buffering process below is also described using single-port RAMs. A 4096-point FFT can also use 7 dual-port RAMs, whose process and operating principle are similar to those with single-port RAMs.) The process is as follows:

(1) Buffer the input serial data frame 0. The buffer space is set to 3/4 of the number of points, i.e. 4096*0.75=3072, which fills up to RAM6.

(2) When the 3073rd datum arrives, it directly undergoes a radix-4 butterfly operation, according to the butterfly diagram, with the 1st, 1025th and 2049th data already in the buffer RAMs. The results are stored in RAM1～RAM8.

(3) When the 3074th datum arrives, it directly undergoes a radix-4 butterfly operation, according to the butterfly diagram, with the 2nd, 1026th and 2050th data in the buffer, and the results are stored in RAM1～RAM8.

(4) When the 3075th datum arrives, it directly undergoes a radix-4 butterfly operation, according to the butterfly diagram, with the 3rd, 1027th and 2051st data in the buffer, and the results are stored in RAM1～RAM8.

When the 3076th datum arrives, ....

When the 4096th datum arrives, it directly undergoes a radix-4 butterfly operation, according to the butterfly diagram, with the 1024th, 2048th and 3072nd data in the buffer, and the results are stored in the buffer RAMs. All stage-1 butterfly operations are now complete. The data that have completed stage 1 are stored in RAM1～RAM8.
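The pairing pattern in these steps is regular: a datum arriving in the last quarter of the frame is combined with the three buffered data that lie one, two and three quarters of a frame earlier. The sketch below reproduces the positions listed above; the function name `first_stage_partners` is hypothetical, and positions are 1-based to match the text.

```python
def first_stage_partners(arrival, n_points):
    """1-based positions of the three buffered data paired with a late arrival."""
    quarter = n_points // 4
    if not 3 * quarter < arrival <= n_points:
        raise ValueError("arrival must fall in the last quarter of the frame")
    base = arrival - 3 * quarter
    return (base, base + quarter, base + 2 * quarter)
```

For N=4096, `first_stage_partners(3073, 4096)` returns `(1, 1025, 2049)` and `first_stage_partners(4096, 4096)` returns `(1024, 2048, 3072)`, as in steps (2) to (4) and the final arrival above.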

(5) Buffer the next input frame, frame 1, starting from RAM9. During these 1024 clock cycles, 2 radix-4 butterfly units can continue processing the data in buffer RAMs 1～6. One clock cycle now reads 8 points from the buffer for butterfly operations, so 1024*8=8192 points are completed in 1024 cycles, i.e. 8192/4096=2 butterfly stages. The data that have now completed stage 3 are stored back to RAM1～RAM8, giving in-place operation.

(6) Continue buffering frame 1 data into RAM11 and RAM12. During these 1024 clock cycles, 2 radix-4 butterfly units can continue processing the data in buffer RAMs 1～6. One clock cycle now reads 8 points from the buffer for butterfly operations, so 1024*8=8192 points are completed in 1024 cycles, i.e. 8192/4096=2 butterfly stages. The data that have now completed stage 5 are stored back to RAM1～RAM8, giving in-place operation.

(7) Continue buffering frame 1 data into RAM13 and RAM14. During these 1024 clock cycles, 1 radix-4 butterfly unit can continue processing the data in buffer RAMs 7～14. One clock cycle now reads 4 points from the buffer for butterfly operations, so 1024*4=4096 points are completed in 1024 cycles, i.e. 4096/4096=1 butterfly stage. The data that have now completed stage 6 are stored back to RAM1～RAM8, giving in-place operation. Because the last stage has now been computed, the stage-6 results can be output directly during the computation; when all computation is complete, 1/4 of the results have been output.
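The stage counts in steps (5) to (7) follow from simple throughput arithmetic, sketched below. The names are illustrative; the only inputs are figures from the text: 4096 points, slots of 1024 cycles, and 4 points consumed per radix-4 butterfly per cycle.

```python
N_POINTS = 4096
SLOT_CYCLES = N_POINTS // 4  # 1024 cycles, the time to receive a quarter frame

def stages_per_slot(butterflies):
    """Butterfly stages completed in one 1024-cycle slot."""
    points_per_cycle = 4 * butterflies  # each radix-4 butterfly takes 4 points
    return SLOT_CYCLES * points_per_cycle // N_POINTS

# Stage 1 completes as the last quarter of the frame arrives; the next three
# slots use 2, 2 and 1 butterfly units respectively, as in steps (5) to (7).
schedule = [1, stages_per_slot(2), stages_per_slot(2), stages_per_slot(1)]
completed = [sum(schedule[:i + 1]) for i in range(len(schedule))]
```

The running total reaches stages 1, 3, 5 and 6, so all six stages of a frame finish within the time it takes to receive the next frame.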

(8) Perform the stage-1 operation on frame 1, storing the results in RAM1～2 and RAM9～14, while RAM3 outputs the results of the previous frame, frame 0.

(9) Buffer into RAM3 and RAM4; the buffered data are those of the next frame, frame 2. Meanwhile RAM5 outputs the results of the previous frame, and frame 1 uses the two butterfly operators to complete the stage-3 butterfly operation.

(10) Buffer into RAM5 and RAM6 while RAM7 begins outputting frame 0 data and frame 1 completes the stage-5 operation.

(11) Buffer into RAM7 and RAM8; frame 1 completes the stage-6 operation and is output.

Further, the cyclic addressing comprises:

Perform the stage-1 butterfly operation and save the stage-1 results in the cache module using cyclic addressing；

Perform the intermediate-stage butterfly operations: read the data in the cache module using cyclic addressing, and save the intermediate-stage results in the cache module using cyclic addressing；

Perform the last-stage butterfly operation: save the butterfly results to the cache module using cyclic addressing, then read the data in the cache module in turn and output the butterfly results in order.

Specifically, performing the stage-1 butterfly operation and saving the stage-1 results in the cache module using cyclic addressing comprises:

Perform the stage-1 butterfly operation and divide the stage-1 results into 16 groups. Groups 0-3 of the 16 groups are stored in turn in the first RAM, the second RAM, the third RAM and the fourth RAM; groups 4-7 are stored in turn in the second RAM, the third RAM, the fourth RAM and the first RAM; groups 8-11 are stored in turn in the third RAM, the fourth RAM, the first RAM and the second RAM; groups 12-15 are stored in turn in the fourth RAM, the first RAM, the second RAM and the third RAM.
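The group-to-RAM assignment above is a rotation: each successive block of four groups starts one RAM further along. A compact way to express it is sketched below, with the hypothetical function name `ram_for_group` and RAMs numbered 1 to 4 as in the text.

```python
def ram_for_group(group):
    """Map result group 0..15 to RAM number 1..4 with a per-block rotation."""
    block, port = divmod(group, 4)  # block 0..3, position within block 0..3
    return (port + block) % 4 + 1

# One row per block of four groups: groups 0-3, 4-7, 8-11, 12-15.
layout = [[ram_for_group(4 * b + p) for p in range(4)] for b in range(4)]
```

Each row and each column of `layout` contains every RAM exactly once, so four groups that must be accessed together never fall in the same RAM.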

Specifically, performing the intermediate-stage butterfly operations, reading the data in the cache module using cyclic addressing, and saving the intermediate-stage results in the cache module using cyclic addressing comprises:

Perform the intermediate-stage butterfly operation. Data read from the cache module using cyclic addressing are fed to the first port, the second port, the third port and the fourth port of the radix-4 butterfly operator; data read from the cache module using cyclic addressing are fed to the second port, the third port, the fourth port and the first port of the radix-4 butterfly operator; data read from the cache module using cyclic addressing are fed to the third port, the fourth port, the first port and the second port of the radix-4 butterfly operator; data read from the cache module using cyclic addressing are fed to the fourth port, the first port, the second port and the third port of the radix-4 butterfly operator；

where the length of data fed to an input port before each switch is 1/4^M×N；

The butterfly results of each intermediate stage are divided into 16 groups and saved in the cache module using cyclic addressing.

Specifically, performing the last-stage butterfly operation, saving the butterfly results to the cache module using cyclic addressing, and reading the data in the cache module in turn and outputting the butterfly results in order comprises:

Perform the last-stage butterfly operation. Save the data of the first port of the radix-4 butterfly operator to the first RAM, the data of the second port to the third RAM, the data of the third port to the second RAM, and the data of the fourth port to the fourth RAM；

where the data in the cache module are divided into groups over multiple levels until each group contains 1 datum；

Read the data in the cache module in turn and output the butterfly results in order.

The cyclic addressing process of the FPGA-based FFT method is illustrated below with a specific example. (The method described here uses one butterfly operator; the method with two butterfly operators is the same.)

(1) Input the N-point data into the RAMs in order; once 3/4 of the data have been input to the RAMs, begin stage-1 addressing and computation.

(2) Stage-1 addressing: RAM1～3 read out data in order from addresses 0～(1/4*N-1) as the first 3 inputs of the butterfly operator; the 4th input of the butterfly operator is the data arriving directly. After computation, data 0～(1/16*N-1) of butterfly operator output ports 1～4 are stored in turn in RAM1, 2, 3, 4; sequence numbers (1/16*N)～(1/8*N-1) are stored in turn in RAM2, 3, 4, 1; sequence numbers (1/8*N)～(3/16*N-1) are stored in turn in RAM3, 4, 1, 2; sequence numbers (3/16*N)～(1/4*N-1) are stored in turn in RAM4, 1, 2, 3.

(3) Stage-2 addressing: RAM1 reads the data at addresses 0～(1/16*N-1), (1/16*N)～(1/8*N-1), (1/8*N)～(3/16*N-1), (3/16*N)～(1/4*N-1) as the data for butterfly operator input ports 1, 2, 3, 4 respectively. At the same time, RAM2 reads the data at addresses (1/16*N)～(1/8*N-1), (1/8*N)～(3/16*N-1), (3/16*N)～(1/4*N-1), 0～(1/16*N-1) as the data for butterfly operator input ports 2, 3, 4, 1. RAM3 reads the data at addresses (1/8*N)～(3/16*N-1), (3/16*N)～(1/4*N-1), 0～(1/16*N-1), (1/16*N)～(1/8*N-1) as the data for butterfly operator input ports 3, 4, 1, 2. RAM4 reads the data at addresses (3/16*N)～(1/4*N-1), 0～(1/16*N-1), (1/16*N)～(1/8*N-1), (1/8*N)～(3/16*N-1) as the data for butterfly operator input ports 4, 1, 2, 3. After computation, data 0～(1/64*N-1) of butterfly operator output ports 1～4 are stored in turn in RAM1, 2, 3, 4; sequence numbers (1/64*N)～(1/32*N-1) are stored in turn in RAM2, 3, 4, 1; sequence numbers (1/32*N)～(3/64*N-1) are stored in turn in RAM3, 4, 1, 2; sequence numbers (3/64*N)～(1/16*N-1) are stored in turn in RAM4, 1, 2, 3. The remaining data are handled in the same way: sequence numbers (1/16*N)～(5/64*N-1) are stored in turn in RAM1, 2, 3, 4; sequence numbers (5/64*N)～(6/64*N-1) in RAM2, 3, 4, 1; sequence numbers (6/64*N)～(7/64*N-1) in RAM3, 4, 1, 2; sequence numbers (7/64*N)～(8/64*N-1) in RAM4, 1, 2, 3; ....

(4) Stage-3 addressing: RAM1 reads the data at addresses 0～(1/64*N-1), (1/64*N)～(2/64*N-1), (2/64*N)～(3/64*N-1), (3/64*N)～(4/64*N-1) as the data for butterfly operator input ports 1, 2, 3, 4 respectively. At the same time, RAM2 reads the data at addresses (4/64*N)～(5/64*N-1), (5/64*N)～(6/64*N-1), (6/64*N)～(7/64*N-1), (7/64*N)～(8/64*N-1) as the data for butterfly operator input ports 2, 3, 4, 1 respectively. RAM3 reads the data at addresses (8/64*N)～(9/64*N-1), (9/64*N)～(10/64*N-1), (10/64*N)～(11/64*N-1), (11/64*N)～(12/64*N-1) as the data for butterfly operator input ports 3, 4, 1, 2 respectively. RAM4 reads the data at addresses (12/64*N)～(13/64*N-1), (13/64*N)～(14/64*N-1), (14/64*N)～(15/64*N-1), (15/64*N)～(16/64*N-1) as the data for butterfly operator input ports 4, 1, 2, 3 respectively. The remaining address data are handled in the same way. After computation, data 0～(1/256*N-1) of butterfly operator output ports 1～4 are stored in turn in RAM1, 2, 3, 4; sequence numbers (1/256*N)～(2/256*N-1) are stored in turn in RAM2, 3, 4, 1; sequence numbers (2/256*N)～(3/256*N-1) are stored in turn in RAM3, 4, 1, 2; sequence numbers (3/256*N)～(4/256*N-1) are stored in turn in RAM4, 1, 2, 3. The remaining data are handled in the same way: sequence numbers (4/256*N)～(5/256*N-1) are stored in turn in RAM1, 2, 3, 4; sequence numbers (5/256*N)～(6/256*N-1) in RAM2, 3, 4, 1; sequence numbers (6/256*N)～(7/256*N-1) in RAM3, 4, 1, 2; sequence numbers (7/256*N)～(8/256*N-1) in RAM4, 1, 2, 3; ....

(5) Stage-4, 5, 6, 7 addressing ....

(6) Last-stage addressing: first, RAM1 reads in turn addresses 0, 2/16*N, 3/16*N, 1/16*N, and the data output serve as the input to port 1 of the butterfly operator. At the same time, RAM2 reads in turn addresses (a+a1...), (2/16*N+a+a1...), (3/16*N+a+a1...), (1/16*N+a+a1...), and the data output serve as the input to port 2 of the butterfly operator. RAM3 reads in turn addresses 2*(a+a1...), [2/16*N+2*(a+a1...)], [3/16*N+2*(a+a1...)], [1/16*N+2*(a+a1...)], and the data output serve as the input to port 3 of the butterfly operator. RAM4 reads in turn addresses 3*(a+a1...), [2/16*N+3*(a+a1...)], [3/16*N+3*(a+a1...)], [1/16*N+3*(a+a1...)], and the data output serve as the input to port 4 of the butterfly operator. The result of port 1 is output directly as final output data; the data of port 2 are stored in place in RAM3, the data of port 3 are stored in place in RAM2, and the data of port 4 are stored in place in RAM4.

Next, reading of RAM1 continues with addresses 2/64*N, (2/16*N+2/64*N), (3/16*N+2/64*N), (1/16*N+2/64*N). RAM2 reads addresses (2/64*N+a+a1...), [(2/16*N+2/64*N)+a+a1...], [(3/16*N+2/64*N)+a+a1...], [(1/16*N+2/64*N)+a+a1...]. RAM3 reads addresses [2/64*N+2*(a+a1...)], [(2/16*N+2/64*N)+2*(a+a1...)], [(3/16*N+2/64*N)+2*(a+a1...)], [(1/16*N+2/64*N)+2*(a+a1...)]. RAM4 reads addresses [2/64*N+3*(a+a1...)], [(2/16*N+2/64*N)+3*(a+a1...)], [(3/16*N+2/64*N)+3*(a+a1...)], [(1/16*N+2/64*N)+3*(a+a1...)], and the data output likewise serve as the input to port 4 of the butterfly operator. The result of port 1 is output directly as final output data; the data of port 2 are stored in place in RAM3, the data of port 3 are stored in place in RAM2, and the data of port 4 are stored in place in RAM4, and so on.

Here a, a1, ... depend on the number of stages. If the last stage is stage 3, i.e. 64 points, then a=4, a1=1. If the last stage is stage 4, i.e. 256 points, then a=16, a1=4, a2=1. If the last stage is stage M, i.e. 4^M points, then a=4^M/16, a1=4^M/64, ..., with the last term equal to 1.
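The offsets a, a1, ... form a descending sequence of powers of 4 that ends at 1. A short helper, with the hypothetical name `level_offsets`, reproduces the two worked cases from the text. It covers only the offsets themselves, not the full last-stage read order.

```python
def level_offsets(m_stages):
    """Return [a, a1, ...] = [4**M/16, 4**M/64, ..., 1] for an M-stage transform."""
    if m_stages < 2:
        raise ValueError("at least two stages are needed for these offsets")
    return [4 ** (m_stages - 2 - j) for j in range(m_stages - 1)]
```

`level_offsets(3)` gives `[4, 1]` for 64 points and `level_offsets(4)` gives `[16, 4, 1]` for 256 points.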

(7) After the last-stage operation is complete, 1/4*N of the data have already been output in order; next, the data of RAM2～4 are output in turn.

Summarizing the addressing, it can be seen that:

The first-stage data can be read out in order from each RAM and fed in order to the 4 ports of the butterfly operator. The output data of the operation are divided into 16 groups (each butterfly operator outputs 4 groups of data simultaneously, and each butterfly operator output port produces 4 groups of data), which are stored in turn in RAM1, 2, 3, 4; RAM2, 3, 4, 1; RAM3, 4, 1, 2; and RAM4, 1, 2, 3.

When addressing intermediate-stage data, RAM1 always starts reading from address 0, and the data read are fed to input ports 1, 2, 3, 4 of the butterfly operator in turn, cyclically. The length of data fed to an input port before each switch is 1/4*N for stage 1, 1/16*N for stage 2, ..., and 1/4^M*N for stage M, where N=4^M. The initial read address of RAM2 is a1+a2...: for stage 1, a1=1/4*N and a2, a3...=0; for stage 2, a1=1/4*N, a2=1/16*N and a3...=0; for stage M, a1=1/4*N, a2=1/16*N, ..., aM=1/4^M*N. Addresses are then read sequentially; on reaching the maximum address, addressing returns to address 0 and continues until one full cycle of the whole address space depth is complete. The initial read addresses of RAM3 and RAM4 are 2(a1+a2...) and 3(a1+a2...) respectively, and they otherwise operate like RAM2. When the butterfly operation is complete and writing begins, the write locations match the read addresses, giving in-place storage. Note that every 4 groups of data from each butterfly operator output port must be placed in different RAMs, with the group length determined by the stage: the stage-1 output is 1/16*N, the stage-2 output is 1/64*N, .... The storage locations for port 1 data cycle through RAM1, 2, 3, 4; port 2 cycles through RAM2, 3, 4, 1; port 3 cycles through RAM3, 4, 1, 2; port 4 cycles through RAM4, 1, 2, 3.

When addressing the data of the last stage, the addressing rule follows from the final output order of the radix-4 butterfly diagram. Step 1: divide the data in each RAM into 4 groups by address, called level-1 groups, each of depth 1/16*N and numbered group 1～4. RAM1 is addressed in the order 0, 2/16*N, 3/16*N, 1/16*N, i.e. the first addresses of group 1, group 3, group 2, group 4; the other RAMs add a1+a2... to these addresses. Step 2: divide each level-1 group into 4 groups again, called level-2 groups, each of depth 1/64*N and numbered group 1～4. RAM1 is addressed by the addresses of group 1, group 3, group 2, group 4, and the other RAMs add a1+a2... on this basis. (This addresses the level-2 groups within group 1 of step 1.) Step 3: divide each level-2 group into 4 groups again, and so on, stopping when each group contains 1 datum. After the butterfly operation, the data of output port 1 are sent directly to the output port of the overall module, the data of port 2 are stored in RAM3, the data of port 3 are stored in RAM2, and the data of port 4 are stored in RAM4. After the butterfly operation, 1/4*N of the data have been output; the data of RAM2～4 are then read in turn and output in order.

According to the above procedure, the FPGA-based FFT method of this embodiment uses a radix-4 butterfly unit, which increases the operation speed, and uses ping-pong buffering and cyclic addressing to realize in-place computation of the data, so that no extra RAM is needed either for storing intermediate data or for outputting the data in sequential order. A comparison of Table 2 and Table 3 shows that, with a number of DSP multipliers comparable to that of an FFT device using radix-2 butterfly units, the total depth of the RAM storing data is reduced, which improves RAM utilization and saves FPGA resources. Table 3 lists the total RAM depth occupied by stored data and the number of DSP multipliers occupied by computation, where the bit width of the storage RAM equals the data bit width, as follows:

Table 3: Resources occupied by the FFT device using the radix-4 butterfly unit

Those skilled in the art should understand that embodiments of this utility model may be provided as a system or a computer program. Accordingly, the device of this utility model may take the form of an entirely hardware embodiment. This utility model is described with reference to flowcharts and/or block diagrams of the equipment (system) according to its embodiments. Although preferred embodiments of this utility model have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of this utility model.

The FPGA-based FFT device proposed by this utility model uses a radix-4 butterfly unit, which increases the operation speed. By using cyclic addressing, it needs no extra RAM for storing intermediate data and no extra RAM for outputting the data in sequential order. With a number of DSP multipliers comparable to that of an FFT device using radix-2 butterfly units, it reduces the total depth of the RAM storing data, improves RAM utilization, and saves FPGA resources.

Although embodiments of this utility model have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of this utility model, and all such modifications and variations fall within the scope defined by the appended claims.

Claims

1. An FPGA-based FFT device, characterized by comprising:

a cache module (1), a control module (2) and a radix-4 butterfly unit (3);

the control module (2) is connected to the cache module (1) and to the radix-4 butterfly unit (3) respectively, and is configured to control the input and output of data, to control the buffering of data into the cache module (1) in a ping-pong manner, and to control the data so that the FFT operation is completed in the radix-4 butterfly unit (3) by cyclic addressing;

the cache module (1) is configured to receive the first 3/4 of the initially input data, to output the last 3/4 of the operation results, and to store intermediate data;

the radix-4 butterfly unit (3) is configured to receive the last 1/4 of the initially input data and to output the first 1/4 of the operation results.

2. The FPGA-based FFT device according to claim 1, characterized in that the cache module (1) is a plurality of dual-port RAMs or a plurality of single-port RAMs.

3. The FPGA-based FFT device according to claim 2, characterized in that the number of dual-port RAMs is 7 or 8, determined by the number of points of the FFT operation.

4. The FPGA-based FFT device according to claim 2, characterized in that

5. The FPGA-based FFT device according to claim 1, characterized in that the number of radix-4 butterfly units (3) is 1 or 2, determined by the number of points of the FFT operation.