A variable-point real-time FFT processing chip
Technical field
The invention belongs to the field of signal processing and relates to a signal processor chip, in particular to a real-time fast Fourier transform (FFT) processor chip that supports a variable number of points $2^n$ (n ≤ 10).
Background Art
Discrete Fourier transform (DFT) (DFT) plays central role as signal is transformed into the basic tool of frequency domain from time domain in various digital signal processing.Its fast algorithm FFT has a wide range of applications in radio communication, speech recognition, Flame Image Process, digital filtering and spectrum analysis field.In actual applications, usually fft processor has been proposed to calculate in real time, accurately the requirement of different point-number sequence.Because the Cooley-Tukey algorithm has the former address computing, be easy to hardware and realize, therefore the pipeline organization fft processor of realizing based on this algorithm is obtaining using widely in the application specific processor in real time.
Implementing the FFT efficiently has long been an active problem. In the past, systems performed the FFT on a general-purpose processor or a dedicated digital signal processor (DSP). In recent years, with the rapid development of field-programmable gate arrays (FPGAs), more and more FFT processing has been implemented on FPGAs. FPGA-based FFT processing, however, has several shortcomings: first, the inherent programmable fabric of an FPGA makes it difficult to raise FFT speed further; second, an FPGA implementation consumes considerable power; third, an FPGA offers less design confidentiality than an application-specific integrated circuit (ASIC); fourth, FPGA-based FFT is unsuitable for volume production; fifth, an FPGA is a general-purpose chip with many pins and extensive peripheral circuitry, which makes the signal processing system complex. Current work therefore proceeds on two fronts: seeking FFT algorithms that are structurally simple, fast, and economical in memory, and using advanced VLSI technology to implement the FFT in hardware. Casting the algorithm into hardware as an ASIC greatly improves FFT performance, simplifies the design of the signal processing system, and suits production weapon systems.
However, implementing a real-time FFT with an adjustable input-sequence length on an ASIC faces the following problems:
(1) Die area in an ASIC is limited, so every function implemented on it must be optimized for area, speed, and power to improve the overall design. While the required FFT throughput is maintained, the resources dedicated to the FFT should be minimized, so the FFT architecture within the ASIC must be optimized;
(2) An FFT implemented on an ASIC must be able to reconfigure its number of points in real time to meet different application requirements;
(3) Because of finite word-length effects in FFT computation, intermediate data are prone to overflow, which degrades the accuracy of the result; a scaling algorithm must therefore be used to prevent overflow while improving signal-processing precision.
Many variable-point FFT implementations already exist. The paper "ASIC Design of a Real-Time Reconfigurable FFT Processor" (Journal of Beijing Institute of Technology, 2006, No. 4) describes a variable-point pipelined FFT processor that can perform 4-, 16-, 64-, 256-, or 1024-point FFTs, but it cannot perform arbitrary $2^n$-point FFTs, and because it does not compress the intermediate twiddle factors it occupies considerable chip area. The paper "Design and Implementation of a Variable $2^n$-Point Pipelined FFT Processor" (Journal of Beijing Institute of Technology, 2005, No. 3) describes a pipelined processor that can continuously compute $2^n$-point FFTs of complex sequences, but because it builds the pipeline by conventional stage cascading, it occupies a large chip area and is unsuitable for ASIC implementation.
Summary of the invention
The objective of the invention is to overcome the defects of the prior art, namely to process variable $2^n$-point FFTs (n ≤ 10) in real time with the smallest possible chip area, by providing a variable-point real-time FFT processing chip.
The present invention is based on the following technical principles:
The Cooley-Tukey fast Fourier transform algorithm computes in place, so its modules can be reused in hardware and parallel processing is easy to achieve; it is therefore widely used. The radix-2 and radix-4 Cooley-Tukey algorithms are the most common. Comparing their hardware implementations, radix-4 halves the required communication between memory and arithmetic units, offers higher parallelism than radix-2, and can significantly improve numerical precision, so the radix-4 Cooley-Tukey algorithm is generally preferred. However, although radix-4 requires fewer operations than radix-2, it has its own limitation: a radix-4 FFT can only handle $4^n$ points, whereas a radix-2 FFT can handle any $2^n$ points.
Combining radix-4 and radix-2 decimation-in-time algorithms (a mixed-radix approach) therefore achieves higher computational efficiency and a configurable number of points while saving resources, which facilitates ASIC implementation.
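The DFT of an N-point sequence x(n), which the FFT computes, is

$$X(k)=\sum_{n=0}^{N-1}x(n)\,W_N^{nk},\qquad k=0,1,\dots,N-1,\qquad W_N=e^{-j2\pi/N}\qquad(1)$$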
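For reference, formula 1 can be evaluated directly. The short Python sketch below (the function name `dft` is ours, not from the source) costs on the order of $N^2$ complex multiplications, which is the cost that the decomposition that follows is meant to reduce.

```python
import cmath


def dft(x):
    """Evaluate formula 1 directly: X(k) = sum_n x(n) * W_N**(n*k)."""
    n_points = len(x)
    w_n = cmath.exp(-2j * cmath.pi / n_points)  # W_N = e^{-j*2*pi/N}
    return [sum(x[n] * w_n ** (n * k) for n in range(n_points))
            for k in range(n_points)]
```

An impulse such as `[1, 0, 0, 0]` transforms to an all-ones spectrum, and a constant sequence concentrates all its energy in X(0).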
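Suppose $N=r_1r_2$. The time index n can be expressed as

$$n=n_1r_2+n_0,\qquad n_1=0,1,\dots,r_1-1;\quad n_0=0,1,\dots,r_2-1$$

and the frequency index k as

$$k=k_1r_1+k_0,\qquad k_1=0,1,\dots,r_2-1;\quad k_0=0,1,\dots,r_1-1$$

so that

$$x(n)=x(n_1r_2+n_0)=x(n_1,n_0),\qquad X(k)=X(k_1r_1+k_0)=X(k_1,k_0).$$

Substituting into formula 1 and using $W_N^{r_1r_2}=1$, $W_N^{r_2}=W_{r_1}$, and $W_N^{r_1}=W_{r_2}$, formula 1 can be expressed as

$$X(k_1,k_0)=\sum_{n_0=0}^{r_2-1}\left\{\left[\sum_{n_1=0}^{r_1-1}x(n_1,n_0)\,W_{r_1}^{n_1k_0}\right]W_N^{n_0k_0}\right\}W_{r_2}^{n_0k_1}\qquad(2)$$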
Formula 2 shows that computing an FFT of composite length $N=r_1r_2$ is equivalent to first computing $r_2$ groups of $r_1$-point FFTs, multiplying the results by twiddle factors, and then computing $r_1$ groups of $r_2$-point FFTs. The large N-point FFT of formula 1 is thus converted into smaller $r_1$-point and $r_2$-point FFTs, whose data and twiddle-factor storage requirements are far smaller than those of a single N-point FFT. Formula 2 also shows how to compute an $r_1$-point or an $r_2$-point FFT on its own: for an $r_1$-point FFT, simply mask the operations outside the square brackets [ ]; likewise, for an $r_2$-point FFT, simply mask the operations inside the braces { }.
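Formula 2 can be checked numerically. The sketch below is a floating-point model, not the chip's datapath: it uses NumPy's `np.fft.fft` as a stand-in for the hardware $r_1$- and $r_2$-point modules, and the function name `fft_two_dim` is hypothetical. It follows formula 2 step by step: column FFTs (the square bracket), multiplication by $W_N^{n_0k_0}$, row FFTs (the outer sum), and a final reordering from $(k_0,k_1)$ to the natural order $k=k_1r_1+k_0$.

```python
import numpy as np


def fft_two_dim(x, r1, r2):
    """N = r1 * r2 point DFT computed as in formula 2."""
    x = np.asarray(x, dtype=complex)
    n_points = r1 * r2
    if x.shape != (n_points,):
        raise ValueError("x must hold exactly r1 * r2 samples")
    # x(n1, n0) with n = n1 * r2 + n0: row n1, column n0
    grid = x.reshape(r1, r2)
    # Square bracket: r1-point FFT over n1 for every n0 (r2 column FFTs)
    inner = np.fft.fft(grid, axis=0)              # inner[k0, n0]
    # Twiddle factors W_N^(n0 * k0)
    k0 = np.arange(r1)[:, None]
    n0 = np.arange(r2)[None, :]
    inner = inner * np.exp(-2j * np.pi * k0 * n0 / n_points)
    # Outer sum: r2-point FFT over n0 for every k0 (r1 row FFTs)
    outer = np.fft.fft(inner, axis=1)             # outer[k0, k1]
    # Natural order: X(k1 * r1 + k0) = outer[k0, k1]
    return outer.T.reshape(n_points)
```

With `r1=64, r2=16` this reproduces a 1024-point FFT; the transpose in the last line is the reordering that the output stage must perform.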
The maximum N is 1024 points; for the two-dimensional decomposition, $r_1=64$ and $r_2=16$. To make the number of points variable, the first dimension of the two-dimensional FFT can be either 16 or 64 points, and the second dimension can be 2, 4, 8, or 16 points; combining the two dimensions yields a variable $2^n$-point FFT.
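The point-count split can be summarized as a small lookup. The sketch below is hypothetical: the embodiment fixes 1024 = 64 × 16, 512 = 64 × 8, and 256 = 64 × 4, and with the available first-dimension sizes 32 can only be 16 × 2, but the choice 128 = 64 × 2 (rather than 16 × 8) is our assumption. A factor of 1 marks a bypassed stage, matching the single-module cases of 2 to 16 points and of 64 points.

```python
def split_points(n_points):
    """Hypothetical mapping of N = 2**n (n <= 10) to (r1, r2).

    r1 is the first-dimension (radix-4 module) size, r2 the
    second-dimension (radix-2 module) size; 1 means the stage is bypassed.
    """
    if not (2 <= n_points <= 1024) or n_points & (n_points - 1):
        raise ValueError("N must be a power of two from 2 to 1024")
    if n_points <= 16:
        return 1, n_points        # radix-2 module alone: 2, 4, 8, 16
    if n_points == 64:
        return 64, 1              # radix-4 module alone
    if n_points == 32:
        return 16, 2              # 16-point first dimension, 2-point second
    return 64, n_points // 64     # 128, 256, 512, 1024 -> 64 x (2, 4, 8, 16)
```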
Based on the above principles, the technical solution adopted by the present invention is as follows:
A variable-point real-time FFT processing chip consists of the following modules: an input buffer module, a 64-point radix-4 FFT processing module, a twiddle-factor processing module, an intermediate-sequence storage module, a 16-point radix-2 FFT processing module, a unit selection and control module, and an output buffer module.
Input buffer module: caches up to 1024 input data points and performs data addressing.
64-point radix-4 FFT processing module: performs a 16-point or 64-point FFT. The module uses a cascaded radix-4 pipeline; variable-point operation is achieved by adding an output select circuit to the second-stage radix-4 arithmetic unit, and the 64-point FFT uses block-floating-point scaling to achieve higher processing accuracy.
Twiddle-factor processing module: generates the intermediate twiddle factors of the two-dimensional FFT and performs the corresponding multiplications; both generation and multiplication are based on the CORDIC algorithm. This module comprises a binary counter, a circulating register, and a CORDIC processor.
Intermediate-sequence storage module: stores the results of the 64-point radix-4 FFT module and supplies them as input to the 16-point radix-2 FFT module. It consists of two 1024-point data memories forming a ping-pong storage structure.
16-point radix-2 FFT processing module: performs a 2-, 4-, 8-, or 16-point FFT. The module uses a cascaded radix-2 pipeline; variable-point operation is achieved by adding an output select circuit to each radix-2 stage, and the 16-point FFT uses block-floating-point scaling to achieve higher processing accuracy.
Output buffer module: stores and outputs up to 1024 result data points.
Unit selection and control module: generates the module-selection and other control signals that control the entire chip.
The modules are connected as follows:
The input buffer module, 64-point radix-4 FFT processing module, twiddle-factor processing module, intermediate-sequence storage module, 16-point radix-2 FFT processing module, and output buffer module are connected in sequence, and the unit selection and control module is connected to every module.
Signals flow between the modules as follows:
First, the input data (at most 1024 points) are cached and addressed in the input buffer module. The input data are then passed to the 64-point radix-4 FFT module, which performs a 16-point or 64-point FFT to obtain the first-dimension FFT result. This result is sent to the twiddle-factor processing module, which multiplies the sequence by the twiddle factors to obtain product data. The product data are then cached in the intermediate-sequence storage module; once all product data have been buffered, they are supplied to the 16-point radix-2 FFT module in the order that module requires, and it performs a variable FFT of up to 16 points to obtain the two-dimensional FFT result. Finally, this result is reordered in the output buffer module to form the final output.
Beneficial Effects
Compared with the prior art, the variable-point real-time FFT processing chip proposed by the present invention has the following advantages:
(1) it processes the input sequence as a stream in real time and updates the output data in real time;
(2) it has a small chip area and low cost;
(3) because storage resources are reduced, fewer memory operations are needed, which shortens computation time;
(4) it can perform variable $2^n$-point FFTs;
(5) it uses block-floating-point scaling, so the FFT results are more accurate.
Description of the Drawings
Fig. 1 Block diagram of the variable-point FFT processor;
Fig. 2 Variable 16/64-point radix-4 FFT pipeline;
Fig. 3 CORDIC structure used to generate the FFT twiddle factors;
Fig. 4 Variable 16-point radix-2 FFT pipeline;
Fig. 5 Physical layout and package pin configuration of the FFT chip;
Fig. 6 Photograph of the fabricated FFT chip.
Detailed Description of the Embodiments
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A variable-point real-time FFT processing chip comprises the following parts: an input buffer module, a 64-point radix-4 FFT processing module, a twiddle-factor processing module, an intermediate-sequence storage module, a 16-point radix-2 FFT processing module, a unit selection and control module, and an output buffer module, as shown in Fig. 1.
Input buffer module
This module caches up to 1024 input data points and performs data addressing. The data input unit generates read addresses according to the required data ordering: the input data are read from memory outside the chip, the corresponding memory addresses are generated according to the length of the sequence to be transformed, and the data are put into reversed order. The input data are then sent to the 64-point radix-4 FFT module.
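If, as is usual for a radix-4 decimation-in-time pipeline, the reverse-order processing means digit reversal (bit reversal for radix 2), the read-address sequence can be generated as sketched below. This interpretation and the function names are assumptions; the source does not spell out the permutation.

```python
def digit_reverse(index, radix, num_digits):
    """Reverse the base-`radix` digits of `index` (bit reversal when radix == 2)."""
    result = 0
    for _ in range(num_digits):
        index, digit = divmod(index, radix)
        result = result * radix + digit
    return result


def reversed_read_order(n_points, radix):
    """Hypothetical read-address sequence for one block of n_points samples."""
    num_digits, size = 0, 1
    while size < n_points:
        size *= radix
        num_digits += 1
    if size != n_points:
        raise ValueError("n_points must be a power of radix")
    return [digit_reverse(i, radix, num_digits) for i in range(n_points)]
```

For example, `reversed_read_order(8, 2)` gives the familiar bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7.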
64-point radix-4 FFT processing module
This module, whose structure is shown in Fig. 2, performs 16-point or 64-point FFTs. It implements the radix-4 FFT algorithm as a cascaded pipeline. A 64-point radix-4 FFT requires three stages, and the input and output data of each stage are held in ping-pong memories so that data stream through the pipeline. An output select circuit is added to the second-stage radix-4 processing element: for a 16-point FFT, under the control of the unit selection and control module, the data leave from the second stage, completing the 16-point FFT. The results of the 64-point radix-4 FFT module are sent to the CORDIC processor of the twiddle-factor processing module.
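A software model of the radix-4 stages helps show why a 16-point result is available after the second stage: with digit-reversed input, each stage merges four shorter transforms, so a $4^s$-point transform is complete after s stages. The sketch below is an in-place floating-point model, not the pipelined ping-pong datapath of Fig. 2, and the function name is hypothetical.

```python
import cmath


def radix4_fft(x):
    """Radix-4 decimation-in-time FFT for len(x) a power of 4 (16 or 64 here)."""
    n = len(x)
    num_stages = 0
    while 4 ** num_stages < n:
        num_stages += 1
    if 4 ** num_stages != n:
        raise ValueError("length must be a power of 4")

    def rev4(i):
        # Reverse the base-4 digits of i
        r = 0
        for _ in range(num_stages):
            i, d = divmod(i, 4)
            r = r * 4 + d
        return r

    a = [complex(x[rev4(i)]) for i in range(n)]   # digit-reversed input order
    size = 4
    for _ in range(num_stages):   # 2 stages for 16 points, 3 for 64 points
        q = size // 4
        for start in range(0, n, size):
            for j in range(q):
                w = cmath.exp(-2j * cmath.pi * j / size)   # W_size^j
                a0 = a[start + j]
                a1 = a[start + j + q] * w
                a2 = a[start + j + 2 * q] * w ** 2
                a3 = a[start + j + 3 * q] * w ** 3
                # 4-point butterfly; W_4 = -j
                a[start + j] = a0 + a1 + a2 + a3
                a[start + j + q] = a0 - 1j * a1 - a2 + 1j * a3
                a[start + j + 2 * q] = a0 - a1 + a2 - a3
                a[start + j + 3 * q] = a0 + 1j * a1 - a2 - 1j * a3
        size *= 4
    return a
```

A 16-point input simply leaves the stage loop after two iterations, which corresponds to taking the output select tap after the second stage.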
This module uses block-floating-point scaling in every stage. Simulation comparisons show that a system using block-floating-point arithmetic performs slightly worse than one using floating point but far better than one using fixed point. Block-floating-point computation extends each butterfly stage's data with 3 sign (guard) bits; after the butterfly computation, the 4 most significant bits are checked to determine how many bits the next stage's input must be shifted right so that the next butterfly stage cannot overflow. The right-shift counts are accumulated to determine the scale factor (exponent) of the final result.
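Three guard bits are consistent with radix-4 growth: a butterfly adds four inputs after multiplying them by unit-magnitude twiddle factors, so one real or imaginary component can grow by up to $4\sqrt{2}\approx 5.7<2^3$. The sketch below models the shift-and-accumulate idea only. The 16-bit word width is a hypothetical choice (the source gives no word length), checking the top bits is simplified to measuring the bit length of the block's peak value, and `toy_butterfly_stage` is a real-valued radix-2 stand-in, not the chip's radix-4 butterfly.

```python
def bfp_normalize(block, width=16, guard_bits=3):
    """Shift a block of signed fixed-point integers right by one common amount
    so every value leaves `guard_bits` spare bits below the sign bit of a
    `width`-bit word. Returns (shifted_block, shift)."""
    magnitude_bits = width - 1 - guard_bits
    peak = max(abs(v) for v in block)
    shift = max(0, peak.bit_length() - magnitude_bits)
    return [v >> shift for v in block], shift


def toy_butterfly_stage(block):
    """Stand-in stage: pairs (i, i + half) -> (sum, difference); growth <= 2x."""
    half = len(block) // 2
    return ([block[i] + block[i + half] for i in range(half)] +
            [block[i] - block[i + half] for i in range(half)])


def bfp_pipeline(block, num_stages, width=16, guard_bits=3):
    """Run the stages with block-floating-point scaling.
    The true result is approximately block_out * 2**exponent."""
    exponent = 0
    for _ in range(num_stages):
        block, shift = bfp_normalize(block, width, guard_bits)
        exponent += shift          # accumulated shifts form the block exponent
        block = toy_butterfly_stage(block)
    return block, exponent
```

All values of a block share one exponent, which is why the result is worse than true floating point (small values lose bits when a large one forces a shift) but better than fixed point with a worst-case shift at every stage.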
Twiddle-factor processing module
According to formula 2, the results of the 64-point radix-4 FFT module must be multiplied by the twiddle factors $W_N^{n_0k_0}$ before being buffered.
As shown in Fig. 3, the twiddle-factor processing module comprises a binary counter, a circulating register, and a CORDIC processor.
The binary counter generates and counts the row and column indices of the input data, and the circulating register determines the rotation angle of the twiddle factor. Together they generate the twiddle factor, which is passed to the CORDIC processor.
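The source states that the twiddle exponent between the two dimensions equals the row index times the column index, i.e. $m=n_0k_0$ in formula 2, and that it is produced by a binary counter and a circulating register. One multiplier-free way to obtain it, sketched below as an assumption about what the circulating register does, is to accumulate $n_0$ once per output bin $k_0$; the CORDIC rotation angle is then $\theta=-2\pi m/N$. The generator name is hypothetical.

```python
def twiddle_exponents(r1, r2):
    """Yield (n0, k0, m) where m = n0 * k0 is the exponent of W_N, N = r1 * r2.

    The product n0 * k0 is replaced by a running sum: for each group n0,
    the accumulator starts at 0 and grows by n0 for every output bin k0.
    """
    n_points = r1 * r2
    for n0 in range(r2):              # counter over the r2 first-dimension FFTs
        acc = 0                       # hypothetical accumulator register
        for k0 in range(r1):          # counter over the bins of one r1-point FFT
            yield n0, k0, acc
            acc = (acc + n0) % n_points   # wrap-around is free when N = 2**n
```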
The CORDIC processor performs the additions, subtractions, and shifts of the CORDIC algorithm, thereby multiplying the results of the 64-point radix-4 FFT module by the twiddle factors. It sends the product data to the intermediate-sequence storage module.
The binary counter, circulating register, and CORDIC processor are connected in sequence.
A traditional twiddle-factor processing module comprises a memory holding the twiddle factors and a complex multiplier. To support FFTs of different lengths, the twiddle-factor memory must store the values for the maximum number of points, and the other values are extracted from them as needed. This traditional approach to intermediate twiddle factors consumes considerable chip resources and is unsuitable for ASIC implementation. The present invention instead processes the intermediate twiddle factors with the CORDIC algorithm. CORDIC computes trigonometric, hyperbolic, and some other elementary functions iteratively, without multiplications or additional storage, while still achieving high precision.
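The basic principle of the CORDIC algorithm is to obtain the desired vector by rotating an initial vector $(x_0,y_0)$ through an angle θ. The computation follows the unified iteration

$$x_{i+1}=x_i-s_i2^{-i}y_i,\qquad y_{i+1}=y_i+s_i2^{-i}x_i,\qquad z_{i+1}=z_i-s_i\theta_i,\qquad \theta_i=\arctan 2^{-i},\qquad z_0=\theta\qquad(3)$$

In formula (3), $i=0,1,\dots,n-1$, where n is the total number of rotations, and $s_i$ determines the direction of rotation:

$$s_i=\begin{cases}+1, & z_i\ge 0\\ -1, & z_i<0\end{cases}\qquad(4)$$

The computation therefore involves only additions, subtractions, and shifts. Hardware implementing CORDIC can use a pipelined structure with as many pipeline stages as required.

After n iterations, the result is

$$x_n=K_n(x_0\cos\theta-y_0\sin\theta),\qquad y_n=K_n(y_0\cos\theta+x_0\sin\theta),\qquad z_n\approx 0,\qquad K_n=\prod_{i=0}^{n-1}\sqrt{1+2^{-2i}}\qquad(5)$$

where the constant gain $K_n$ approaches about 1.6468 as n grows.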
The idea of CORDIC is to split a rotation by an arbitrary angle θ into a sequence of steps, rotating by $\theta_i$ at step i. After each step the rotated amount is subtracted from the remaining angle, and the sign of the remaining angle decides whether the next elementary rotation is positive or negative. The process repeats until the remaining angle approaches zero, at which point the vector has been rotated by θ.
In the FFT processor of the present invention, which uses the Cooley-Tukey algorithm, the sequence must be multiplied by the twiddle factors $W_N^{m}=e^{-j2\pi m/N}$, where m is the twiddle-factor index. This multiplication can be regarded as rotating a vector (the complex data) by $\theta=-2\pi m/N$ radians. The idea of CORDIC therefore matches the requirement of multiplying the sequence by the twiddle factors in the FFT: according to formulas (3) and (4), the intermediate twiddle factors can be generated and applied in a single operation.
The ordinary CORDIC algorithm needs a memory to store the $s_i$ of formula (3), and the iterative additions need corresponding control logic. For multiplication by FFT twiddle factors, this memory can be avoided: once the twiddle-factor index m is known, the rotation angle of the CORDIC processor is determined. The twiddle-factor index between the two computation stages equals the row index multiplied by the column index.
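The sketch below models formulas (3) to (5) in floating point to show that a CORDIC rotation by $\theta=-2\pi m/N$ reproduces multiplication by $W_N^m$. Two details are assumptions not described in the source: a pre-rotation by 180° (negating the vector) to bring the angle into the convergence range of roughly ±1.74 rad, and division by the gain $K_n$, which hardware would typically fold into a constant scaling. The function names and the 16-iteration default are hypothetical.

```python
import math


def cordic_rotate(x, y, theta, iterations=16):
    """Rotate (x, y) by theta radians with the iteration of formulas (3)-(4)."""
    # Assumed pre-rotation: reduce theta to [-pi, pi], then flip by pi if
    # needed so that |theta| <= pi/2, inside the CORDIC convergence range.
    theta = math.remainder(theta, 2 * math.pi)
    if theta > math.pi / 2:
        x, y, theta = -x, -y, theta - math.pi
    elif theta < -math.pi / 2:
        x, y, theta = -x, -y, theta + math.pi
    z = theta
    gain = 1.0
    for i in range(iterations):
        s = 1 if z >= 0 else -1                                 # formula (4)
        x, y = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i     # formula (3)
        z -= s * math.atan(2.0 ** -i)
        gain *= math.sqrt(1 + 2.0 ** (-2 * i))                  # K_n of formula (5)
    return x / gain, y / gain


def twiddle_multiply(value, m, n_points, iterations=16):
    """Multiply complex `value` by W_N**m = exp(-2j*pi*m/N) via CORDIC."""
    re, im = cordic_rotate(value.real, value.imag,
                           -2 * math.pi * m / n_points, iterations)
    return complex(re, im)
```

After 16 iterations the residual angle is at most about $\arctan 2^{-15}\approx 3\times10^{-5}$ rad, so the product agrees with the exact complex multiplication to roughly that relative accuracy.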
Intermediate-sequence storage module
This module buffers the product data obtained by multiplying the results of the 64-point radix-4 FFT module by the twiddle factors; once all product data have arrived, they are sent to the 16-point radix-2 FFT module. The module is divided into two 1024-point data memories that form a ping-pong storage structure.
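A ping-pong structure lets one frame be written while the previous frame is read. The sketch below models the two banks and adds a read-address helper for the second dimension, which needs the $r_2$ values sharing one $k_0$. The class and helper names are hypothetical, and the row-by-row write layout (address $n_0r_1+k_0$) is our assumption; the source does not give the memory map.

```python
class PingPongMemory:
    """Two banks: the producer writes one frame while the consumer reads the
    previous frame from the other bank; swap() exchanges their roles."""

    def __init__(self, depth=1024):
        self.banks = [[0j] * depth, [0j] * depth]
        self.write_bank = 0

    def write(self, addr, value):
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        return self.banks[1 - self.write_bank][addr]

    def swap(self):
        self.write_bank = 1 - self.write_bank


def column_read_order(r1, r2):
    """Hypothetical read addresses when data were written row by row
    (address n0 * r1 + k0) but the r2-point module needs, for each k0,
    the r2 values indexed by n0."""
    return [n0 * r1 + k0 for k0 in range(r1) for n0 in range(r2)]
```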
16-point radix-2 FFT processing module
This module performs variable $2^n$-point FFTs with n ≤ 4; its structure is shown in Fig. 4. The module implements the radix-2 FFT algorithm as a cascaded pipeline. A 16-point radix-2 FFT requires four stages, and the input and output data of each stage are held in ping-pong memories so that data stream through the pipeline. An output select circuit is added to each radix-2 processing element; under the control of the unit selection and control module, taking the output from the first, second, third, or fourth stage completes a 2-, 4-, 8-, or 16-point FFT, respectively. The results are then sent to the output buffer module. This module uses block-floating-point scaling in every stage.
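The same stage-count argument applies to the radix-2 module: with bit-reversed input, a $2^s$-point transform is complete after s radix-2 stages. The in-place floating-point sketch below (hypothetical function name, not the pipelined hardware of Fig. 4) runs exactly $\log_2 N$ stages, which mirrors taking the output select tap of stage $\log_2 N$.

```python
import cmath


def radix2_fft(x):
    """Radix-2 decimation-in-time FFT for len(x) in {2, 4, 8, 16}."""
    n = len(x)
    if n not in (2, 4, 8, 16):
        raise ValueError("this module handles 2, 4, 8 or 16 points")
    num_stages = n.bit_length() - 1

    def rev2(i):
        # Reverse the low num_stages bits of i
        r = 0
        for _ in range(num_stages):
            r = (r << 1) | (i & 1)
            i >>= 1
        return r

    a = [complex(x[rev2(i)]) for i in range(n)]   # bit-reversed input order
    size = 2
    for _ in range(num_stages):   # leave after stage log2(N): the output tap
        half = size // 2
        for start in range(0, n, size):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / size)   # W_size^j
                top = a[start + j]
                bottom = a[start + j + half] * w
                a[start + j] = top + bottom
                a[start + j + half] = top - bottom
        size *= 2
    return a
```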
Output buffer module
This module buffers and outputs up to 1024 result data points. The results of the second-dimension 16-point radix-2 module are stored in the output buffer module, where the two-dimensional FFT result is reordered to form the final output.
Unit selection and control module
This module controls the storage and computation of each unit according to the number of points to be processed, generating the corresponding control and selection signals.
Example
When an FFT is performed, the unit selection and control module first activates the appropriate processing modules according to the transform length N. For 2-, 4-, 8-, or 16-point FFTs, the input buffer module, the 16-point radix-2 FFT module, and the output buffer module are activated. For a 64-point FFT, the input buffer module, the 64-point radix-4 FFT module, and the output buffer module are activated. For all other lengths, all modules are activated.
Thus 2-, 4-, 8-, 16-, and 64-point FFTs need only the 16-point radix-2 module or the 64-point radix-4 module alone, whereas 32-, 128-, 256-, 512-, and 1024-point FFTs require all processing modules. For example:
For a 1024-point FFT, according to formula 2, the input buffer module first decomposes the 1024 received data points by data addressing as $N=r_1\times r_2=64\times16$. The 64-point radix-4 FFT module then performs sixteen 64-point FFTs. The 1024 results are multiplied by the corresponding 1024 twiddle factors in the twiddle-factor processing module, and the product data are buffered in the intermediate-sequence storage module. Once all 1024 product data are in the intermediate-sequence storage module, the 16-point radix-2 FFT module performs sixty-four 16-point FFTs on them and sends the results to the output buffer module, which reorders and outputs them.
For a 256-point FFT, according to formula 2, the input buffer module first decomposes the 256 received data points by data addressing as $N=r_1\times r_2=64\times4$. The 64-point radix-4 FFT module then performs four 64-point FFTs. The 256 results are multiplied by the corresponding 256 twiddle factors in the twiddle-factor processing module, and the product data are buffered in the intermediate-sequence storage module. Once all 256 product data are in the intermediate-sequence storage module, the 16-point radix-2 FFT module performs sixty-four 4-point FFTs on them and sends the results to the output buffer module, which reorders and outputs them.
For a 512-point FFT, according to formula 2, the input buffer module first decomposes the 512 received data points by data addressing as $N=r_1\times r_2=64\times8$. The 64-point radix-4 FFT module then performs eight 64-point FFTs. The 512 results are multiplied by the corresponding 512 twiddle factors in the twiddle-factor processing module, and the product data are buffered in the intermediate-sequence storage module. Once all 512 product data are in the intermediate-sequence storage module, the 16-point radix-2 FFT module performs sixty-four 8-point FFTs on them and sends the results to the output buffer module, which reorders and outputs them.
The variable-point real-time FFT processing chip of the present invention was described at the RTL level in the VHDL hardware description language. Logic synthesis was performed with the Synopsys Design Compiler synthesis tool using the SMIC 0.18 μm standard-cell library; placement and routing were performed with the Astro place-and-route tool following a timing-driven design methodology; dynamic logic simulation was performed with the VCS simulator; parasitic parameters were extracted with Star-RCXT; and static timing analysis of the whole design was performed with PrimeTime. The design specifically addresses gate-oxide integrity, i.e. the antenna effect, by manually inserting reverse-biased diodes: a diode is added between each violating metal line and ground, so that when a large voltage builds up on the metal line, the diode breaks down first. This protects the integrity of the gate oxide and guarantees the function of the chip.
Fig. 5 shows the ASIC layout of the processor together with the coordinates used when packaging the chip. The memories are placed along the data-flow direction shown in Fig. 1 to ease the placement and routing of the logic cells. Wide power rings were also added around the memories to reduce the performance degradation caused by voltage drop.
The chip core area is 4578 μm × 4578 μm, and the chip is packaged in a QFP208 package; the finished chip is shown in Fig. 6.