Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Block decomposition avoids the excessive hardware complexity caused by an overly wide data interleaving range and improves the locality (closure) of the computation. A single inner/outer interleaving structure simplifies the hardware design and reduces the number of data reads. To accommodate data points of various bit widths, the present invention allows all data interleaving addresses and twiddle factor values to be generated automatically from the data bit width selected by the user and stored in the corresponding storage devices for the hardware to use.
The FFT arithmetic device of the present invention supports FFT computation of any 2^n points (n is a natural number). Its technical solution is as follows:
The FFT arithmetic device of the present invention mainly comprises: one or more block arithmetic units, an inter-block data storage device, an inter-block storage controller, and an inter-block interleaving address storage device.
In addition, the FFT arithmetic device of the present invention may further comprise a twiddle factor storage device and an inter-block data path selector.
The block arithmetic unit independently computes an N_local-point FFT computing block (N_local is the number of data points that one block arithmetic unit can read in and compute at a time, and is not greater than 2^n). With the total input bit width of the block arithmetic unit fixed, data in different "bit width × number of data points" combinations can be input flexibly.
The inter-block data storage device is connected to the block arithmetic unit and stores data that has undergone N_local-point FFT computation. Data written to this device is already arranged in the correct interleaved order, so subsequent operations can fetch it directly.
The inter-block storage controller is connected to the block arithmetic unit and to the inter-block data storage device. Each time one N_local-point FFT computation completes, the controller writes the result to the inter-block data storage device at the correct inter-block interleaving addresses. These addresses are read directly from the inter-block interleaving address storage device.
The inter-block interleaving address storage device is connected to the inter-block storage controller and stores, in the order in which the arithmetic units execute, the interleaving addresses that the controller needs. All inter-block interleaving addresses are computed in advance and stored in this device; they are not computed on the fly but read directly from it.
In the FFT arithmetic device disclosed by the present invention, all inter-block interleaving addresses are identical. This not only reduces storage space but also lowers the complexity of the addressing circuit and increases the degree of hardware reuse.
The present invention combines bit-reversal interleaving with the general inter-block interleaving and supplies the result to the inter-block storage controller. Because the inter-block interleaving addresses are identical in every case except the last, the inter-block storage controller needs to read address data only twice during the entire run, which significantly reduces data access operations.
The twiddle factor storage device is connected to the block arithmetic unit and stores, in the order in which the arithmetic units execute, the twiddle factors that the block arithmetic unit needs for its butterfly operations. All twiddle factors are computed in advance and stored in this device; they are not computed on the fly but read directly from it.
The inter-block data path selector is connected to the inter-block data storage device and selects the output destination of the data stored in that device.
The block arithmetic unit mainly comprises: a butterfly arithmetic unit, an intra-block storage controller, an intra-block data storage device, and an intra-block output data path selector.
The butterfly arithmetic unit multiplies the input data points by the twiddle factors and performs the associated additions and subtractions.
The intra-block data storage device is connected to the butterfly arithmetic unit and stores data that has passed through one butterfly stage. Data written to this device is already arranged in the correct interleaved order, so the next butterfly stage can read it in directly in that order.
The intra-block storage controller is connected to the butterfly arithmetic unit and to the intra-block data storage device. It writes the results of the butterfly arithmetic unit to the intra-block data storage device at the correct intra-block interleaving addresses. These addresses are read directly from the intra-block interleaving address storage device.
In addition, the block arithmetic unit may further comprise an intra-block input data path selector and an intra-block output data path selector.
The intra-block input data path selector is connected to the butterfly arithmetic unit and selects the source of the data entering the butterfly arithmetic unit.
The intra-block output data path selector is connected to the intra-block data storage device and selects the output destination of the data stored in that device.
The intra-block output data path selector can be merged with the intra-block input data path selector into a single device; the two are listed separately here only for ease of description.
The FFT arithmetic device of the present invention may further comprise an intra-block interleaving address storage device, which is connected to the intra-block storage controller and stores, in the order in which the arithmetic units execute, the interleaving addresses that the controller needs. All intra-block interleaving addresses are computed in advance and stored in this device; they are not computed on the fly but read directly from it.
In the FFT arithmetic device disclosed by the present invention, the intra-block interleaving addresses are identical for every stage. This not only reduces storage space but also lowers the complexity of the addressing circuit and increases the degree of hardware reuse. In addition, because the intra-block interleaving addresses are identical for every stage, the intra-block storage controller needs to read address data only once during the entire run, which significantly reduces data access operations.
An embodiment of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the FFT arithmetic device of the present invention comprises: a plurality of block arithmetic units, a twiddle factor storage device, an intra-block interleaving address storage device, an inter-block storage controller, an inter-block interleaving address storage device, an inter-block data storage device, and an inter-block data path selector.
The block arithmetic unit mainly performs the independent computation of a small FFT computing block. Let N_local be the number of data points that one block arithmetic unit can read in and compute at a time, N_bit the bit width of a data point, and L_local the number of stages required to compute an N_local-point FFT. A large 2^n-point FFT computing block (n is a natural number) can then be decomposed into ⌈n / L_local⌉ × 2^n / N_local N_local-point FFT computing blocks, where N_local is not greater than 2^n. The following description assumes that ⌈n / L_local⌉ is greater than 1; the same results can be derived when it equals 1.
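The block count can be sketched in a few lines. This is a minimal sketch, not the patented implementation: it assumes the decomposition consists of ⌈n / L_local⌉ groups of 2^n / N_local blocks each, which is consistent with the 16-point case of Fig. 2, and the function name `decompose` and its dictionary keys are hypothetical.

```python
def decompose(n: int, l_local: int) -> dict:
    """Count the N_local-point blocks needed for a 2**n-point FFT.

    Assumption: blocks are organised in ceil(n / l_local) groups, each
    covering all 2**n points with 2**n / N_local blocks.
    """
    if not 1 <= l_local <= n:
        raise ValueError("need 1 <= l_local <= n so that N_local <= 2**n")
    groups = (n + l_local - 1) // l_local      # integer ceil(n / l_local)
    blocks_per_group = 2 ** (n - l_local)      # 2**n / N_local
    return {
        "n_local": 2 ** l_local,
        "groups": groups,
        "blocks_per_group": blocks_per_group,
        "total_blocks": groups * blocks_per_group,
    }


fig2 = decompose(n=4, l_local=3)   # 16-point FFT, 8-point blocks
```

For n = 4 and L_local = 3 the sketch yields 2 groups of 2 blocks, 4 blocks in total, which agrees with the four 8-point blocks of Fig. 2.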
As the example in Fig. 2 shows, a 16-point FFT can be decomposed into four 8-point FFT computing blocks. With the total input bit width of the block arithmetic unit fixed at N_local × N_bit, the unit can accept not only N_local data points of N_bit bits each (the form assumed below unless otherwise noted) but also, flexibly, data in forms such as:
……
4 × N_local data points of N_bit/4 bits each,
2 × N_local data points of N_bit/2 bits each,
N_local/2 data points of 2 × N_bit bits each,
N_local/4 data points of 4 × N_bit bits each
……
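The input forms listed above all share the same total width N_local × N_bit. The following sketch enumerates such combinations; the function name `input_formats` and the `max_factor` parameter (assumed to be a power of two) are hypothetical and serve only as an illustration.

```python
def input_formats(n_local: int, n_bit: int, max_factor: int = 4):
    """List (point count, bit width) pairs whose product is n_local * n_bit.

    max_factor is a hypothetical limit (a power of two) on how far the
    bit width is divided or multiplied.
    """
    total = n_local * n_bit
    formats = []
    k = max_factor
    while k >= 1:                       # narrower points, more of them
        if n_bit % k == 0:
            formats.append((n_local * k, n_bit // k))
        k //= 2
    k = 2
    while k <= max_factor:              # wider points, fewer of them
        if n_local % k == 0:
            formats.append((n_local // k, n_bit * k))
        k *= 2
    # Every combination must fill exactly the fixed total input width.
    assert all(points * bits == total for points, bits in formats)
    return formats


formats = input_formats(n_local=8, n_bit=16)
```

With N_local = 8 and N_bit = 16 the sketch lists 32 points of 4 bits, 16 of 8 bits, 8 of 16 bits, 4 of 32 bits, and 2 of 64 bits, each totalling 128 bits.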
As shown in Fig. 1, the block arithmetic unit comprises: an intra-block input data path selector, a butterfly arithmetic unit, an intra-block storage controller, an intra-block data storage device, and an intra-block output data path selector.
The intra-block input data path selector selects the source of the data entering the butterfly arithmetic unit. When the first-stage butterfly of one of the first 2^n / N_local N_local-point FFT computing blocks is computed, the data entering the butterfly arithmetic unit is read directly from external memory (for example, the first-stage butterfly of the 1st and 2nd N_local-point FFT computing blocks in Fig. 2). When the first-stage butterfly of any other N_local-point FFT computing block is computed, the data is read from the inter-block data storage device (for example, the first-stage butterfly of the 3rd and 4th N_local-point FFT computing blocks in Fig. 2). When a butterfly of any other stage is computed, the data is read from the intra-block data storage device (for example, the 2nd- and 3rd-stage butterflies of the 1st and 2nd N_local-point FFT computing blocks in Fig. 2).
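The three-way source selection described above can be written as a small decision function. This is an illustrative sketch only; the function name, the string labels, and the 1-based indexing of blocks and stages are assumptions chosen to mirror the Fig. 2 discussion.

```python
def input_source(block_index: int, stage: int, blocks_per_group: int) -> str:
    """Pick the data source for the butterfly unit.

    block_index and stage are 1-based; blocks_per_group is 2**n / N_local,
    the number of blocks that read their first stage from external memory.
    """
    if stage == 1:
        if block_index <= blocks_per_group:
            return "external_memory"        # first group of blocks
        return "inter_block_storage"        # every later block
    return "intra_block_storage"            # all stages after the first


# Fig. 2 case: 16-point FFT, 8-point blocks, so 2 blocks per group.
sources = [input_source(b, 1, 2) for b in (1, 2, 3, 4)]
```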
The butterfly arithmetic unit mainly performs the following operations on N_local data points:
Xo+W*Xe
Xo-W*Xe
Completing these operations completes one butterfly stage. Here Xo is the vector formed by the N_local/2 input data points at odd positions, Xe is the vector formed by the N_local/2 input data points at even positions, and W is the corresponding twiddle factor vector. The data forming Xo and Xe are input through the intra-block input data path selector. The data forming W are input from the twiddle factor storage device. A block arithmetic unit reads in N_local data points at a time and normally performs L_local butterfly stages on them. For the last 2^n / N_local N_local-point FFT computations (for example, the 3rd and 4th N_local-point FFT computing blocks in Fig. 2), n mod L_local butterfly stages are performed (L_local stages if n mod L_local is 0). The device of the present invention is described using radix-2 butterflies as an example, but practical applications are not limited to radix 2.
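A minimal sketch of one butterfly stage and of the stage count per block follows. The function names are hypothetical, the vectors Xo, Xe, and W are passed in already separated (so no assumption is made about how positions are numbered), and the test for "last group of blocks" assumes ⌈n / L_local⌉ groups of 2^n / N_local blocks.

```python
def butterfly_stage(xo, xe, w):
    """Return (Xo + W*Xe, Xo - W*Xe), computed element by element."""
    upper = [o + t * e for o, e, t in zip(xo, xe, w)]
    lower = [o - t * e for o, e, t in zip(xo, xe, w)]
    return upper, lower


def stages_for_block(block_index: int, n: int, l_local: int) -> int:
    """Number of butterfly stages a block performs (block_index is 1-based)."""
    blocks_per_group = 2 ** (n - l_local)
    groups = (n + l_local - 1) // l_local          # integer ceil(n / l_local)
    in_last_group = block_index > (groups - 1) * blocks_per_group
    if in_last_group and n % l_local != 0:
        return n % l_local                         # shortened final group
    return l_local


upper, lower = butterfly_stage([1], [2], [1])      # 2-point case, W = 1
```

In the Fig. 2 case (n = 4, L_local = 3) the 1st and 2nd blocks run 3 stages and the 3rd and 4th blocks run 4 mod 3 = 1 stage.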
The twiddle factor storage device stores, in order, the twiddle factors required by the butterfly operations. All twiddle factors are computed in advance and stored in this device; they are not computed on the fly but read directly from it. The twiddle factors are generated by the formula [formula not reproduced in the source text], where i is the stage index within the whole FFT computation (likewise below). The twiddle factors of any stage can be computed from this formula.
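Because the generation formula itself is not reproduced in this text, the following sketch uses the textbook twiddle factor definition W_N^k = exp(-2πi·k/N) as a stand-in. It is an assumption, not the formula of the invention, and only illustrates the idea of computing the values once and reading them from a table afterwards; the function name `twiddle_table` is hypothetical.

```python
import cmath


def twiddle_table(n_points: int):
    """Precompute W_N^k = exp(-2*pi*i*k/N) for k = 0 .. N/2 - 1.

    Textbook definition, used here only as a stand-in for the
    stage-indexed formula that the source text does not reproduce.
    """
    return [cmath.exp(-2j * cmath.pi * k / n_points)
            for k in range(n_points // 2)]


table = twiddle_table(8)    # computed once, then only read
```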
The intra-block storage controller writes the N_local data points produced by the butterfly arithmetic unit to the intra-block data storage device at the correct intra-block interleaving addresses. These addresses are read directly from the intra-block interleaving address storage device.
The intra-block interleaving address storage device stores, in order, the interleaving addresses required by the intra-block storage controller. All intra-block interleaving addresses are computed in advance and stored in this device; they are not computed on the fly but read directly from it. The intra-block interleaving addresses are computed by the formula [formula not reproduced in the source text]. Fig. 3 shows an example of the intra-block interleaving pattern for a block size of 8 points.
In the FFT arithmetic device disclosed by the present invention, the intra-block interleaving addresses are identical for every stage. This not only reduces storage space (only N_local entries need to be stored) but also lowers the complexity of the addressing circuit and increases the degree of hardware reuse. In addition, because the intra-block interleaving addresses are identical for every stage, the intra-block storage controller needs to read address data only once during the entire run, which significantly reduces data access operations.
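The property that one address table serves every stage can be illustrated with a well-known constant-geometry (Pease-style) radix-2 arrangement. This is not the address formula of the invention, which is not reproduced in this text; it is a stand-in chosen because it also reads and writes with the same pattern at every stage. All function names are hypothetical.

```python
import cmath


def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def constant_geometry_fft(x):
    """Radix-2 decimation-in-frequency FFT with identical addressing per stage.

    Every stage reads the pair (j, j + n/2) and writes to (2j, 2j + 1),
    so a single address table of n entries is enough for all stages.
    """
    n = len(x)
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    data = list(x)
    for s in range(bits):
        out = [0j] * n
        for j in range(n // 2):
            a, b = data[j], data[j + n // 2]
            # Twiddle exponent: j with its lowest s bits cleared.
            w = cmath.exp(-2j * cmath.pi * ((j >> s) << s) / n)
            out[2 * j] = a + b
            out[2 * j + 1] = (a - b) * w
        data = out
    # Results come out in bit-reversed order; undo that here.
    return [data[bit_reverse(k, bits)] for k in range(n)]


def naive_dft(x):
    """Direct O(n^2) DFT used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


result = constant_geometry_fft([1, 2, 3, 4, 5, 6, 7, 8])
```

The write addresses 2j and 2j + 1 do not depend on the stage index, which is the behaviour the paragraph above attributes to the intra-block interleaving addresses.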
When data points are read in as different combinations of bit width and point count, such as:
……
4 × N_local data points of N_bit/4 bits each,
2 × N_local data points of N_bit/2 bits each,
N_local/2 data points of 2 × N_bit bits each,
N_local/4 data points of 4 × N_bit bits each
……
the device only needs to compute the corresponding addresses ([formula not reproduced in the source text]) and store them in the intra-block interleaving address storage device, from which they are supplied to the intra-block storage controller for data addressing.
The intra-block data storage device stores data that has passed through one butterfly stage. Data written to this device is already arranged in the correct interleaved order, so the next butterfly stage can read it in directly in that order.
The intra-block output data path selector selects the output destination of the data stored in the intra-block data storage device. When the (n mod L_local)-th stage butterfly (the L_local-th stage if n mod L_local is 0) of one of the last 2^n / N_local N_local-point FFT computing blocks has been computed (for example, the 1st-stage butterfly of the 3rd and 4th N_local-point FFT computing blocks in Fig. 2), the data is written directly to the inter-block data storage device. When the L_local-th stage butterfly of any other small FFT has been computed, the data is written to the inter-block data storage device (for example, the 3rd-stage butterfly of the 1st and 2nd N_local-point FFT computing blocks in Fig. 2). In all other cases, the data is read back by the butterfly arithmetic unit (for example, the 1st- and 2nd-stage butterflies of the 1st and 2nd N_local-point FFT computing blocks in Fig. 2). This selector can be merged with the intra-block input data path selector into a single device; the two are listed separately here only for ease of description.
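The output selection described above reduces to one test: whether the stage just completed is the last stage of its block. A minimal sketch, with hypothetical names and 1-based stage numbers:

```python
def output_destination(stage: int, stages_in_block: int) -> str:
    """Destination of the data after `stage` of a block has completed.

    stages_in_block is L_local for ordinary blocks and n mod L_local
    (or L_local if that is 0) for blocks of the last group.
    """
    if stage == stages_in_block:
        return "inter_block_storage"    # block finished, hand results on
    return "butterfly_unit"             # feed the next stage of this block


# Fig. 2 case: blocks 1 and 2 run 3 stages, blocks 3 and 4 run 1 stage.
first_block = [output_destination(s, 3) for s in (1, 2, 3)]
third_block = [output_destination(s, 1) for s in (1,)]
```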
Each time one N_local-point FFT computation completes, the inter-block storage controller writes the result to the inter-block data storage device at the correct inter-block interleaving addresses. These addresses are read directly from the inter-block interleaving address storage device.
The inter-block interleaving address storage device stores, in order, the interleaving addresses required by the inter-block storage controller. All inter-block interleaving addresses are computed in advance and stored in this device; they are not computed on the fly but read directly from it. The inter-block interleaving addresses are computed by the formula [formula not reproduced in the source text]. Fig. 4 shows an example of the inter-block interleaving pattern when a 64-point FFT computing block is decomposed into 8-point FFT computing blocks.
In the FFT arithmetic device disclosed by the present invention, the inter-block interleaving addresses are identical for every stage. This not only reduces storage space but also lowers the complexity of the addressing circuit and increases the degree of hardware reuse.
When data points are read in as different combinations of bit width and point count, such as:
……
4 × N_local data points of N_bit/4 bits each,
2 × N_local data points of N_bit/2 bits each,
N_local/2 data points of 2 × N_bit bits each,
N_local/4 data points of 4 × N_bit bits each
……
the device only needs to compute the corresponding addresses ([formula not reproduced in the source text]) and store them in the inter-block interleaving address storage device, from which they are supplied to the inter-block storage controller for data addressing.
When the last 2^n / N_local N_local-point FFT computing blocks are computed (for example, the 3rd and 4th N_local-point FFT computing blocks in Fig. 2), the inter-block interleaving addresses are computed by the formula [formula not reproduced in the source text]. This formula combines bit-reversal interleaving with the general inter-block interleaving, and the result is supplied to the inter-block storage controller. Because the inter-block interleaving addresses are identical in every case except the last, the inter-block storage controller needs to read address data only twice during the entire run, which significantly reduces data access operations.
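The bit-reversal component mentioned above is a standard permutation and can be sketched on its own. How the invention merges it with the general inter-block interleaving is given by a formula that is not reproduced in this text, so the sketch below shows only plain bit reversal; the function name is hypothetical.

```python
def bit_reversed_indices(n_points: int):
    """Return the bit-reversal permutation for a power-of-two length."""
    bits = n_points.bit_length() - 1
    if n_points < 2 or (1 << bits) != n_points:
        raise ValueError("n_points must be a power of two, at least 2")
    # Write each index as a fixed-width binary string and reverse it.
    return [int(format(k, f"0{bits}b")[::-1], 2) for k in range(n_points)]


order = bit_reversed_indices(8)
```

For 8 points the permutation is 0, 4, 2, 6, 1, 5, 3, 7. Applying it twice restores the natural order, so the same table serves for both directions.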
When data points are read in as different combinations of bit width and point count, such as:
……
4 × N_local data points of N_bit/4 bits each,
2 × N_local data points of N_bit/2 bits each,
N_local/2 data points of 2 × N_bit bits each,
N_local/4 data points of 4 × N_bit bits each
……
the device only needs to compute the corresponding addresses for each form ([formulas not reproduced in the source text]) and store them in the inter-block interleaving address storage device, from which they are supplied to the inter-block storage controller for data addressing.
The inter-block data storage device stores data that has undergone N_local-point FFT computation. Each time 2^(n - L_local) N_local-point FFT computations complete, this device holds a complete set of 2^n data points arranged in the correct interleaved order. When the next 2^(n - L_local) N_local-point FFT computations are performed, the data can be read in directly in this order.
The inter-block data path selector selects the output destination of the data stored in the inter-block data storage device. When the last 2^n / N_local N_local-point FFT computations have completed (for example, the 1st-stage butterfly of the 3rd and 4th N_local-point FFT computing blocks in Fig. 2), the data is written directly to external memory. When any other N_local-point FFT computation has completed (for example, the 1st and 2nd N_local-point FFT computing blocks in Fig. 2), the data is read in by the block arithmetic unit.
The present invention provides a variable-size block FFT arithmetic device with a single inner/outer interleaving structure. Using the idea of block processing, the device decomposes a large FFT computation into a plurality of independent small FFT computing blocks. The data interleaving range of a small FFT computing block is small and its interleaving pattern is uniform, which reduces structural complexity and read power consumption and improves both data access speed and computational locality. After block processing, only a small amount of global data interleaving is required, and all global interleaving follows the same pattern. Once the block size is fixed, the device can flexibly process data of different bit widths.
The FFT arithmetic device of the present invention uses block decomposition to keep the addressing space from becoming too scattered, and a single fetch can feed several stages of computation, which reduces the number of data accesses. Its advantages are as follows:
The corresponding block decomposition pattern can be generated automatically from the data bit width selected by the user. If the hardware uses a fixed block size, then when the user changes the bit width only the corresponding twiddle factors and the intra-block and inter-block interleaving addresses need to be regenerated; no other structure needs to change.
The data interleaving pattern within every block is identical, and the data interleaving pattern between all blocks is identical. Adopting the variable-size block FFT algorithm with a single inner/outer interleaving structure not only limits the dispersion of data addresses and reduces communication overhead, but also increases data locality, which makes the algorithm easier to parallelize.
FFT computation of any 2^n points is supported, which greatly improves the flexibility of the hardware design; the hardware architecture does not need to change for FFT computations of different sizes.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.