CN100547580C

CN100547580C - Method and apparatus for implementing fast orthogonal transforms of variable size

Info

Publication number: CN100547580C
Application number: CNB2005800230943A
Authority: CN
Inventors: Doron Solomon; Gilad Garon
Original assignee: ASOCS Ltd
Current assignee: ASOCS Ltd
Priority date: 2004-07-08
Filing date: 2005-07-08
Publication date: 2009-10-07
Anticipated expiration: 2025-07-08
Also published as: CN101031910A

Abstract

A reconfigurable architecture and method for performing, in a plurality of stages, a fast orthogonal transform of a vector of size N, where N is variable and the number of stages is a function of N. The architecture comprises: a computation unit (182) configured and arranged to include one or more butterfly units; a module including one or more multipliers (184) coupled to the output of the computation unit, configured and arranged to perform all butterfly computations of at least one stage of the transform; and a memory unit (180) configured and arranged to store intermediate results of the butterfly computations and predetermined coefficients for use by the computation unit in performing each butterfly computation, the memory unit comprising memory and a multiplexing structure (180).

Description

Method and apparatus for implementing fast orthogonal transforms of variable size

Technical field

The disclosure relates to a system and method for online reconfiguration of hardware, so as to allow the implementation of orthogonal transforms of vectors of variable size, for example FFT/IFFT (inverse FFT) transforms, Walsh-Hadamard transforms and the like, including combinations of these transform types. The system and method are particularly useful in communication devices that use such transforms.

Background

Related applications

This application claims priority to the following U.S. provisional patent applications:

No. 60/586,390, filed July 8, 2004, entitled "Low-Power Reconfigurable Architecture for Simultaneous Implementation of Distinct Communication Standards" (Attorney Docket No. 66940-016);

No. 60/586,391, filed July 8, 2004, entitled "Method and Architecture for Implementation of Reconfigurable Matrix-Vector Computations" (Attorney Docket No. 66940-017);

No. 60/586,389, filed July 8, 2004, entitled "Method and Architecture for Implementation of Reconfigurable Orthogonal Transformation" (Attorney Docket No. 66940-018);

No. 60/604,258, filed August 25, 2004, entitled "Method and Device for On-line Reconfigurable Viterbi Decoding of Recursive and Non-recursive Systematic Convolution Codes with Varying Parameters" (Attorney Docket No. 66940-020);

and to the following U.S. non-provisional application:

No. 11/071,340, filed March 3, 2005, entitled "Low-Power Reconfigurable Architecture for Simultaneous Implementation of Distinct Communication Standards" (Attorney Docket No. 66940-021).

Orthogonal transforms in general provide an effective tool for encoding information transmitted in wireless communication systems, and different transforms are used depending on the protocol used to transmit the information. For example, the FFT (fast Fourier transform)/IFFT (inverse FFT) is a key computational block in OFDM systems and filter banks. See, for example, N. West and D. J. Skellern, "VLSI for OFDM," IEEE Communications Magazine, vol. 36, no. 10, pp. 127-131, October 1998; and R. van Nee and R. Prasad, "OFDM for Wireless Multimedia Communications," Artech House Publishers, 2000.

An attractive feature of the FFT/IFFT pair is that an FFT block can perform the IFFT: conjugate the input of the FFT, conjugate its output, and divide the output by the size of the vector being processed. The same hardware can therefore be used for both the FFT and the IFFT. Several standard FFT/IFFT implementations are known, and some of them provide reconfigurability. A standard FFT/IFFT implementation uses the FFT core algorithm.
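The conjugation identity can be checked numerically. This is a minimal sketch that assumes a plain O(N²) DFT stands in for the FFT block; any FFT gives the same result. The function names `dft` and `ifft_via_fft` are illustrative and do not come from the patent.

```python
import cmath


def dft(x):
    """Forward transform X[k] = sum_n x[n] * W_N^(nk), standing in for the FFT block."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def ifft_via_fft(spectrum):
    """Inverse transform built from the forward block:
    conjugate the input, transform, conjugate the output, divide by N."""
    n_pts = len(spectrum)
    y = dft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n_pts for v in y]


x = [1 + 2j, -0.5, 3 - 1j, 0.25 + 0.75j]
x_back = ifft_via_fft(dft(x))
print(max(abs(a - b) for a, b in zip(x, x_back)))  # only floating-point round-off remains
```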

The FFT core algorithm:

The N-point DFT (discrete Fourier transform) is computed as follows (see, for example, A. V. Oppenheim and R. W. Schafer, "Discrete-Time Signal Processing," Prentice Hall, New Jersey, 1989):

(1)

X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad k \in [0, N)

where the complex exponential coefficient is

W_b^a = e^{-j 2\pi a / b}.

Computing the DFT directly (for all k) requires N × N multiplications and N × (N-1) additions. The FFT algorithm is a more efficient implementation that reduces the number of multiplications to the order of N log₂N. The basic idea is to split an FFT of length N into two FFTs of length N/2, and then to split each of those into two FFTs of length N/4, and so on. This continues until each FFT part has length 2, which can be computed directly by a so-called "butterfly" unit. Fig. 1 shows the trellis of such a butterfly unit.
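The divide-and-conquer count can be made concrete. This sketch uses illustrative function names. It counts butterflies through the recursion B(N) = 2B(N/2) + N/2 with B(2) = 1, which gives (N/2)·log₂N. Each radix-2 butterfly contains at most one complex multiplication, so this count stays within the N log₂N bound quoted above.

```python
def direct_dft_mults(n_pts):
    """Complex multiplications in the direct DFT sum: N terms for each of N outputs."""
    return n_pts * n_pts


def radix2_butterflies(n_pts):
    """Butterflies in a radix-2 FFT of length n_pts (a power of two): two half-length
    FFTs plus n_pts/2 butterflies that combine them, down to length 2 (one butterfly)."""
    if n_pts == 2:
        return 1
    return 2 * radix2_butterflies(n_pts // 2) + n_pts // 2


for n in (16, 1024):
    print(n, direct_dft_mults(n), radix2_butterflies(n))
# 16 256 32
# 1024 1048576 5120
```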

Two commonly used FFT algorithms are decimation in frequency (DIF) and decimation in time (DIT), which are similar in nature. The architecture is described here using the DIF algorithm, in which the FFT outputs are split into even-indexed and odd-indexed parts:

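X[2r] = \sum_{n=0}^{N/2-1} x[n]\, W_N^{2rn} + \sum_{n=N/2}^{N-1} x[n]\, W_N^{2rn}

= \sum_{n=0}^{N/2-1} x[n]\, W_N^{2rn} + \sum_{n=0}^{N/2-1} x[n+N/2]\, W_N^{2r(n+N/2)}

= \sum_{n=0}^{N/2-1} \left( x[n] + x[n+N/2] \right) W_{N/2}^{rn}, \quad r \in [0, N/2)

(2)

The last step uses W_N^{2r(n+N/2)} = W_N^{2rn} W_N^{rN} = W_N^{2rn} and W_N^{2rn} = W_{N/2}^{rn}. Similarly, since W_N^{(2r+1)N/2} = -1, the odd-indexed outputs are

X[2r+1] = \sum_{n=0}^{N/2-1} \left( x[n] - x[n+N/2] \right) W_N^{n}\, W_{N/2}^{rn}, \quad r \in [0, N/2)

(3)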
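Equations (2) and (3) translate directly into a recursive radix-2 DIF FFT. This is a minimal sketch for power-of-two lengths, checked against the direct sum of equation (1). The names `dif_fft` and `dft` are illustrative.

```python
import cmath


def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT (len(x) must be a power of two).
    Even outputs: FFT_{N/2} of x[n] + x[n+N/2]            (equation (2))
    Odd outputs:  FFT_{N/2} of (x[n] - x[n+N/2]) * W_N^n  (equation (3))"""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    half = n_pts // 2
    top = [x[n] + x[n + half] for n in range(half)]
    bottom = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / n_pts)
              for n in range(half)]
    out = [0j] * n_pts
    out[0::2] = dif_fft(top)      # X[2r]
    out[1::2] = dif_fft(bottom)   # X[2r+1]
    return out


def dft(x):
    """Direct evaluation of equation (1), used as a reference."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


x = [complex(n % 5, (-n) % 3) for n in range(16)]
print(max(abs(a - b) for a, b in zip(dif_fft(x), dft(x))))  # round-off only
```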

Standard implementation:

In standard prior-art methods, the computation structure must first be analyzed in order to provide reconfigurability for a specific function. The FFT can be viewed as a shuffle-exchange interconnection network of butterfly blocks. This network changes with the FFT size, which makes it difficult to combine flexibility with the maximal efficiency of a fully parallel implementation. In a fully parallel implementation, the signal flow graph is mapped directly onto hardware. For example, a 16-point FFT has 32 butterfly units in total, interconnected as shown by the trellis in Fig. 2. In general, an N-point FFT requires (N/2)·log₂N butterfly units. This maximally parallel structure has the potential for high performance and low power consumption, but it comes at the high cost of a large silicon area, particularly for large FFT sizes.

The output produced by the DIF FFT is in bit-reversed order. For example,

X[10] = X[1010₂] = Y[0101₂] = Y[5].
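The reordering can be sketched in a few lines. The helper names below are illustrative. `unscramble` assumes a power-of-two length and a DIF output in bit-reversed order.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`, e.g. 0b1010 -> 0b0101 for bits = 4."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the current LSB into the result
        index >>= 1
    return result


def unscramble(y):
    """Put a bit-reversed DIF output Y into natural order: X[k] = Y[bit_reverse(k)]."""
    bits = len(y).bit_length() - 1            # log2(N) for power-of-two N
    return [y[bit_reverse(k, bits)] for k in range(len(y))]


print(bit_reverse(10, 4))          # 5, i.e. X[10] = Y[5] for N = 16
print(unscramble(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```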

When the implementation uses fixed-point arithmetic, scaling and overflow handling are critical for correct operation of the transformer. The butterfly operation at each FFT stage consists of complex additions and a complex multiplication:

- Each complex addition consists of two real additions, and each real addition extends the input word length by 1 bit.
- Each complex multiplication consists of four real multiplications and two real additions, and each real multiplication doubles the input word length.

To ensure correct operation, the output word length must therefore either be increased to (M+1) bits, or the output must be truncated or rounded to M bits.

- Truncation: the extra most significant bit of the output is discarded by clipping the value to the maximum value representable with M bits.
- Rounding: 1 is first added to a positive output, the output is then shifted right by 1 bit, and the least significant bit is discarded.

Rounding does not cause adder overflow, because the largest and smallest sums (a+b) have a least significant bit of zero (they are even) after the addition. After rounding, the output has the same range as a and b, i.e., M bits.
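The rounding rule above can be checked exhaustively for a small word length. This is a minimal sketch that models M-bit two's-complement values as Python integers. `add_round` and `add_clip` are hypothetical helper names, and M = 4 is an arbitrary choice for the check.

```python
def add_round(a, b):
    """Add two M-bit two's-complement values and rescale to M bits by rounding:
    add 1 to a positive sum, then shift right one bit (dropping the LSB)."""
    s = a + b            # exact sum needs M+1 bits
    if s > 0:
        s += 1           # the largest sum 2^M - 2 is even, so s + 1 still fits in M+1 bits
    return s >> 1        # arithmetic shift right: result is back in the M-bit range


def add_clip(a, b, m_bits):
    """Truncation in the sense of the text: keep M bits by clipping the sum to the
    largest/smallest value representable in M bits."""
    lo, hi = -(1 << (m_bits - 1)), (1 << (m_bits - 1)) - 1
    return max(lo, min(hi, a + b))


M = 4
LO, HI = -(1 << (M - 1)), (1 << (M - 1)) - 1
values = range(LO, HI + 1)
# Exhaustive check over all M-bit operand pairs.
print(max(a + b + 1 for a in values for b in values if a + b > 0) <= (1 << M) - 1)  # True
print(all(LO <= add_round(a, b) <= HI for a in values for b in values))             # True
print(add_round(7, 7), add_clip(7, 7, M))                                           # 7 7
```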

Column-based method:

In the column-based FFT architecture, the computation is rearranged so that the interconnection is identical in every stage, as shown by the trellis in Fig. 3. Once a butterfly's outputs have been computed, its inputs are no longer needed. The outputs can therefore be sent back to the inputs of the same butterflies, which are reused iteratively for the next and subsequent stages (in-place computation). Only a single column of butterflies is needed, and the different stages are computed by reusing that column (time multiplexing). However, the FFT coefficients must change from stage to stage. In general, an N-point FFT requires N/2 butterfly units; a 16-point FFT, for example, requires 8 butterflies. The power consumption is very close to that of the fully parallel architecture, while less area is required. Because this simple iterative structure is optimized for one specific size, converting it into a reconfigurable design remains a complex task. Moving from a parallel to a column-based implementation also requires more clock cycles per FFT frame. The parallel method processes a complete FFT frame in one clock cycle, whereas the column method needs log₂N clock cycles (with radix-2 butterflies) because of its iterative time-multiplexed structure.
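The reuse of one column of N/2 butterflies over log₂N passes can be sketched in software. This is a minimal model, not the hardware of Fig. 3. It uses ordinary in-place DIF indexing, so the pairing of butterfly inputs changes from pass to pass; the constant-geometry interconnect of Fig. 3, which keeps the wiring identical, is not reproduced. The name `column_fft` is illustrative.

```python
import cmath


def column_fft(x):
    """In-place radix-2 DIF FFT run as log2(N) passes over one 'column' of N/2 butterflies.
    Each pass overwrites the buffer it reads (in-place computation); the twiddle
    coefficients and pairings change from pass to pass. Output is in bit-reversed order."""
    buf = list(x)
    n_pts = len(buf)
    span = n_pts // 2                       # distance between the two inputs of a butterfly
    passes = 0
    while span >= 1:
        for start in range(0, n_pts, 2 * span):
            for i in range(span):           # N/2 butterflies per pass in total
                a, b = buf[start + i], buf[start + i + span]
                buf[start + i] = a + b
                buf[start + i + span] = (a - b) * cmath.exp(-2j * cmath.pi * i / (2 * span))
        span //= 2
        passes += 1
    return buf, passes


y, passes = column_fft([1, 2, 3, 4, 5, 6, 7, 8])
print(passes)  # 3 passes = log2(8) time-multiplexed uses of the butterfly column
```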

Reconfigurable design:

By running the FFT algorithm on a conventional pipelined architecture, it is possible to build a low-power reconfigurable design whose power consumption compares favorably even with that of standard designs of minimal FFT complexity.

Pipelining technique:

A fully parallel design has a total complexity of (N/2)·log₂N butterflies and a column-based design has N/2. The conventional pipelined architecture instead uses only one butterfly unit per stage, for a total complexity of log₂N. Fig. 4 shows an example of the pipelined technique for a 16-point FFT. The multipliers 40 of stages 42a, 42b and 42c are drawn separately from butterfly units 44a, 44b and 44c to distinguish the hardware requirements. Each of butterfly units 44a, 44b, 44c and 44d is time-multiplexed over the N/2 butterfly computations of its stage. For the stage containing butterfly unit 44c, multiplier 40c is a multiplication by "j". No multiplier is needed at the output of the last butterfly unit 44d.

The pipeline-based implementation needs more clock cycles per FFT frame than the column-based method. The pipeline-based method completes an FFT frame in N clock cycles, whereas the column-based method needs log₂N clock cycles because of its iterative time-multiplexed structure (both figures assume radix-2 butterflies). When all stages are implemented in hardware, the number of clocks per FFT frame is not an obstacle. Data are fed in serially, frame after frame, so the clock cycles per frame become a constant initial latency while high throughput is maintained.

See, for example, E. H. Wold and A. M. Despain, "Pipelined and parallel-pipeline FFT processor for VLSI implementation," IEEE Trans. Comput., pp. 414-426, May 1984. The single-path delay feedback (SDF) implementation uses memory more efficiently by storing the butterfly outputs in feedback shift registers or FIFOs 46. Their sizes are given in Fig. 4; in this example, the register lengths are 8, 4, 2 and 1, respectively. In each stage, a single data stream passes through the multiplier.
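The resource and latency trade-offs quoted above can be tabulated for a given N. This sketch only restates the figures given in the text for radix-2 butterflies. The function names are illustrative, and the SDF register lengths follow the N/2, N/4, ..., 1 pattern of Fig. 4.

```python
def architecture_costs(n_pts):
    """(radix-2 butterfly units, clock cycles per FFT frame) for each architecture,
    using the figures quoted in the text; n_pts must be a power of two."""
    stages = n_pts.bit_length() - 1           # log2(N)
    return {
        "fully parallel": (n_pts // 2 * stages, 1),
        "column-based": (n_pts // 2, stages),
        "pipelined": (stages, n_pts),
    }


def sdf_fifo_lengths(n_pts):
    """Feedback register length in each stage of a radix-2 SDF pipeline: N/2, N/4, ..., 1."""
    return [n_pts >> (s + 1) for s in range(n_pts.bit_length() - 1)]


for name, (units, cycles) in architecture_costs(16).items():
    print(f"{name:15s} butterflies={units:3d} cycles/frame={cycles}")
print(sdf_fifo_lengths(16), sum(sdf_fifo_lengths(16)))  # [8, 4, 2, 1] 15
```

For N = 16, the total SDF feedback memory is N - 1 = 15 words.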

Hybrid method:

The hybrid method combines the advantages of the column and feedback methods. It uses elements of the feedback method to save memory, and column stages to achieve better hardware utilization. Column-stage butterfly units of 4-bit width can be combined into a wider bus, together with suitable reconfigurable multipliers. The structure can also be converted into one with high spatial utilization and exactly the bus width needed for high algorithmic efficiency.

Fig. 5 shows a general architecture for running an iterative process. This FFT implementation uses a single butterfly unit 50. The design of a single-butterfly processor focuses mainly on optimized scheduling and memory access. In other words, it provides a pipelining scheme for the case where every stage is implemented by iteratively time-multiplexing the same butterfly unit. One example is the Spiffee processor (see B. M. Baas, "A low-power, high-performance, 1024-point FFT processor," IEEE Journal of Solid-State Circuits, March 1999). It uses a cache architecture, including RAM 52 and multiplier 56, to exploit the regular memory-access pattern of the FFT algorithm and so achieve low power consumption. The processor, shown as controller 54, can be programmed to perform FFTs of arbitrary length. However, some features, such as the cache size provided by RAM 52, are optimized only for a specific FFT size. The method also runs at very low speed, because each FFT frame must be computed over N clock cycles by the fully pipelined algorithm, which produces a constant initial delay. Because the stages are iteratively time-multiplexed onto the reused butterfly unit 50, a complete frame must be computed before the next FFT frame can be processed (N clock cycles when a radix-2 butterfly unit is used).

A more efficient FFT processor can be obtained with higher-radix butterfly units, for example a radix-4 architecture. This reduces the number of clock cycles needed to process a complete FFT frame to N/2. Most FFT accelerators implemented in advanced DSPs and in dedicated chips are radix-2 or radix-4 FFT processors. Their use is limited to FFT transforms, their utilization is very low, and they require high-clock-rate designs.

Filter implementation based on the multiplexed pipeline method:

A reconfigurable iterative scheme, such as the one shown in Fig. 6, can implement filters or correlation functions of any kind with high efficiency. The scheme uses the multiplier of the final FFT stage to multiply by the filter coefficients (a multiplication in the frequency domain) and then performs an IFFT, as best seen at 60 in Fig. 6. The scheme is also efficient for any by-product of the FFT/IFFT, such as the discrete cosine/sine transforms (DCT and DST). It is equally efficient for any combination of the above algorithms that cascades FFT and IFFT algorithms, such as filtering, which can also be used for equalization, prediction, interpolation and correlation computation.
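The filtering scheme (transform, multiply by coefficients, inverse transform) can be sketched as follows. A minimal sketch: a direct DFT stands in for the FFT/IFFT hardware, and the names `fft_filter` and `circular_convolution` are illustrative. Multiplying spectra yields a circular convolution, so the input block is zero-padded here to make the result equal the ordinary (linear) convolution.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling), standing in for the FFT/IFFT blocks."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
           for k in range(n_pts)]
    return [v / n_pts for v in out] if inverse else out


def fft_filter(x, h):
    """Transform, multiply by the filter's frequency response (the last-stage multiplier),
    inverse transform. The result is the circular convolution of x and h."""
    response = dft(list(h) + [0] * (len(x) - len(h)))
    return dft([a * c for a, c in zip(dft(x), response)], inverse=True)


def circular_convolution(x, h):
    """Reference result computed directly in the time domain."""
    n_pts = len(x)
    h = list(h) + [0] * (n_pts - len(h))
    return [sum(x[m] * h[(n - m) % n_pts] for m in range(n_pts)) for n in range(n_pts)]


x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # zero-padded so circular = linear convolution
h = [1.0, -1.0, 0.5]
print([round(v.real, 6) for v in fft_filter(x, h)])
```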

FFTs with different radices:

The radix-2² algorithm is of particular interest. Its multiplicative complexity equals that of the radix-4 and split-radix algorithms, yet it retains the regular butterfly structure of radix 2. Compared with other algorithms, this spatial regularity gives a significant structural advantage for VLSI implementation. The basic idea behind the radix-2² algorithm is to take two stages of the conventional DIF FFT algorithm and maximize the number of trivial multiplications by using

W_N^{N/4} = -j,

since multiplication by -j involves only swapping the real and imaginary parts and inverting a sign. In other words, the FFT coefficients are rearranged and the non-trivial multiplications are concentrated in one stage. As a result, only one complex multiplier is needed for every two stages, which reduces the total logic area. Fig. 7 depicts a trellis illustrating such a coefficient rearrangement (in parallel form). For any two butterfly coefficients W_N^i and W_N^{i+N/4}, the factor W_N^i is factored out and forwarded to the next stage, leaving the coefficients 1 and -j in the corresponding positions. After this rearrangement has been applied to all coefficient pairs, the stage left behind contains no non-trivial multiplications.
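Both facts behind the rearrangement can be checked numerically: multiplying by -j is a swap and a sign change, and W_N^{i+N/4} = -j·W_N^i, so factoring out W_N^i leaves the pair (1, -j). This is a minimal sketch; `mul_minus_j` and `twiddle` are illustrative names.

```python
import cmath


def mul_minus_j(z):
    """Multiply by -j without a multiplier: (a + jb)(-j) = b - ja,
    i.e. swap real and imaginary parts and negate the new imaginary part."""
    return complex(z.imag, -z.real)


def twiddle(n_pts, i):
    """W_N^i = exp(-j*2*pi*i/N)."""
    return cmath.exp(-2j * cmath.pi * i / n_pts)


N = 16
for i in range(N // 4):
    # W_N^(i+N/4) = -j * W_N^i, so after factoring out W_N^i the pair is (1, -j).
    assert abs(twiddle(N, i + N // 4) - mul_minus_j(twiddle(N, i))) < 1e-12
print(mul_minus_j(3 + 4j))  # (4-3j)
```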

Hybrid pipeline/multiplexing method:

Many pipelined FFT architectures have been proposed over the past ten years. A pipelined architecture preserves the spatial regularity of the signal flow graph, so these architectures are highly modular and scalable. Fig. 8A shows the shuffle network 80 implemented by single-path delay feedback. Data are processed on a single path between stages 82, and feedback FIFO registers 84 store new inputs and intermediate results. The basic idea of this scheme is to store the data and scramble them so that the next stage receives them in the correct order.

- While FIFO register 84 is being filled with the first half of the input, the last half of the previous result is shifted out to the next stage. During this time the operational element is bypassed.
- When the first half of the input is shifted out of the FIFO register, it is fed into the processing element together with the arriving second half of the input. During this time the operational element is active and produces two outputs: one goes directly to the next stage 82, and the other is shifted into the corresponding FIFO register.

Where necessary, multipliers (not shown) are inserted between stages according to the radix-2² or radix-2 algorithm. Figs. 8B and 8C show, respectively, the trellis and the data packets used in such an implementation.
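A behavioural model of a radix-2 DIF SDF pipeline makes the two phases concrete. This is a minimal sketch under stated assumptions: radix-2 stages with a twiddle multiplier at each stage output (not the radix-2² multiplier placement), one sample per clock, and zero-initialized FIFOs. It is a software model, not a hardware description, and the names `sdf_stage` and `sdf_fft` are illustrative.

```python
import cmath
from collections import deque


def sdf_stage(samples, half):
    """One radix-2 DIF single-path delay-feedback stage whose feedback FIFO holds
    `half` words. One sample enters per clock; blocks of 2*half samples are processed."""
    block = 2 * half
    fifo = deque([0j] * half)             # feedback register (starts empty, i.e. zeros)
    out = []
    for t, x in enumerate(list(samples) + [0j] * half):   # `half` extra clocks to flush
        phase = t % block
        if phase < half:
            # Butterfly bypassed: the new sample goes into the FIFO while the stored
            # difference x[n] - x[n+half] leaves through the twiddle multiplier W_block^n.
            diff = fifo.popleft()
            out.append(diff * cmath.exp(-2j * cmath.pi * phase / block))
            fifo.append(x)
        else:
            # Butterfly active: combine the stored first-half sample with the arriving one.
            first = fifo.popleft()
            out.append(first + x)          # sum goes straight to the next stage
            fifo.append(first - x)         # difference is fed back into the FIFO
    return out[half:]                      # drop the start-up (latency) clocks


def sdf_fft(frame):
    """Chain log2(N) stages with FIFO lengths N/2, N/4, ..., 1; output is bit-reversed."""
    stream, half = list(frame), len(frame) // 2
    while half >= 1:
        stream = sdf_stage(stream, half)
        half //= 2
    return stream
```

For N = 16, the FIFO lengths are 8, 4, 2 and 1, matching the register lengths given for Fig. 4.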

Brief description of the drawings

Reference is made to the accompanying drawings, in which like reference numerals designate like parts throughout, and in which:

Fig. 1 depicts an FFT butterfly computation trellis;

Fig. 2 depicts a 16-point decimation-in-frequency FFT trellis;

Fig. 3 depicts a column-based 16-point FFT trellis;

Fig. 4 is a block diagram of an architecture for implementing a pipelined radix-2 16-point FFT (N=16);

Fig. 5 is a block diagram of an architecture for implementing a simple radix-2 FFT processor;

Fig. 6 is a block diagram of a pipelined radix-2 16-point filter architecture (N=16);

Fig. 7 depicts a trellis illustrating the elimination of multiplications by coefficient rearrangement;

Fig. 8 depicts the trellis, block diagram and packet diagram of a pipelined shuffle-exchange interconnect transformer;

Fig. 9 depicts the matrix operation of a radix-4 butterfly structure, in accordance with one aspect of the disclosed method and system;

Fig. 10 depicts a radix-2² stage trellis, in accordance with one aspect of the disclosed method and system;

Fig. 11 is a block diagram of a reconfigurable radix-2² stage butterfly arrangement, in accordance with one aspect of the disclosed method and system;

Fig. 12 depicts a pipelined radix-2 16-point filter (N=16), in accordance with one aspect of the disclosed method and system;

Fig. 13 depicts a semi-pipelined/iterative radix-2² implementation (N=16) of a 16-point FFT, in accordance with one aspect of the disclosed method and system;

Fig. 14 depicts a pipelined radix-2² implementation (N=16) of a 16-point filter, in accordance with one aspect of the disclosed method and system;

Fig. 15 depicts the trellis of a parallel radix-2² implementation (N=16) of a 16-point Walsh spreading/despreading function, in accordance with one aspect of the disclosed method and system;

Fig. 16 depicts the trellis of a parallel radix-2 implementation (N=16) of a 16-point Walsh spreading/despreading function, in accordance with one aspect of the disclosed method and system;

Fig. 17 is a block diagram of an architecture providing a reconfigurable MF-I core processor, in accordance with one aspect of the disclosed method and system;

Fig. 18 is a block diagram of an architecture providing a reconfigurable MF-I core processor, in accordance with one aspect of the disclosed method and system; and

Fig. 19 is a block diagram of a communication system configured to include a transformer of any of the types described herein.

Detailed description

The following disclosure describes a method and system for implementing orthogonal transforms, such as fast Fourier transforms (FFTs), of vectors of variable size (real and complex vectors). The method implements an adaptive algorithm in which the size of the transform can be determined online and depends on the input to the algorithm. Examples of such adaptive algorithms are:

(1) FFTs;
(2) inverse FFTs (IFFTs);
(3) any by-products of FFTs and IFFTs, such as the discrete cosine/sine transforms (DCT and DST);
(4) Walsh-Hadamard transforms and any of their by-products, such as CDMA, DSSS and spreading/despreading core algorithms;

and any combination of the above. The method and system can also be used for filtering and other functions, such as functions obtained by cascading the FFT and IFFT algorithms, which can in turn be used for equalization, Hilbert transforms, prediction, interpolation and correlation. Through fast online reconfiguration of the hardware, the method and system implement the FFT/IFFT and all of the above algorithms efficiently over a very wide range of parameters. They significantly reduce the amount of hardware needed by devices that implement FFTs of different sizes, or the above algorithms, in parallel or serially.

The disclosed method modifies an orthogonal transform processor to provide a simplified interconnect structure that adapts to the length of the FFT vector and adjusts the memory size accordingly. This is done, for example, by changing the length of the shift registers (or FIFOs), modifying the interconnect buses and providing simple multiplexing of the I/O modules as needed, so that flexibility is easily achieved. The full range of FFT sizes can be accommodated in two ways. The design can be mapped directly onto hardware, with the modules not needed for shorter FFTs disabled. Alternatively, for long FFTs at low symbol rates, the processing stages can be folded, i.e., the hardware is time-shared, with a clock frequency chosen according to the input sample rate. This architecture requires no buffering and no serial-to-parallel conversion.

The architecture can be implemented using radix-2, radix-2², radix-2³, radix-4, radix-8 or similar butterflies. The radix-4 butterfly (without the twiddle-coefficient multipliers) can also be expressed as the matrix operation shown in Fig. 9 and implemented as shown by the trellis in Fig. 10.
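A radix-4 butterfly without twiddle multipliers is a 4-point DFT. It can be built from two radix-2 layers plus one trivial multiplication by -j, and this sketch checks the result against the FFT-side matrix of equation (5). The function name is illustrative, and the matrix is taken as the standard 4-point DFT matrix.

```python
def radix4_butterfly(x0, x1, x2, x3):
    """Radix-4 butterfly (4-point DFT, no twiddle multipliers) built from two radix-2
    layers; the only 'multiplication' is by -j, done as a real/imaginary swap and sign change."""
    a, b = x0 + x2, x0 - x2
    c, d = x1 + x3, x1 - x3
    d_mj = complex(d.imag, -d.real)         # d * (-j)
    return [a + c, b + d_mj, a - c, b - d_mj]


# FFT-side matrix of equation (5): row k holds W_4^(nk).
FFT4 = [[1, 1, 1, 1], [1, -1j, -1, 1j], [1, -1, 1, -1], [1, 1j, -1, -1j]]
xs = [1 + 1j, 2 - 1j, -0.5 + 3j, 4j]
matrix_result = [sum(FFT4[k][n] * xs[n] for n in range(4)) for k in range(4)]
print(all(abs(p - q) < 1e-12 for p, q in zip(radix4_butterfly(*xs), matrix_result)))  # True
```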

Fig. 11 depicts an embodiment of a reconfigurable radix-2² stage implementation. It comprises an input multiplexer 111, two stages of butterfly units 110a and 110b, two feedback memories 112a and 112b, a single general multiplier 114, a modified cross-junction module 116 with sign-inversion capability, and a controller 118. Module 116 switches between IFFT and FFT processing, which eliminates the need for a multiplier at the output of butterfly unit 110a. Controller 118 can modify the memory size available in memories 112a and 112b to match the length of the FFT being processed. The length of the transform vector can be detected by detector 117 and determined by controller 118. In addition, memory 119 stores the coefficients that multiplier 114 uses for each computed stage.

Fig. 12 depicts an embodiment of a pipelined radix-2² implementation of a 16-point FFT (N=16). In this embodiment, controller 128 provides the inputs and sets the size of each memory, in this case the shift register 124 of each stage. Multiplexer 121 is also set so that the required successive inputs reach the input of butterfly unit 122a of the first stage. Multipliers 126a, 126b and 126c are placed at the outputs of the first three stages, respectively, and the last stage needs no multiplier. As shown, multipliers 126a and 126c are trivial multipliers that multiply the stage outputs by "j".

Fig. 13 illustrates an alternative embodiment that incorporates an iterative processing architecture. In particular, Fig. 13 shows an example of a semi-pipelined/iterative radix-2² implementation (N=16) for processing a 16-point FFT. In this embodiment only two butterfly stages, 130a and 130b, are needed, and the output of multiplier 136b provides both the feedback and the output of the transform processor.

The signal path is as follows:

- The output of multiplexer 131 is provided to the input of butterfly unit 132a.
- Butterfly unit 132a feeds back to its memory (e.g., shift register 134a) and provides its output to the "j" multiplier 136a.
- The output of the "j" multiplier 136a is applied to the input of butterfly unit 132b.
- Butterfly unit 132b feeds back to its memory (e.g., shift register 134b) and provides its output to multiplier 136b.
- The output of multiplier 136b is connected to the input of butterfly unit 132a through a feedback path.

In operation, controller 138 sets the size of memories 134 according to the stage being processed. In the first pass, when the signal vector is first received, registers 134a and 134b are set to "8" and "4", respectively, and the signal is processed by the two stages. The processor output is disabled, and the output of the second butterfly unit 132b is applied to the input of butterfly unit 132a through the feedback path. During the next iteration, the controller sets the memories to "2" and "1", and the signal is processed serially through to the output of the second butterfly unit 132b. The processor output is then enabled and the feedback path disabled, so that the processor output is provided at 139.
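The reuse of two stages with reprogrammed register lengths can be modelled in software. This is a minimal sketch under stated assumptions. It uses radix-2 SDF stages with twiddles at each stage output, rather than the "j" multiplier and general multiplier split of Fig. 13. It runs each pass over the whole buffered stream instead of modelling the feedback path cycle by cycle. The "controller" is simply the loop that picks the register lengths (N/2, N/4), then (N/8, N/16), and so on. All names are hypothetical.

```python
import cmath
from collections import deque


def sdf_stage(samples, half):
    """Radix-2 DIF single-path delay-feedback stage with a feedback register of length `half`."""
    block, fifo, out = 2 * half, deque([0j] * half), []
    for t, x in enumerate(list(samples) + [0j] * half):
        phase = t % block
        if phase < half:                      # store input, emit stored difference * twiddle
            out.append(fifo.popleft() * cmath.exp(-2j * cmath.pi * phase / block))
            fifo.append(x)
        else:                                 # butterfly: emit sum, feed back difference
            first = fifo.popleft()
            out.append(first + x)
            fifo.append(first - x)
    return out[half:]


def iterative_two_stage_fft(frame):
    """Reuse one pair of stages for several passes; before each pass the (hypothetical)
    controller reprograms the two register lengths: (N/2, N/4), then (N/8, N/16), ..."""
    stream, half, passes = list(frame), len(frame) // 2, []
    while half >= 1:
        lengths = [h for h in (half, half // 2) if h >= 1]
        for h in lengths:
            stream = sdf_stage(stream, h)
        passes.append(lengths)
        half //= 4
    return stream, passes      # stream is in bit-reversed order


x = [complex(n % 3, n % 5) for n in range(16)]
y, passes = iterative_two_stage_fft(x)
print(passes)  # [[8, 4], [2, 1]]
```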

Fig. 14 depicts an embodiment of an example pipelined radix-2² implementation of a 16-point filter (N=16). Controller 148 again sets the size of the memory for each stage, and the filter coefficients are applied to multiplier 140.

The architecture of Fig. 14 can in turn be modified into a hybrid iterative, pipelined/iterative, or parallel architecture.

Enabling the Walsh-Hadamard transform:

The preceding discussion of the radix-4 transform as a matrix operation shows that the architecture can easily be changed to process other orthogonal signals, such as the Walsh spreading/despreading function. Simply replacing the multiplier coefficients with the trivial values ±1 lets the existing architecture implement the Walsh spreading/despreading function. Further analysis shows that only the non-trivial coefficients and the coefficients that multiply by -j need to change. Moreover, the non-trivial multipliers already meet every requirement of the Walsh spreading/despreading function: they can switch between FFT and IFFT, and they include the trivial multipliers needed for multiplication by -j. In hardware terms, the only additional requirement is the management controller 148.

As an example, a "radix-4" Walsh spreading/despreading butterfly unit can be expressed as the matrix operation:

(4)

\begin{pmatrix} Y(1) \\ Y(2) \\ Y(3) \\ Y(4) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} X(1) \\ X(2) \\ X(3) \\ X(4) \end{pmatrix}

Comparing the two matrix representations shows the relationship between the two transforms:

(5)

\text{Walsh: } \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \;\Longleftrightarrow\; \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix} \text{ :FFT}
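Replacing every coefficient of a radix-2 butterfly network with ±1 gives the fast Walsh-Hadamard transform. This sketch uses illustrative names, assumes natural (Sylvester) ordering, and checks that the 4-point case reproduces the Walsh matrix of equation (4).

```python
def fwht(x):
    """Fast Walsh-Hadamard transform: the radix-2 butterfly network with every
    coefficient replaced by +/-1, so each butterfly is just a sum and a difference."""
    buf = list(x)
    span = 1
    while span < len(buf):
        for start in range(0, len(buf), 2 * span):
            for i in range(start, start + span):
                buf[i], buf[i + span] = buf[i] + buf[i + span], buf[i] - buf[i + span]
        span *= 2
    return buf


def hadamard(n_pts):
    """Sylvester Hadamard matrix; for n_pts = 4 it equals the Walsh matrix of equation (4)."""
    h = [[1]]
    while len(h) < n_pts:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


x = [3, -1, 2, 5]
H = hadamard(4)
print(H)        # [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(fwht(x))  # [9, 1, -5, 7]
```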

Because the radix-4 transform is a complex operation, it yields two independent Walsh spreading/despreading processes for real vectors: the trivial ±1 multipliers never exchange values between the I and Q signals. This feature can therefore be used to implement, for example, a two-finger RAKE receiver for the new WCDMA standard, or a complex Walsh spreading/despreading function. The second independent Walsh spreading/despreading function can also be used as an additional stage. Alternatively, by combining I and Q appropriately, the two can serve a larger Walsh spreading/despreading (this possibility is realized in the reconfigurable radix-2² architecture shown in Figs. 9-14).
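The claim that the ±1 coefficients keep I and Q separate can be checked by packing two real vectors into one complex vector. This is a minimal sketch with illustrative names. The two input rails are hypothetical sample values, standing in for, e.g., two RAKE fingers.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (radix-2 butterflies, all coefficients +/-1)."""
    buf = list(x)
    span = 1
    while span < len(buf):
        for start in range(0, len(buf), 2 * span):
            for i in range(start, start + span):
                buf[i], buf[i + span] = buf[i] + buf[i + span], buf[i] - buf[i + span]
        span *= 2
    return buf


def two_real_walsh(i_vec, q_vec):
    """Run two independent real Walsh transforms through one complex transform:
    pack I + jQ, transform, and unpack. The +/-1 coefficients are real, so the
    I and Q rails never mix."""
    out = fwht([complex(i, q) for i, q in zip(i_vec, q_vec)])
    return [v.real for v in out], [v.imag for v in out]


i_rail = [1, -1, 1, 1, -1, 1, 1, 1]    # hypothetical samples for one RAKE finger
q_rail = [-1, -1, 1, -1, 1, 1, -1, 1]  # hypothetical samples for a second finger
i_out, q_out = two_real_walsh(i_rail, q_rail)
print(i_out == fwht(i_rail), q_out == fwht(q_rail))  # True True
```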

For example, the implementation shown in Fig. 14 needs an FFT algorithm that uses only N log N operations. It is therefore very efficient for large data payloads, i.e., for operations such as several CDMA modulations/demodulations coded together.
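Spreading all N code channels at once is itself a Walsh-Hadamard transform, so the butterfly network needs N·log₂N additions/subtractions, compared with N(N-1) additions for computing each chip as a direct sum. This is a minimal sketch, assuming 16-chip Sylvester-ordered Walsh codes and a hypothetical symbol vector with one entry per code (0 marks an unused code); the names are illustrative.

```python
def fwht_count(x):
    """Fast Walsh-Hadamard transform that also counts additions/subtractions."""
    buf, ops, span = list(x), 0, 1
    while span < len(buf):
        for start in range(0, len(buf), 2 * span):
            for i in range(start, start + span):
                buf[i], buf[i + span] = buf[i] + buf[i + span], buf[i] - buf[i + span]
                ops += 2
        span *= 2
    return buf, ops


N = 16
symbols = [1, -1, 0, 1, 1, 0, -1, -1, 0, 1, -1, 1, 0, 0, 1, -1]  # one symbol per code
chips, ops = fwht_count(symbols)        # spread: chips[c] = sum_k symbols[k] * H[k][c]
despread, _ = fwht_count(chips)         # despread: H * H = N * I
print(ops, N * (N - 1))                 # 64 240: butterfly network vs. direct sums
print([v // N for v in despread] == symbols)  # True
```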

The complex multiplier can now be used to implement a filter in the frequency domain, which randomizes/derandomizes the Walsh sequences with quasi-random sequences at very high efficiency. This applies when several CDMA modulations/demodulations are coded together, i.e., for large data payloads, as in the CDMA/WCDMA standards. The efficiency comes from multiplying the modulated data only once for all codes, rather than multiplying each code separately.
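The "multiply once for all codes" argument rests on linearity. The sketch below illustrates it in the chip domain, not with the frequency-domain filter of the text. It assumes 8-chip Walsh codes (rows of a Sylvester Hadamard matrix) and a hypothetical ±1±j scrambling sequence drawn from Python's `random` module. Scrambling the summed signal once (N multiplications) gives the same chips as scrambling every code separately. All function names are illustrative.

```python
import random


def hadamard(n_pts):
    """Sylvester Hadamard matrix; its rows serve as the Walsh codes."""
    h = [[1]]
    while len(h) < n_pts:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def spread_then_scramble(symbols, codes, scramble):
    """Sum all code channels first, then apply the scrambling sequence once per chip."""
    chips = [sum(s * code[c] for s, code in zip(symbols, codes)) for c in range(len(scramble))]
    return [chip * sc for chip, sc in zip(chips, scramble)]


def scramble_each_code(symbols, codes, scramble):
    """Reference: scramble every code channel separately, then sum."""
    return [sum(s * code[c] * scramble[c] for s, code in zip(symbols, codes))
            for c in range(len(scramble))]


def descramble_despread(chips, codes, scramble):
    """Undo the scrambling (multiply by conj(sc) / |sc|^2), then correlate with each code."""
    plain = [chip * sc.conjugate() / abs(sc) ** 2 for chip, sc in zip(chips, scramble)]
    return [sum(p * code[c] for c, p in enumerate(plain)) / len(plain) for code in codes]


rng = random.Random(7)  # hypothetical quasi-random scrambling sequence of +/-1 +/- j chips
N = 8
scramble = [complex(rng.choice((1, -1)), rng.choice((1, -1))) for _ in range(N)]
codes = hadamard(N)
symbols = [1, -1, 1, 0, 0, -1, 1, 1]
tx = spread_then_scramble(symbols, codes, scramble)  # N multiplications for all codes
assert all(abs(a - b) < 1e-9 for a, b in zip(tx, scramble_each_code(symbols, codes, scramble)))
print([round(v.real) for v in descramble_despread(tx, codes, scramble)])  # recovers symbols
```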

Fig. 15 depicts the trellis of an example embodiment that converts radix-4 stages to the Walsh spreading/despreading function. In this embodiment, the twiddle multipliers used for randomizing the Walsh codes are placed at the beginning and end of the parallel structure. In particular, Fig. 15 shows an example parallel radix-2² implementation (N=16) of a 16-point Walsh spreading/despreading function.

For a basic radix-2 FFT implemented with twiddle multipliers, only the multipliers need to be changed to "1". Figure 16 shows an example flow graph of a parallel radix-2 (N=16) implementation of a 16-chip Walsh spreading/despreading sequence, i.e., an example of a 16-chip Walsh spreading/despreading sequence used during modulation/demodulation.
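The statement that a radix-2 flow graph becomes a Walsh transform when its twiddle multipliers are set to 1 can be sketched with an iterative decimation-in-frequency (DIF) flow graph. This is a software model under stated assumptions (DIF ordering, N a power of two, hypothetical function names), not the structure of Figure 16 itself.

```python
import cmath


def radix2_dif(x, walsh=False):
    """Iterative in-place radix-2 decimation-in-frequency (DIF) flow graph.

    FFT mode: the difference branch is multiplied by W_span^k =
    exp(-2*pi*j*k/span); outputs come out in bit-reversed order.
    Walsh mode: every twiddle is forced to 1, giving the Walsh-Hadamard
    transform in natural (Sylvester) order. len(x) must be a power of two.
    """
    a = [complex(v) for v in x]
    n = len(a)
    span = n
    while span >= 2:
        half = span // 2
        for start in range(0, n, span):
            for k in range(half):
                i, j = start + k, start + k + half
                u, v = a[i], a[j]
                w = 1 if walsh else cmath.exp(-2j * cmath.pi * k / span)
                a[i], a[j] = u + v, (u - v) * w
        span = half
    return a


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    return int(format(i, f"0{bits}b")[::-1], 2)


x = [complex(k % 5, (3 * k) % 7) for k in range(16)]
fft_out = radix2_dif(x)                # DFT, bit-reversed order
walsh_out = radix2_dif(x, walsh=True)  # Walsh-Hadamard, natural order
```

Only the twiddle values differ between the two calls; the butterfly wiring is identical, and for N = 4 the Walsh mode reproduces the 4 x 4 Walsh matrix of equation (6).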

The complex multipliers can again be used as described above, for example to implement a filter in the frequency domain, or to randomize/derandomize the Walsh sequences with quasi-random sequences. High efficiency is achieved because the modulated data need to be multiplied only once (for all codes), so each code does not have to be multiplied separately.

Reconfigurable hybrid pipeline/parallel multiplexing method:

As shown in Figure 17, a "cluster" of small 4-bit-wide radix-2² butterfly units can be combined to form a wider-BUS radix-2² unit, with each small unit connected to a reconfigurable, controlled RAM "cluster" that can be merged or split. Following the same approach, the reconfigurable multipliers used for BUS splitting can be implemented with a reconfigurable "processing" core. Such a core offers high utilization and low power consumption for IFFT/FFT/filter/correlator functions of arbitrary length and for the Walsh-Hadamard transform or any of its derivatives (for example, a CDMA DSSS core, or even a DDS frequency filter), and it supports any BUS width required at run time, with the algorithms arranged in any configuration, including various parallel, pipelined and iterative algorithm-structure schemes. Because the silicon implementation of the core runs at the maximum clock rate, on-demand reconfiguration can produce any number of parallel/pipelined/iterative algorithm structures, each optimized at any given time for the algorithm and for the available silicon resources. The result is a very compact reconfigurable structure with high utilization for any standard the modem implements. Figure 17 shows an example of a reconfigurable MF-I core for processing FFT/IFFT vectors.
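The text does not specify how narrow butterflies are merged into a wider BUS. One common way to picture merging and splitting datapath slices is a carry chain that can be enabled or broken; the sketch below assumes 4-bit slices and is purely illustrative (the helpers `slice_add` and `grouped_add` are hypothetical).

```python
def slice_add(a, b, carry_in, width=4):
    """One hypothetical narrow adder slice: returns (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def grouped_add(a, b, slices=2, width=4, merged=True):
    """Add two words using `slices` narrow slices.

    merged=True chains the carries, so the slices act as one wide adder
    of slices*width bits (result wraps modulo 2**(slices*width)).
    merged=False breaks the chain, so each slice is an independent
    width-bit lane (each lane wraps modulo 2**width).
    """
    mask = (1 << width) - 1
    result, carry = 0, 0
    for s in range(slices):
        lane_a = (a >> (s * width)) & mask
        lane_b = (b >> (s * width)) & mask
        lane_sum, carry = slice_add(lane_a, lane_b, carry if merged else 0, width)
        result |= lane_sum << (s * width)
    return result


wide = grouped_add(0x3A, 0x17)                  # one 8-bit add
lanes = grouped_add(0x3A, 0x17, merged=False)   # two independent 4-bit adds
```

With the chain enabled the two slices compute 0x3A + 0x17 = 0x51; with it broken they compute 0x3 + 0x1 and 0xA + 0x7 as separate nibbles, giving 0x41.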

In summary, the present method comprises modifications to a basic FFT processor that make use of a simplified interconnection structure. Flexibility in adapting to the FFT size is obtained simply by changing the length of the shift registers (or FIFOs) in the memory, changing the bus size as required, and simply multiplexing the I/O modules. With a clock frequency matched to the input sample rate, the full range of FFT sizes can be accommodated either by mapping directly onto the hardware and disabling the modules not needed for shorter FFTs or, for long FFTs (at lower symbol rates), by folding the processing stages and time-sharing the hardware. This architecture requires no buffering or serial-to-parallel conversion.
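To make the FIFO-length adjustment concrete, the sketch below assumes a radix-2 single-path delay-feedback (R2SDF) pipeline, a common FFT architecture that the text does not name explicitly. Hardware built for `n_max` points has log2(n_max) stages whose FIFOs hold n_max/2, n_max/4, ..., 1 words; a shorter transform bypasses the leading stages, which is one way of disabling the modules not needed for shorter FFTs. The function name `sdf_plan` is hypothetical.

```python
def sdf_plan(n, n_max):
    """Per-stage FIFO lengths of a hypothetical R2SDF pipeline sized for n_max.

    Stage s of an n_max-point R2SDF pipeline uses a FIFO of n_max / 2**(s+1)
    words. For a shorter n, the leading stages (largest FIFOs) are bypassed
    (None), so the remaining log2(n) stages see FIFO lengths n/2, ..., 1.
    """
    for v in (n, n_max):
        if v < 2 or v & (v - 1):
            raise ValueError("sizes must be powers of two >= 2")
    if n > n_max:
        raise ValueError("n exceeds the hardware size")
    total = n_max.bit_length() - 1   # log2(n_max) hardware stages
    active = n.bit_length() - 1      # log2(n) stages actually needed
    plan = []
    for s in range(total):
        if s < total - active:
            plan.append(None)            # stage disabled / bypassed
        else:
            plan.append(n_max >> (s + 1))  # FIFO length for this stage
    return plan
```

For example, `sdf_plan(4, 16)` returns `[None, None, 2, 1]`: the two largest FIFOs are idle and the memory length decreases with each successive active stage.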

Taking the radix-2² structure as an example, the radix-4 butterfly (without twiddle multipliers) can also be expressed as the matrix operation shown in Figure 9, and Figure 10 shows the corresponding butterfly structure. Therefore, as in the example above, a radix-2² stage requires only one general multiplier and a two-stage butterfly unit whose improved cross-connect uses sign multiplication (needed for switching between IFFT and FFT), so that no additional multiplier is needed. Figure 11 shows the corresponding structure. Figure 12 gives the corresponding multistage radix-2² implementation of a 16-point FFT (see Figure 4). The same transform, implemented with the iterative reconfigurable switching mechanism, is shown in Figure 13 (single-stage) and Figure 14 (multistage).

Walsh-Hadamard transform: the processor and method described for the FFT/IFFT can also be used to implement processors and methods for other transforms. From the matrix representation of the radix-4 transform above, all that is needed to turn the operation into a Walsh spreading/despreading function is to replace the multipliers used for FFTs with trivial ±1 multipliers. Further analysis shows that only the non-trivial coefficients and the coefficients that multiply by -j need to change. Moreover, the trivial multipliers required for the Walsh spreading/despreading function are fully provided by the existing ability to switch between FFT and IFFT and to multiply by -j. On the hardware side, the only additional requirement is a controller to manage and control the operation of the processor.
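A minimal sketch of the radix-2² idea described above: a radix-4 butterfly built from two radix-2 stages with only a trivial rotator between them. Multiplying by -j or +j is just a real/imaginary swap plus a sign change, and setting the rotator to 1 turns the same data path into the 4-point Walsh butterfly. The mode names and helper functions are hypothetical, and the IFFT output is left unnormalized.

```python
def rotate(z, mode):
    """Trivial rotator between the two radix-2 stages (no general multiplier)."""
    z = complex(z)
    if mode == "fft":     # z * (-j): swap real/imag, negate new imaginary part
        return complex(z.imag, -z.real)
    if mode == "ifft":    # z * (+j): swap real/imag, negate new real part
        return complex(-z.imag, z.real)
    if mode == "walsh":   # z * 1: pass through
        return z
    raise ValueError(f"unknown mode: {mode!r}")


def radix22_butterfly(x0, x1, x2, x3, mode="fft"):
    """Radix-4 butterfly built as two radix-2 stages (radix-2^2 style).

    FFT/IFFT outputs are in bit-reversed order (X0, X2, X1, X3) and the
    IFFT is unnormalized. Walsh outputs follow the row order of equation (6).
    """
    # First radix-2 stage: sum/difference of inputs (0, 2) and (1, 3).
    a0, a1 = x0 + x2, x1 + x3
    a2, a3 = x0 - x2, rotate(x1 - x3, mode)
    # Second radix-2 stage.
    return [a0 + a1, a0 - a1, a2 + a3, a2 - a3]
```

Only the rotator changes between modes, which mirrors the point that the Walsh function needs no new arithmetic, just a controller that selects the coefficient.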

Matrix operation shown in the Walsh spread spectrum of " radix 4 "/separate spread spectrum butterfly can also being expressed as:

$$
\begin{pmatrix} Y(1) \\ Y(2) \\ Y(3) \\ Y(4) \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} X(1) \\ X(2) \\ X(3) \\ X(4) \end{pmatrix}
\tag{6}
$$

Comparing the two matrix representations shows the relationship between the two transforms:

$$
\text{Walsh}\;
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\;\Longleftrightarrow\;
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix}
\;\text{FFT}
\tag{7}
$$
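The correspondence in equation (7) can be checked mechanically: reorder the FFT rows in bit-reversed order (0, 2, 1, 3), then replace each -j by +1 and each +j by -1 (equivalently, map a coefficient z to Re(z) - Im(z)). The sketch below, with hypothetical names, confirms that this yields the Walsh matrix of equation (6).

```python
# Radix-4 DFT matrix from equation (7), rows in natural order.
FFT4 = [
    [1, 1, 1, 1],
    [1, -1j, -1, 1j],
    [1, -1, 1, -1],
    [1, 1j, -1, -1j],
]

# Walsh matrix from equation (6).
WALSH4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def to_walsh_entry(z):
    """Map a radix-4 coefficient to its Walsh counterpart.

    1 -> 1, -1 -> -1, -j -> 1, j -> -1, i.e. Re(z) - Im(z).
    """
    z = complex(z)
    return int(z.real - z.imag)


BIT_REVERSED_ROWS = [0, 2, 1, 3]
derived = [[to_walsh_entry(z) for z in FFT4[r]] for r in BIT_REVERSED_ROWS]
```

`derived` equals `WALSH4`, which is the same substitution the rotator performs when a radix-2² butterfly is switched from FFT to Walsh mode.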

Because the radix-4 transform is a complex operation, it provides two independent Walsh spreading/despreading processes for real vectors (since the trivial ±1 multipliers do not exchange values between the I and Q signals). This can be used for a two-finger RAKE receiver for the new WCDMA standard, or for a complex Walsh spreading/despreading processor. The second independent Walsh spreading/despreading processor can also be used as an additional stage and, by conjoining I and Q in place, it can be used for a larger Walsh spreading/despreading (this possibility is easily realized in reconfigurable radix-2² structures).

The implementation shown in Figure 17 requires only N log(N) operations and is therefore very efficient for modulating/demodulating several CDMA codes together, i.e., for large data payloads.

The complex multipliers can now be used to implement configurations such as a filter in the frequency domain, which randomizes/derandomizes the Walsh sequences with quasi-random sequences at very high efficiency (when several CDMA codes are modulated/demodulated together, i.e., for large data payloads, as in the CDMA/WCDMA standards). The high efficiency arises because the modulated data need to be multiplied only once (for all codes), rather than multiplying each code separately.

Figure 15 depicts the conversion of the radix-4 stages into a Walsh spreading/despreading function when the twiddle multipliers used for randomizing the Walsh codes are placed in the parallel structure (at its head and tail). For the radix-2 FFT example, only the twiddle multipliers need to be changed to "1". Figure 16 shows an example of a 16-chip Walsh spreading/despreading sequence used in modulation/demodulation processing. The complex multipliers can be used as described above, for example to implement a filter in the frequency domain, or to randomize/derandomize the Walsh sequences with quasi-random sequences. High efficiency is achieved because the modulated data need to be multiplied only once (for all codes); each code does not need to be multiplied separately.

Reconfigurable hybrid pipeline/parallel multiplexing method:

Finally, Figure 18 schematically shows the general structure of a reconfigurable device for implementing general orthogonal transforms, for the case of radix-2^i/x butterfly transforms. The computing unit can be implemented with radix-2, radix-2², radix-2³, radix-4, radix-8 or similar butterfly units. The device preferably comprises a reconfigurable RAM cluster and reconfigurable BUS multiplexer module 180, a computing unit 182 comprising one or more butterfly units, a reconfigurable multiplication module 184, a control and storage unit 186, and a detector 188. At each stage of the transform, unit 186 modifies the coefficients of the multipliers in the butterfly units according to the transform (the corresponding coefficients can take the values {1, -1, j, -j}). The results of the operations of unit 182 are stored in the registers of unit 180 (which is also controlled by unit 186). The size of these registers changes from stage to stage. Depending on the stage and the algorithm, part of the stored data is fed into the reconfigurable multiplication module 184, where it is multiplied by coefficients produced by the control and storage unit 186. The results of the multiplication are stored in module 180. The multiplexers of module 180 are used to multiplex the stored data. Clearly, only one butterfly unit and one multiplier need be used for each stage, and the same butterfly unit and multiplier can be reused for every stage simply by reconfiguring the hardware.
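The time-multiplexing described for Figure 18 can be modeled as a schedule that a controller issues to one butterfly and one multiplier. The sketch below assumes radix-2 DIF ordering; the function name and tuple layout are hypothetical. Each stage issues N/2 butterfly operations (compare claim 11), and log2 N stages cover the transform.

```python
def iterative_schedule(n):
    """Address/coefficient schedule for ONE time-multiplexed radix-2 butterfly.

    Hypothetical model of the iterative configuration: the controller steps
    through log2(n) stages; in each stage it issues n/2 butterfly operations.
    Each entry is (stage, address_a, address_b, k, span): the two memory
    addresses to read and write back, and the twiddle W_span^k that the
    multiplier applies to the difference output (DIF ordering).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    schedule = []
    span, stage = n, 0
    while span >= 2:
        half = span // 2
        for start in range(0, n, span):
            for k in range(half):
                schedule.append((stage, start + k, start + k + half, k, span))
        span, stage = half, stage + 1
    return schedule


sched = iterative_schedule(16)  # 4 stages x 8 butterfly operations
```

Every stage touches each address exactly once, so results can be written back in place, and the same butterfly and multiplier serve all stages with only the addresses and coefficients changing.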

A specific application of the above is described in our co-pending application No. 11/071,340, filed March 3, 2005 and entitled "Low-Power Reconfigurable Architecture For Simultaneous Implementation Of Distinct Communication Standards" (Attorney Docket 66940-021), the contents of which are incorporated herein by reference. Figure 18 shows a block diagram of the system described and claimed in that co-pending application.

Accordingly, as shown in Figure 19, an embodiment fabricated as an integrated chip that meets the above architectural requirements comprises the following basic functional elements:

CPU 190 is preferably a relatively small computer processing unit, which is needed (a) to control the configware part of the device, i.e., network bus 192, I/O module 194, RAM module 196, megafunction module(s) 198, interconnect module 200, flash memory module 202 and clock 204; and (b) to fix the configuration of the megafunction module(s) 198 and of bus 192, I/O module 194, RAM module 196, interconnect module 200, flash memory module 202 and clock 204 according to the protocol of the signals processed by the chip. CPU 190 can also be used to compute minor and simple assignments or tasks, and to configure the bus that interconnects the megafunctions and the I/O modules.

Network bus 192 can be reconfigured according to the protocol. I/O module 194 is preferably a reconfigurable I/O module that connects the chip to the outside world. Its tasks include receiving the "configware" of the application algorithms, receiving input data, and delivering the processed output data. RAM 196 is a random access memory, preferably configured to store the "configware instructions" and to cache and buffer data.

Megafunction module 198 is preferably configured to include two or more application functions, i.e., protocols and the main application functions that handle these protocols by computing the domain of each application function with a given efficiency. In the present case, megafunction module 198 is configured to include one or more of the orthogonal transforms described herein, or any combination of them.

Interconnect module 200 preferably includes a reconfigurable network bus that connects all components of the chip, including CPU 190, I/O module 194, RAM module 196, megafunction module 198, flash memory 202 and clock module 204. The interconnect module can also be configured to perform minor and simple assignments or tasks, preferably in additional memory.

Finally, flash memory 202 is preferably used to store data while the chip runs its programs. The flash memory preferably takes the form of EEPROM, which allows multiple memory locations to be erased or written in a single programming operation, so that it can operate at a higher effective speed when the system reads and writes at different locations simultaneously. It should be understood that other types of memory can be used for less complex operations. Preferably, information is stored in the flash memory by storing it on the silicon chip in a way that requires no power to retain it. Power can therefore be removed without losing the information held in the flash memory chip, and no power is consumed to retain it. In addition, flash memory offers fast read access times and solid-state shock resistance, which makes it particularly desirable in applications such as data storage on battery-powered devices like mobile phones and PDAs.

The architecture described can thus be implemented as an integrated circuit. It is believed that the architecture can be adapted to any type of orthogonal transform in which the vector size can vary (for both real and complex vectors). Such orthogonal transforms can include, without limitation, the FFT, the inverse FFT (IFFT) or any derivative such as the discrete cosine/sine transforms (DCT and DST), the Walsh-Hadamard transform or any derivative such as CDMA DSSS spreading/despreading, any algorithm that combines two or more of these, and other functionality such as filtering using a cascade of FFT and IFFT transforms, which can also be used for equalization, Hilbert transforms, prediction, differencing, correlation and the like.
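Filtering by a cascade of FFT and IFFT (see also claims 19 and 20) reduces, in software terms, to transforming the input, multiplying by frequency-domain filter coefficients, and inverse-transforming. The sketch below is a minimal illustration with a recursive radix-2 FFT and hypothetical names; it performs circular filtering of a single block, and streaming linear filtering would additionally need overlap-add or overlap-save, which is not shown.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_filter(signal, taps):
    """Circular filtering as an FFT -> coefficient multiply -> IFFT cascade."""
    n = len(signal)
    h = list(taps) + [0] * (n - len(taps))  # zero-pad the taps to n points
    coeffs = fft(h)                          # frequency-domain filter coefficients
    spectrum = fft(signal)
    y = fft([s * c for s, c in zip(spectrum, coeffs)], inverse=True)
    return [v / n for v in y]                # normalize the inverse transform


out = fft_filter([1, 2, 0, -1, 3, 0, 0, 1], [0.5, 0.25])
```

The final multiply by `coeffs` is the step that the last-stage multiplier would perform in hardware before the result enters the inverse-transform stages.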

The architecture of the present invention disclosed herein, and all elements thereof, are included within the scope of at least one of the following claims. All elements of the chip architecture disclosed herein are claimed, and they are not intended to limit the interpretation of the claims.

Claims

1. A reconfigurable device for performing a fast orthogonal transform of vectors in a plurality of stages, the vectors being of size N, wherein N is variable and the number of stages is a function of N, the device comprising:

a computing unit configured and arranged to include one or more butterfly units;

a module comprising one or more multipliers coupled to the output of the computing unit, configured and arranged to perform all butterfly computations of at least one stage of the transform;

a storage unit configured and arranged to store intermediate results of the butterfly computations and predetermined coefficients for use by the computing unit in performing each butterfly computation, the storage unit comprising a memory and a multiplexing structure;

a multiplexer module configured and arranged to time-multiplex all butterfly computations of the transform through the computing unit used for a stage, so that only one computing unit is needed for that stage; and

a controller configured and arranged to provide coefficients to the computing unit and to control the size and multiplexing structure of the memory in the storage unit;

wherein the coefficients of the multipliers of each stage, the coefficients of the computing unit, and the size and multiplexing structure of the memory are modified as a function of the value of N.

2. The reconfigurable device of claim 1, wherein the butterfly unit is configured in one of the following structures: radix-2, radix-2², radix-2³, radix-4 or radix-8.

3. The reconfigurable device of claim 1, wherein the memory is a FIFO shift register.

4. The reconfigurable device of claim 1, wherein the length of the memory is a function of the stage of the transform.

5. The reconfigurable device of claim 1, wherein the length of the memory decreases with each successive stage.

6. The reconfigurable device of claim 5, wherein the length of the memory is adjusted for each stage as a function of the value of N.

7. The reconfigurable device of claim 6, wherein the multiplexer module comprises an input/output module connected to the computing unit.

8. The reconfigurable device of claim 1, wherein N varies within a predetermined range, further comprising a clock unit configured and arranged to provide a clock frequency according to the input sample rate throughout the predetermined range.

9. The reconfigurable device of claim 8, wherein the structure comprises a plurality of computing units arranged in hardware so as to accommodate the entire predetermined range M by mapping transforms within the predetermined range onto the hardware, and wherein unneeded computing units are disabled when the transform is smaller than M.

10. The reconfigurable device of claim 8, wherein the structure comprises a plurality of computing units arranged in hardware so as to accommodate a size "m" smaller than the entire predetermined range M, and wherein the stages at least partially share hardware to perform transforms larger than "m".

11. The reconfigurable device of claim 1, wherein each stage requires N/2 computations.

12. The reconfigurable device of claim 1, further comprising a plurality of computing units, one for each of the stages, the computing units being implemented so as to provide a pipelined structure.

13. The reconfigurable device of claim 1, further comprising a plurality of computing units, one for each of the stages, the computing units being implemented so as to provide a structure configured as one or more of the following types: pipelined, iterative and parallel.

14. The reconfigurable device of claim 1, wherein an entire frame of the transform is performed in N clock cycles.

15. The reconfigurable device of claim 1, wherein the butterfly unit comprises a radix-2 structure.

16. The reconfigurable device of claim 1, wherein the butterfly unit comprises a radix-4 structure.

17. The reconfigurable device of claim 16, wherein an entire frame of the transform is performed in N/2 clock cycles.

18. The reconfigurable device of claim 1, wherein the computing unit, the storage unit and the multiplexer module are included in a transform accelerator, and wherein the transform accelerator is configured and arranged to perform each butterfly computation of all stages in an iterative process.

19. The reconfigurable device of claim 1, wherein the storage unit is configured and arranged to include filter coefficients, and the multiplier of the computing unit of the final stage of the transform is adapted to multiply the output of the final stage by one or more of the filter coefficients to produce a filtered output.

20. The reconfigurable device of claim 19, wherein the filtered output is applied to the input of a plurality of stages of the inverse transform of the orthogonal transform, each of the stages comprising a computing unit, the computing units forming a pipelined structure.

21. The reconfigurable device of claim 1, wherein the transform is a Fast Fourier Transform (FFT).

22. The reconfigurable device of claim 21, wherein the Fast Fourier Transform comprises different radices.

23. The reconfigurable device of claim 1, wherein the vectors comprise real vectors and complex vectors.

24. The reconfigurable device of claim 1, wherein the transform comprises a Walsh orthogonal transform.

25. An integrated chip comprising a reconfigurable structure for performing a fast orthogonal transform of vectors in a plurality of stages, the vectors being of size N, wherein N is variable and the number of stages is a function of N, the structure comprising:

26. A communication system comprising the integrated chip of claim 25.

27. The communication system of claim 26, further comprising a detector for determining the vector size.

28. A method of performing a fast orthogonal transform of vectors in a plurality of stages, the vectors being of size N, wherein N is variable and the number of stages is a function of N, the method comprising:

configuring and arranging a computing unit so that it includes one or more butterfly units;

configuring and arranging a module so that it includes one or more multipliers coupled to the output of the computing unit, and configuring and arranging the one or more butterfly units and the one or more multipliers so as to perform all of the butterfly computations of at least one stage of the transform;

storing intermediate results of the butterfly computations and predetermined coefficients in a storage unit for use by the computing unit in performing each butterfly computation, the storage unit comprising a memory and a multiplexing structure;

time-multiplexing all butterfly computations of the transform through the computing unit used for a stage, so that only one computing unit is needed for that stage; and

providing coefficients to the computing unit, and controlling the size and multiplexing structure of the memory in the storage unit;

29. A method of performing a fast orthogonal transform of vectors in a plurality of stages, the vectors being of size N, wherein N is variable and the number of stages is a function of N, the method comprising:

using a reconfigurable group of butterfly units and a reconfigurable group of multipliers, configured and arranged so that at least one computing unit can be configured and arranged to include at least one butterfly unit and a multiplier coupled to the output of the butterfly unit, so that the computing unit can perform all of the butterfly computations of at least one stage of the transform; and using a reconfigurable memory coupled to the computing unit to store intermediate results of the butterfly computations and predetermined coefficients for use in performing each butterfly computation;

wherein the coefficients and the memory size of each stage are modified as a function of the value of N.

30. A system for performing a fast orthogonal transform of vectors in a plurality of stages, the vectors being of size N, wherein N is variable and the number of stages is a function of N, the system comprising:

a reconfigurable group of butterfly units and a reconfigurable group of multipliers, configured and arranged so that at least one computing unit can be configured and arranged to include at least one butterfly unit and a multiplier coupled to the output of the butterfly unit, so that the computing unit can perform all of the butterfly computations of at least one stage of the transform; and a reconfigurable memory coupled to the computing unit to store intermediate results of the butterfly computations and predetermined coefficients for use in performing each butterfly computation;