US20050256917A1 - Address generators integrated with parallel FFT for mapping arrays in bit reversed order - Google Patents
- Publication number
- US20050256917A1 (application US11/187,673)
- Authority
- US
- United States
- Prior art keywords
- address
- bit
- log2n
- array
- reversed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
Definitions
- the present invention is a method and apparatus to reduce the amount of required memory and instruction cycles when implementing Fast Fourier Transforms (FFTs) on a computer system. More particularly, the preferred embodiment of the present invention optimizes FFT software using in-place bit reversal (IPBR) implemented on a processor capable of bit reversed incrementation.
- IPBR in-place bit reversal
- FFTs Fast Fourier Transforms
- the Fourier transform is a mathematical operator for converting a signal from a time-domain representation to a frequency-domain representation.
- the inverse Fourier transform is an operator for converting a signal from a frequency-domain representation to a time-domain representation.
- the Discrete Fourier Transform may be viewed as a special case of the continuous form of the Fourier transform.
- the DFT determines a set of spectrum amplitudes and phases or coefficients from a time-varying signal defined by samples taken at discrete time intervals.
- FFT fast Fourier transform
- Some patents in the field of processing FFTs include U.S. Pat. No. 3,673,399 to Hancke et al for FFT PROCESSOR WITH UNIQUE ADDRESSING; U.S. Pat. No. 6,035,313 to Marchant for a MEMORY ADDRESS GENERATOR FOR AN FFT; U.S.
- a fast Fourier transform of the type known as a radix-two decimation-in-time FFT
- the size of the transform is successively halved at each stage.
- a 32-point FFT is split into a pair of 16-point FFTs, which are in turn split into four 8-point FFTs, then eight 4-point FFTs, and finally sixteen 2-point FFTs.
- the resulting computation for a 32-point FFT is shown in the signal flow graph of FIG. 2 .
- the quantities on the left hand side of the signal flow graph ranging from x(0) to x(31) are the sampled inputs to the FFT, while the signals appearing at the right-hand side of the signal flow graph and numbered 0 through 31 are the resulting FFT coefficients.
- the signal flow graph illustrates that there are five passes or phases of operation, derived from the relationship that the number 32 is two to the fifth power.
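The successive halving described above can be sketched as a recursive radix-2 decimation-in-time FFT. This is a minimal illustration, not the patent's implementation; the function names `fft_dit`, `dft` and `decomposition` are hypothetical, and the negative-exponent sign convention for the twiddle factors is an assumption.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])  # half-size transform of the even-indexed samples
    odd = fft_dit(x[1::2])   # half-size transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor (sign convention assumed)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking fft_dit."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def decomposition(n):
    """List (number of transforms, transform size) for each level of halving."""
    levels, count, size = [], 1, n
    while size >= 2:
        levels.append((count, size))
        count, size = count * 2, size // 2
    return levels

print(decomposition(32))  # one entry per pass of the 32-point transform
```

For a 32-point input, `decomposition` returns five levels (sizes 32, 16, 8, 4 and 2), which corresponds to the five passes of the signal flow graph.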
- the convention used in the signal flow graph is that an arrowhead represents multiplication by the complex quantity W^k adjacent to the arrowhead.
- the small circles represent addition or subtraction as indicated in FIG. 2 a .
- the W values are usually referred to as “twiddle factors” and represent phasors of unit length and an angular orientation which is an integral multiple of 2π/32.
- An aspect of FFT computation is that the results of each butterfly computation may be stored back in memory in the same location from which the inputs to the butterfly were obtained. More specifically, the C and D outputs of each butterfly may be stored back in the same locations as the A and B inputs of the same butterfly.
- This FFT computation is referred to as an “in-place” algorithm. Most discrete transforms are executed “in-place” to conserve memory, which in turn reduces system size, power consumption, cost, and allocates memory for other tasks. For such “in-place” FFTs, the reordering required to counteract the effect of the transform decompositions is achieved by a particular permutation of the elements of the data sequence.
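A minimal sketch of an "in-place" radix-2 FFT follows, assuming a butterfly of the common form C = A + W·B, D = A - W·B (FIG. 2a is not reproduced here, so this form is an assumption). The names `bit_rev` and `fft_in_place` are hypothetical. Each butterfly writes its two outputs back to the locations its inputs came from, so no second buffer is needed; the reordering pass at the top is the permutation the text refers to.

```python
import cmath

def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def fft_in_place(data):
    """In-place radix-2 FFT of a list whose length is a power of two."""
    n = len(data)
    bits = n.bit_length() - 1
    # Reordering pass: place the data in bit-reversed order.
    for i in range(n):
        j = bit_rev(i, bits)
        if i < j:
            data[i], data[j] = data[j], data[i]
    # Butterfly passes: outputs overwrite the inputs they were computed from.
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                top, bottom = start + k, start + k + span
                a, b = data[top], w * data[bottom]
                data[top], data[bottom] = a + b, a - b  # C and D replace A and B
        span *= 2
```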
- Bit-reversed address mapping is commonly used in performing radix-2 FFTs.
- When the radix-2 FFT is computed, data must be rearranged in bit-reversed order.
- the FFT process uses an algorithm to pre-place data in memory in bit-reversed order, typically prior to executing the butterfly computations.
- FFT efficiency is a high priority in the computer processor industry.
- the FFT algorithm has high intrinsic value and is widely used.
- the instruction cycle requirement of custom optimized FFT software is the accepted benchmark standard for measuring a processor's computational efficiency.
- the number of FFTs/sec executed is a more accurate relative measure of a processor's computational power than MIPS (millions of instructions per second).
- FFT software requiring fewer resources enhances both the real and projected capabilities of the processor.
- DSPs Digital Signal Processors
- This is done by special instructions that allow address registers to be incremented so that carry (or borrow) bits propagate toward less significant bits (backward).
- carry bits must propagate toward more significant bits.
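The reverse-carry increment can be modelled in software as an add whose carries travel from the most significant bit toward the least significant bit. The sketch below is an illustration only; `reverse_carry_add` is a hypothetical name, and real processors perform this step in the address generation hardware.

```python
def reverse_carry_add(value, increment, bits):
    """Add `increment` to `value` with carries propagating toward the LSB."""
    result = 0
    carry = 0
    for pos in range(bits - 1, -1, -1):  # start at the most significant bit
        total = ((value >> pos) & 1) + ((increment >> pos) & 1) + carry
        result |= (total & 1) << pos
        carry = total >> 1               # carry moves to the next lower bit
    return result                        # a carry out of bit 0 is discarded

def bit_reversed_sequence(log2n):
    """Offsets visited by repeatedly reverse-carry adding N/2, starting at 0."""
    n = 1 << log2n
    offset, seq = 0, []
    for _ in range(n):
        seq.append(offset)
        offset = reverse_carry_add(offset, n >> 1, log2n)
    return seq

print(bit_reversed_sequence(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Incrementing by N/2 with reverse carry visits the offsets in bit-reversed order, which is why no explicit bit reversal of each address is needed on such processors.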
- the present invention is primarily intended to optimize FFT software implemented on a processor capable of bit-reversed address register incrementing in the described manner. However, the invention also has applications on processors that lack this capability.
- Table I lists a binary address, the contents of memory before bit reversed ordering, the corresponding bit reversed binary address, and the contents of memory after bit reversed ordering.
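Table I itself is not reproduced in this text. The following sketch generates rows with the same four columns for an eight-element array; the element labels `x0` to `x7` and the function names are placeholders chosen for illustration, not values from the patent.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def table_rows(contents):
    """Rows of (address, contents before, bit reversed address, contents after)."""
    n = len(contents)
    bits = n.bit_length() - 1
    rows = []
    for a in range(n):
        r = bit_rev(a, bits)
        # After reordering, address `a` holds what was stored at address `r`.
        rows.append((format(a, f"0{bits}b"), contents[a],
                     format(r, f"0{bits}b"), contents[r]))
    return rows

for row in table_rows([f"x{i}" for i in range(8)]):
    print(*row)
```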
- an input array is stored in 2^(log2N+M) contiguous words of memory, beginning at start address S_in.
- the array has 2^log2N elements and each element is stored in 2^M contiguous words of data memory.
- four words of contiguous memory would accommodate two words of precision for both the real and imaginary part of complex input data elements.
- Sequential output array elements are rearranged in bit reversed order relative to the input array.
- the output buffer must be “aligned”, i.e., S_out or S_in must be a multiple of 2^(log2N+M) for bit reversed address register incrementation to work properly.
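The alignment requirement can be seen by applying the bit reversed increment to a full address (start address plus offset) rather than to an offset alone. The sketch below assumes a hypothetical 16-bit address register and one word per element (M = 0); `WIDTH`, `rev`, `bit_reversed_increment` and `walk` are illustrative names.

```python
WIDTH = 16  # hypothetical address register width in bits

def rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    return int(format(value, f"0{bits}b")[::-1], 2)

def bit_reversed_increment(address, increment):
    """Reverse-carry add over the whole register: reverse, add, reverse back."""
    total = (rev(address, WIDTH) + rev(increment, WIDTH)) % (1 << WIDTH)
    return rev(total, WIDTH)

def walk(start, log2n):
    """Addresses visited from `start` using a bit reversed increment of N/2."""
    n = 1 << log2n
    address, seq = start, []
    for _ in range(n):
        seq.append(address)
        address = bit_reversed_increment(address, n >> 1)
    return seq

print(walk(16, 3))  # aligned start: 16 plus each bit-reversed offset
print(walk(20, 3))  # unaligned start: the sequence no longer matches
```

With the start address 16 (a multiple of 8) the walk visits 16 plus the offsets 0, 4, 2, 6, 1, 5, 3, 7. With the start address 20 the low bits of the start address take part in the reverse-carry add and the expected sequence is lost.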
- Out of place bit reversal (OOPBR) refers to the technique of bit reversing an input data array so that the output data array falls elsewhere in data memory, i.e., S_in ≠ S_out.
- OOPBR may be advantageous if input data is located in slower, hence cheaper, memory, and faster “scratch” or “volatile” memory is available to generate the bit reversed output array. The subsequent FFT operations on the bit reversed array exploit the faster memory.
- the input data for the FFT is already located in fast data memory.
- the input data may be arrived at as the result of many computations, and for optimal reduction of required cycles, the FFT input array may already be in fast memory.
- OOPBR increases the amount of fast data memory required by the entire FFT by a factor of two. This is the case because the rest of the FFT embodies an intrinsically in place algorithm, requiring no additional data memory other than the input array itself.
- if the cycles required for IPBR can be made more competitive relative to OOPBR, then for many applications the additional data memory requirement of OOPBR cannot be justified.
- the second and third columns of Table II illustrate the same sequence of address pairs given in columns one and three of Table I.
- the fourth column indicates which address pairs are needed for IPBR, i.e., unique address pairs referencing data that needs to be swapped.
- the fourth column of Table II also illustrates that for an array of eight elements, the address pair generator conventionally used for IPBR produces useful address pairs for address pair numbers two and four, which is only two out of eight bit reversed pairs.
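Table II is not reproduced in this text, but its columns can be regenerated for an eight-element array. In the sketch below the pair numbering starts at one, and a pair is marked useful when the linear address is less than its bit reversed partner; `pair_table` is a hypothetical name.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def pair_table(log2n):
    """Rows of (pair number, linear address, bit reversed address, useful?)."""
    rows = []
    for number, a in enumerate(range(1 << log2n), start=1):
        b = bit_rev(a, log2n)
        rows.append((number, a, b, a < b))
    return rows

useful = [number for number, _, _, ok in pair_table(3) if ok]
print(useful)  # [2, 4]
```

Only pairs two (001, 100) and four (011, 110) qualify; the other six are either self-reversed or repeat a pair already listed.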
- a flawed IPBR algorithm is now described to illustrate the problems encountered attempting to optimize IPBR.
- the first address register is initialized to S_in, and each iteration this first address register is advanced linearly to reference the next array element in natural order.
- a second address register is also initialized to S_in and is incremented each iteration in a bit reversed manner to obtain the corresponding bit reversed version of the first address.
- a new pair of addresses is generated each iteration, as illustrated by columns 2 and 3 of Table II.
- the contents of memory referenced by the first and second address registers are exchanged. This technique will work for OOPBR. But for IPBR, all the self-reversed address contents are needlessly exchanged once.
- the conventional IPBR algorithm in the prior art involves a modification of this flawed approach.
- the conventional IPBR algorithm generates address pairs in a manner identical to the described flawed algorithm.
- the swap is only executed if the address generated by linear incrementing is less than the address produced by bit-reversed incrementing.
- Note the criterion of the first address being less than the second identifies the first occurrences of useful address pairs for IPBR in Table II.
- This condition for swapping eliminates transferring data from self-reversed addresses and prevents swapping for one of the redundant pairs of non-self-reversed addresses.
- Implementing the conditional swapping typically requires transferring both address registers into accumulators, subtracting, and conditionally branching. For this reason, typical IPBR implementations require two to ten times as many instruction cycles as OOPBR implementations.
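The flawed and conventional approaches can be compared with a short sketch, written over array offsets rather than address registers. The function names are hypothetical. In this model the unconditional version swaps each non-self-reversed pair twice, which returns the data to its original order, while the conditional version pays for a comparison on every iteration.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def flawed_ipbr(data):
    """Swap on every generated pair (no condition)."""
    bits = len(data).bit_length() - 1
    for a in range(len(data)):
        b = bit_rev(a, bits)
        data[a], data[b] = data[b], data[a]

def conventional_ipbr(data):
    """Swap only when the linear address is below the bit reversed address."""
    bits = len(data).bit_length() - 1
    comparisons = swaps = 0
    for a in range(len(data)):
        b = bit_rev(a, bits)
        comparisons += 1          # test and conditional branch on every pair
        if a < b:
            data[a], data[b] = data[b], data[a]
            swaps += 1
    return comparisons, swaps

buf = list(range(8))
flawed_ipbr(buf)
print(buf)                        # unchanged: every real swap was undone
buf = list(range(8))
print(conventional_ipbr(buf), buf)
```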
- the conventional IPBR method is inefficient because it relies on an address pair generator that yields extraneous address pairs.
- the process facilitates the identification of computationally efficient patterns for sequentially generating a unique set of bit reversed address pairs.
- Five exemplary new IPBR methods and modifications of these methods are presented.
- the size of the array to be bit reversed is 2^(log2N).
- optimized program code implementing Method 1 requires minor changes to work for odd and even log2N.
- optimized code implementing Method 1 works for all values of log2N.
- Method 2 further reduces cycles for odd log2N.
- Method 3 reduces cycles for the even log2N arrays relative to Method 1 .
- Method 4 is similar to Method 1 , however Method 4 does not pose any problem for processors with only one address increment register.
- Method 1 is unique in that it reduces the alignment requirement.
- Method 5 extends Method 3 to work for odd log2N.
- Methods 1 m , 2 m , 3 m , and 4 m are modifications of Method 1 , 2 , 3 , and 4 respectively. All these modified Methods require only two address registers. The cycle count for Method 2 m and Method 2 will be very close, if not identical. The other modified methods require fewer address registers, but increase the number of nested inner loops. Thus Methods 1 m , 3 m and 4 m may reduce or increase cycles relative to their un-modified counterparts, depending on the processor.
- IPBR software that removes the typical input buffer alignment restriction for bit reversed addressing is an application for this efficient process. This application is important because the rest of an FFT can be implemented without any buffer alignment restriction. By giving up some of the cycles this invention saves, the requirement for input buffer alignment is completely removed. Efficient removal of the alignment requirement may require inner loops that always bit reverse increment the same element of the address pair. This can make Methods 1 and 4 the optimal choice for IPBR without an alignment restriction. Method 1 is unique in that even without alignment removal, its inherent alignment requirement is relaxed to 2^(log2N/2-1) for even log2N and 2^((log2N-1)/2) for odd log2N. All other methods have an inherent 2^(log2N) alignment requirement.
- the present invention improves the in-place bit reversal (IPBR) process on computer processors and systems by defining an address generator for generating address pairs used for processing an input array using IPBR in parallel with processing a stage of a Fast Fourier Transform (FFT).
- the present invention creates an address pair generator that is used to combine IPBR and one FFT stage.
- Computing the IPBR and the first FFT stage in parallel increases processing efficiency by removing the instructions that store output from a stand-alone IPBR mapping and then fetch the same data as input for the FFT stage.
- FIG. 1 illustrates a decisional flowchart to choose a method of IPBR
- FIG. 2 is an illustrative signal flow graph of a fast Fourier transform in the prior art
- FIG. 2 a is an illustration of computations made in FIG. 2 ;
- FIG. 3 is an illustrative graph of a conventional IPBR address generation
- FIG. 4 is an illustrative graph of Method 1 for IPBR address generation
- FIG. 5 is an illustrative graph of Method 1 for IPBR address generation
- FIG. 8 is an illustrative graph of Method 1 IPBR address generation for odd log2N;
- FIG. 9 is an illustrative graph of Method 4 IPBR address generation for odd log2N.
- FIG. 10 is an illustrative graph of Method 2 IPBR address generation for odd log2N;
- FIG. 11 is an illustrative graph of Method 5 IPBR address generation
- FIG. 12 is an illustrative graph of Method 2 m IPBR address generation
- FIG. 13 is an illustrative graph of Method 1 m IPBR address generation
- FIG. 14 is an illustrative graph of Method 4 m IPBR address generation
- FIG. 15 is a flowchart of an exemplary embodiment that combines IPBR and one Fast Fourier Transform stage
- FIG. 16 is an illustrative graph of the combined FFT stage for IPBR address generation according to Method 1 C;
- FIG. 17 is an illustrative graph of the combined FFT stage for IPBR address generation according to Method 4 C.
- the preferred and alternative exemplary embodiments of the present invention include methods of in place bit reversal (IPBR) that are computationally efficient patterns to generate sequential address pairs for computing fast Fourier transforms (FFTs) in parallel with the address pair generation, in a processor.
- To decide which IPBR method is most efficient for a specific application, reference is made to the decisional flowchart of FIG. 1 . Assume an input array 10 is stored in 2^(log2N+M) contiguous words of memory, beginning at start address S_in. The array has 2^log2N elements and each element is stored in 2^M contiguous words of data memory. For example, four words of contiguous memory would accommodate two words of precision for both the real and imaginary parts of complex input data elements.
- the three sets of addresses are defined so that simple and efficient means exist for systematically stepping through every address in set A.
- the “filtered” conventional IPBR address pair generator, defined as the conventional IPBR address generator after extraneous pair removal, is segregated using the first way.
- the “filter” accepts only address pairs with first address, given by “a”, that satisfy a < bit_rev(a).
- xy denotes the number with MSBs equal to x and LSBs equal to y.
- a < bit_rev(a) implies xy < bit_rev(xy) and thus x < bit_rev(y), so the bit reversed Q LSBs are greater than the Q MSBs.
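The implication above can be checked exhaustively for small arrays. The sketch assumes even log2N = 2Q, so that x and y each hold Q bits; `filter_equivalence_holds` is a hypothetical name. The reasoning is that bit_rev(xy) has bit_rev(y) as its Q MSBs and bit_rev(x) as its Q LSBs, and when x equals bit_rev(y) the address is self-reversed, so the comparison is always decided by the MSBs.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def filter_equivalence_holds(q):
    """Check a < bit_rev(a) against x < bit_rev(y) for every 2q-bit address."""
    mask = (1 << q) - 1
    for a in range(1 << (2 * q)):
        x, y = a >> q, a & mask          # Q MSBs and Q LSBs of the address
        if (a < bit_rev(a, 2 * q)) != (x < bit_rev(y, q)):
            return False
    return True

print(all(filter_equivalence_holds(q) for q in range(1, 6)))
```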
- this method includes a criterion equivalent to the conventional IPBR criterion, but uses a more useful form of this criterion earlier in the conceptual process to avoid later extraneous pair removal.
- Method 1 is unique in that even without being modified for alignment removal, its inherent alignment requirement is relaxed to 2^(log2N/2-1) for even log2N and 2^((log2N-1)/2) for odd log2N. All other methods have a 2^(log2N) alignment requirement.
- the figure is a decisional flowchart providing selections to implement specific methods for address pair generators based upon certain information.
- the address generators can perform without an alignment restriction or with merely a relaxed alignment restriction.
- Methods 1 and 1 m are appropriate 12 . If an elimination, instead of reduction, of the alignment constraint 14 is preferred, then Methods 1 , 1 m , 4 , and 4 m are appropriate 16 .
- the address generator generates bit reversed addresses for an FFT with a size log2N input array 18 for use on a digital signal processor or other processing means capable of performing FFT operations.
- IPBR Methods 1 , 3 , 4 , and 5 should be avoided 22 .
- Methods 1 and 1 m should be avoided 26 in processing an FFT.
- Methods 2 , 2 m , 3 , and 3 m should not be used.
- specific methods can be chosen for optimal reduction of MIPS while processing.
- Method 2 is the most efficient method in most operations 34 .
- Method 3 is the most efficient method in most operations 38 .
- x, y plots are used to plan the path to follow with a method prior to defining the method itself.
- Specific cases for IPBR methods of the present invention and the conventional method are plotted in FIGS. 3-14 .
- Each IPBR method generates a sequence of address pairs. The first address of an address pair is represented by AR 1 and the second address by AR 2 .
- Sequential AR 1 and sequential AR 2 values are shown in the plots.
- Each square in the plots, formed by the x and y axis grid, represents the address of a unique element in the input array.
- every array address is represented by one square.
- the x axis value gives the three most significant bits (MSB) of an address
- the y axis value gives the three least significant bits (LSB) of a six bit address.
- Address coordinates are offset by (1/2, 1/2) to force the plots into the middle of a square made by the plot's grid.
- the address corresponds to the square's lower left corner coordinates.
- the first addresses of each bit reversed pair (the AR 1 s ) are graphed using small circles.
- the second addresses of each address pair (the AR 2 s ) are graphed using small squares.
- Sequential AR 1 address values are connected with a dashed line connecting the circles.
- Sequential AR 2 address values are connected with a solid line connecting the small squares.
- FIG. 3 illustrates the sequence of addresses generated using the conventional IPBR method found in the prior art.
- both a circle and a square symbol land on every grid square in FIG. 3 .
- the address generation scheme “lands” on every square twice.
- the conventional address generation scheme has three computational penalties: (1) because every non-self-bit-reversed address is generated twice, twice as many iterations are needed; (2) testing and conditional branching is required to break the degeneracy and swap only once per address; and (3) the self-bit-reversed addresses are also generated by the sequence of address pairs.
- the address (5,5) corresponds to binary address 101 101 b, which remains the same after bit reversal. Since the memory referenced by a self-bit-reversed address does not need to be exchanged with itself, it wastes additional cycles when the IPBR address generation scheme generates self-bit-reversed addresses.
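The self-bit-reversed addresses of the 64-element case can be listed directly. This is an illustrative count only; `self_reversed` is a hypothetical name.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def self_reversed(log2n):
    """Addresses that are unchanged by bit reversal."""
    return [a for a in range(1 << log2n) if bit_rev(a, log2n) == a]

fixed = self_reversed(6)
print([format(a, "06b") for a in fixed])
print(len(fixed), (64 - len(fixed)) // 2)  # self-reversed count, swaps needed
```

For six address bits there are 8 self-reversed addresses, including 101101b, so only (64 - 8) / 2 = 28 swaps are actually required, while the conventional scheme generates 64 address pairs.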
- the five IPBR methods presented are defined by sequential increments or “moves” of the two “bit reversed pairs” (AR 1 , AR 2 ) and (AR 3 , AR 4 ).
- the array size is 2^log2N.
- Variable “Q” is defined as the truncated integral quotient of log2N/2, i.e., Q is (log2N-1)/2 for odd log2N and log2N/2 for even log2N; variable “R” is defined as the remainder of log2N/2.
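As a small worked example, Q and R are the quotient and remainder of an integer division by two; the helper name `q_and_r` is illustrative.

```python
def q_and_r(log2n):
    """Return (Q, R): truncated quotient and remainder of log2N / 2."""
    return divmod(log2n, 2)

print(q_and_r(6))  # (3, 0): even log2N, 64-element array
print(q_and_r(7))  # (3, 1): odd log2N, 128-element array
```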
- Method 1 may be implemented for both odd and even log2N input array sizes.
- This address generation scheme generates only unique address pairs referencing data that needs to be swapped for IPBR, thereby eliminating the testing and conditional branching found in methods of the prior art and eliminating the waste of additional instruction cycles due to IPBR address generation for redundant and self-reversed addresses.
- FIG. 4 illustrates the result of Method 1 for generating bit reversed address pairs.
- (1,0) initiates the sequence of first addresses in each sequential address pair generated
- (0,4) initiates the sequence of second addresses.
- the second address gives the first address bit-reversed. Note that every square (unique address) is generated only once, and no self-bit-reversed addresses are generated.
- the address generation scheme never lands on the (5,5) square of address 101 101b, which thus has no circle or square symbol in FIG. 4 . Because the address generation scheme generates only unique address pairs referencing data that needs to be swapped for IPBR, the testing and conditional branching is eliminated.
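A generator with the same property (unique pairs only, no self-reversed addresses, no comparison before swapping) can be sketched by enumerating the addresses whose bit reversed Q LSBs are less than their Q MSBs. This is an assumption-laden illustration for even log2N = 2Q: it does not reproduce the patent's move tables or register usage, the visiting order beyond the first pair is a guess, and `unique_pairs` and `ipbr` are hypothetical names.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def unique_pairs(q):
    """Yield (AR1, AR2) offset pairs for an array of 2**(2*q) elements."""
    for x in range(1, 1 << q):          # Q MSBs of the first address
        for j in range(x):              # j = bit_rev(y), always below x
            ar1 = (x << q) | bit_rev(j, q)
            ar2 = (j << q) | bit_rev(x, q)  # equals bit_rev(ar1, 2*q)
            yield ar1, ar2

def ipbr(data, q):
    """In-place bit reversal with one unconditional swap per generated pair."""
    for a, b in unique_pairs(q):
        data[a], data[b] = data[b], data[a]

pairs = list(unique_pairs(3))
print(pairs[0], len(pairs))  # (8, 4) 28
```

For Q = 3 the first pair is (8, 4), i.e. (1,0) and (0,4) in the plot coordinates, and 28 pairs are produced in total.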
- IPBR Methods presented can be modified in three different ways by replacing part or all of the address pair sequence with a “topologically similar” sequence. Variations of the IPBR Methods include 1) x and y axis inversions of the original sequence, and 2) reversing the order of the original subsequences, 3) replacing an (A,B) address pair with (B,A) address pair for arbitrary numbers of terms in sequences.
- Method 1 uses the first way of defining three sets, so the y axis data is bit reversed.
- Set A contains all the array element addresses with bit_rev(y) < x
- Set B contains addresses with bit_rev(y) > x
- Set C contains addresses with bit_rev(y) = x.
- Methods 3 and 4 use the “second” way of bit reversing the x axis data.
- Set A contains all the addresses with y>bit_rev(x)
- In FIG. 5 for Method 1 , Set A is the lower triangle, Set B the upper triangle, and the Set C elements lie along the diagonal.
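The three sets of the first way can be sketched as a classification function for even log2N = 2Q. The function name `classify` and the string labels are illustrative. The check at the end confirms that bit reversal maps Set A onto Set B and leaves Set C fixed, which is why stepping through Set A alone yields every required swap exactly once.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def classify(address, q):
    """Return 'A', 'B' or 'C' by comparing bit_rev(y) with x."""
    x, y = address >> q, address & ((1 << q) - 1)
    r = bit_rev(y, q)
    if r < x:
        return "A"
    if r > x:
        return "B"
    return "C"

q = 3
sets = {"A": [], "B": [], "C": []}
for address in range(1 << (2 * q)):
    sets[classify(address, q)].append(address)

print({name: len(members) for name, members in sets.items()})
print(sorted(bit_rev(a, 2 * q) for a in sets["A"]) == sets["B"])
```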
- Any method can be altered by interchanging the order of the first and second addresses of an address pair, which is a third way of defining sets for bit reversal. Such exchanges may be favorable for reducing program code or cycles but should not be thought of as producing a different address pair generator that is not included in this invention. The only difference is that in alternating subsequences, the choice of first and second address is exchanged. Such an exchange does not result in a new address pair, and is therefore an IPBR address pair generator within the scope of the present invention.
- the address pair sequence generated for Method 1 is defined by all the values that AR 1 , AR 2 take on after moves that affect these values (not Move 3 ).
- FIG. 7 illustrates Method 3 .
- For the segregation into three sets for odd log2N in FIG. 8 , set A is in the lower triangle, which is systematically covered by the first address of the Method 1 IPBR address pair generator.
- the self reversed set, C forms a line with slope 2 .
- the vertical axis data referred to as zy, has been prefixed by the middle bit z.
- For the new zy vertical axis in FIG. 8 , set A is defined by (bit_rev(zy)>>1) < x. This is equivalent to bit_rev(y) < x. Thus bit_rev(y) < x defines set A here.
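The equivalence can be verified by brute force for odd log2N = 2Q + 1. The sketch assumes the address layout x (Q MSBs), z (middle bit), y (Q LSBs), with zy treated as a (Q+1)-bit quantity; the function names are hypothetical. Reversing zy moves z to the least significant position, and the right shift by one discards it, leaving bit_rev(y).

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def in_set_a_via_zy(address, q):
    """Set A test using the (Q+1)-bit vertical axis value zy."""
    x = address >> (q + 1)
    zy = address & ((1 << (q + 1)) - 1)
    return (bit_rev(zy, q + 1) >> 1) < x

def in_set_a_via_y(address, q):
    """Set A test using only the Q LSBs y."""
    x = address >> (q + 1)
    y = address & ((1 << q) - 1)
    return bit_rev(y, q) < x

def definitions_agree(q):
    return all(in_set_a_via_zy(a, q) == in_set_a_via_y(a, q)
               for a in range(1 << (2 * q + 1)))

print(all(definitions_agree(q) for q in range(1, 5)))
```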
- Method 2 is a special case where Set A is defined as the union of two sets with reversed inequalities, depending on whether the (Q+1)st bit is zero or one. This definition of Set A facilitates a special technique exploited by Method 2 for continuing to use the same address advance increment scheme even when faced with the Set A zy-axis vertical boundary.
- the address sequence “wraps around” while continuing to use the same increment with no special treatment required for handling Set A's boundary.
- Method 2 cannot easily be extended to the even log2N case, so Method 2 only works for odd log2N.
- Method 3 can be extended to work for odd log2N. This is done by Method 5 , which reduces to Method 3 for even log2N. Combining even and odd log2N capability in Method 5 is awkward, however. For some applications branching to Methods 2 and 3 for odd and even log2N will be preferable to Method 5 .
- Method 2 reduces processor cycles over Method 1
- Method 2 also has a 2^(log2N) alignment requirement not found in Method 1 .
- three moves are implemented as defined in Table V. The third move combines the operations of the first two moves.
- TABLE V: Moves for Method 2 (columns Move 1, Move 2, Move 3)
- AR1 AR1 ⁇ 1 ⁇ I0B
- AR2 AR2 ⁇ 1
- AR2 AR2 ⁇ I0B ⁇ 1
- the values of AR 1 , AR 2 after initialization and all moves define the Method 2 address pair sequence.
- Processor cycles may be further reduced over Method 1 for input arrays of an even log2N size by implementing Method 3 .
- Method 3 considers log2N/2 LSBs and MSBs to define the sets A, B, and C for input array elements.
- Method 3 defines input array element set A by those addresses that have address LSBs>bit_rev(address MSBs).
- Method 3 may or may not reduce processor cycles over Method 1 , depending on the processor.
- Method 3 also has a 2^(log2N) alignment requirement not found in Method 1 .
- To perform Method 3 four moves are implemented as defined in Table VI.
- Method 4 is similar to Method 1 in that it can be implemented for both odd and even log2N input arrays. Differences between the two include implementation for different processor capabilities and how the methods define input sets of array elements. Method 4 may operate on processors with only one address increment register, whereas Method 1 requires more than one such register. Method 4 considers LSBs and MSBs from Q bits to define the sets A, B, and C for input array elements and defines input array element set A by those addresses that have address LSBs>bit_rev(address MSBs). However, Method 4 does not reduce the alignment requirement.
- the address pair sequence for Method 1 is defined by the AR 1 , AR 2 values after Moves 1 and 2 .
- the address pair sequence for Method 4 is also defined by the AR 1 , AR 2 values after Moves 1 and 2 .
- Method 5 extends Method 3 to work for odd log2N. Referring to FIG. 1 , processor cycles may be further reduced over Method 1 for input arrays of an odd log2N size by implementing Method 5 .
- the sequence of address pairs generated by Method 5 is defined by the AR 1 , AR 2 values after initialization and after all moves except Move 4 .
- All the IPBR Methods of the present invention can be modified by replacing part or all of the address pair sequence with a “topologically similar” sequence.
- Variations of the IPBR Methods include 1) reversing the order of the original subsequences, 2) x and y axis inversions of the original sequence, and 3) replacing an (A, B) address pair with (B,A) address pair for arbitrary numbers of terms in sequences.
- Method 1 m , 3 m and 4 m remove the need for auxiliary address registers AR 3 and AR 4 .
- every sequential “move” advances from the prior address pair location without periodically resetting to stored AR 3 , AR 4 values.
- Method 1 m , 3 m and 4 m may reduce some cycles (depending on the processor) but will add to program memory.
- An advantage of Method 1 m , 3 m and 4 m is that they require fewer address registers to implement.
- Method 2 m does not alter the cycle count, but is exemplary of an x and y axis inversion. Method 2 m “inverts” the entire address pair sequence of Method 2 .
- Method 2 m generates a sequence of address pairs that is topologically similar to the original Method 2 shown previously in FIG. 10 .
- the data along both the x and y axis has been inverted. Placing an upside down graph of FIG. 12 on top of FIG. 10 results in a match.
- Method 2 is preferable to Method 2 m only because of a simpler initialization of the address pair sequence.
- This invention is inclusive of topologically equivalent address generation schemes and all address generation schemes that vary in some simple or obvious manner from Method 1 , 2 , 3 , and 4 .
- Method 1 m keeps the same subsequences shown on horizontal and vertical lines in FIG. 5 for Method 1 , but connects these subsequences in a different way.
- Method 4 m is formed by reconnecting the horizontal and vertical lines in a different manner. Note that none of the alternatives given in Table IV are satisfied by the entire Method 4 m address pair sequence given in FIG. 13 . However, all of the individual sub-sequences do satisfy Table IV.
- Any method can be altered by interchanging the order of the first and second addresses of an address pair. Such exchanges may be favorable for reducing program code or cycles but should not be thought of as producing a different address pair generator that is not included in this invention.
- An example of two address pair generators that give an identical address pair sequence, and vary only in the order of the first and second address for an arbitrary number of address pairs, can be illustrated by plots of Method 1 m ( FIG. 13 ) and Method 4 m ( FIG. 14 ).
- In FIG. 13 , changing the bit reversed axis from the y-axis to the x-axis results in a sequence of address pairs that is identical to that of FIG. 14 .
- the only difference is that in alternating subsequences, the choice of first and second address is exchanged.
- Such an exchange does not result in a new address pair, and thus does not produce a new IPBR address pair generator outside the scope of the present invention.
- Method 3 m is implemented to reduce processor cycles for input arrays of even log2N size.
- Method 3 m requires only two address registers to operate.
- six moves are implemented as defined in Table XI.
- Method 4 m illustrates a scheme different from Method 1 m for reconnecting subsequences of address pairs.
- five moves are implemented as defined in Table XII.
- For processors with only one address increment register, note that after two Move 1 's the final resulting AR 1 , AR 2 changes are equivalent to Move 5 .
- an address increment register can be reserved for the start address that is subtracted and added to ARx.
- only one address increment register, AR 0 , is available for use by the IPBR Methods 1 through 4 presented herein.
- step one can be removed from the inner loop.
- An alternative method generates addresses using out of place bit reversal (OOPBR) to reduce cycles for processors that do not support bit reversed address register incrementation and consequently require many cycles to generate a bit reversed address.
- OOPBR out of place bit reversal
- the conventional OOPBR approach is to generate one address pair per data move. With the present invention, about half as many bit reversed address offsets are generated by using bit reversed offsets twice.
- the OOPBR algorithm copies data from the contents referenced by S_in+AR 1 into S_out+bit_rev(AR 1 ) and copies data referenced by S_in+bit_rev(AR 1 ) into S_out+AR 1 .
- a second address generator is used to generate all self-reversed offsets, and for each self reversed offset only one data transfer is made.
- This OOPBR method removes the start address offset from the address pair sequence generated, and consequently this OOPBR method need not impose any alignment constraints on the input or output buffer.
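The idea of using each bit reversed offset twice can be sketched as follows. The sketch assumes even log2N = 2Q and one word per element, works with relative offsets added to unaligned start addresses, and uses a simple nested loop in place of the moves of Table XIII, which are not reproduced here. The name `oopbr` and the list-based memory model are illustrative.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def oopbr(memory, s_in, s_out, q):
    """Copy 2**(2*q) words from s_in to s_out in bit reversed order."""
    pairs_generated = 0
    for x in range(1, 1 << q):
        for j in range(x):
            ar1 = (x << q) | bit_rev(j, q)   # offset whose reversed LSBs are below its MSBs
            rev = (j << q) | bit_rev(x, q)   # bit reversed partner of ar1
            pairs_generated += 1
            memory[s_out + rev] = memory[s_in + ar1]  # first use of the pair
            memory[s_out + ar1] = memory[s_in + rev]  # second use of the pair
    for x in range(1 << q):
        c = (x << q) | bit_rev(x, q)         # self-reversed offset
        memory[s_out + c] = memory[s_in + c] # a single transfer is enough
    return pairs_generated

memory = list(range(100, 160))
print(oopbr(memory, 3, 40, 2))  # 6 pairs cover 12 of the 16 elements
```

Because only offsets are generated and the start addresses are added separately, the start addresses 3 and 40 in the usage line need no particular alignment.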
- IPBR Method 1 is extended for OOPBR applications. Address pairs AR 3 and AR 4 are initialized to zero instead of S_in, because for OOPBR, relative address offsets are generated, not actual addresses. Beyond the moves for the chosen IPBR method, three additional moves are needed. These additional moves only affect one address register. To perform the OOPBR Method, six moves are implemented as defined in Table XIII.
- the present methods can be efficiently implemented even when data elements are represented by multiple contiguous words.
- the initial address pair(s) and all increment registers are multiplied by 2^M when data elements are represented by 2^M contiguous words.
- scaling up by 2^M is normally not needed as demonstrated below.
- methods for dealing with the lowest M bits of the address that needs a bit reversed increment are described below. For some FFTs, each sequential data element may require two or four words of memory.
- An alternative approach is to treat alternating sequential swaps differently. The first data swap lets the two least significant bits advance to three by advancing from the R_MSW to the I_MSW. The second swap starts by swapping I_MSW and advances backwards to swap the R_MSW data last.
- IPBR methods reduce the number of cycles by more than 80% over the conventional method, in most cases. Cycles per address pair are reduced from 14 or 12 cycles down to 4 or 3 cycles and the number of address pairs is reduced in half.
- the modified methods only vary from the corresponding un-modified methods in that their preferred implementations use only two address registers.
- the present invention of a design technique for a “stand alone” in-place bit reversal mapping can also be extended for integration of the bit reversed mapping with parallel FFT computations.
- This enables the design of address generators that combine IPBR and one FFT stage.
- Computing IPBR and the first stage in parallel increases efficiency by removing instructions to store output from a standalone IPBR mapping and then fetching the same data as input for the FFT stage.
- self-reversed addresses cannot be ignored. If self-reversed addresses were ignored in the combined method, then the FFT stage would be missing required input from these memory locations.
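- The following Python fragment is a hedged sketch of why the combined method must visit self-reversed addresses. It assumes M=0, S_in=0, a first radix-2 stage whose twiddle factor is W=1, and a simple linear walk over base addresses rather than the path of any method in this patent; the function names are hypothetical. Each set of four or eight addresses is fetched in full, the butterflies are computed, and the results are stored at the bit reversed locations, so the stand-alone store and re-fetch disappears.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def ipbr_with_first_stage(data, log2n):
    """Bit-reverse `data` in place and apply the first radix-2 stage (W = 1)."""
    if log2n < 2:
        raise ValueError("sketch assumes log2n >= 2")
    half = 1 << (log2n - 1)
    for a in range(0, half, 2):        # even bases whose top bit is clear
        r = bit_rev(a, log2n)          # r is again even with the top bit clear
        if a > r:
            continue                   # this set was processed from base r
        # Fetch every input of the set before storing any output.
        a0, a1, a2, a3 = data[a], data[a + 1], data[a + half], data[a + 1 + half]
        r0, r1, r2, r3 = data[r], data[r + 1], data[r + half], data[r + 1 + half]
        # Butterflies on the a-side inputs land at the r-side addresses ...
        data[r], data[r + 1] = a0 + a2, a0 - a2
        data[r + half], data[r + 1 + half] = a1 + a3, a1 - a3
        # ... and vice versa. When a == r these stores repeat the same values,
        # which is the four-address case containing self-reversed addresses.
        data[a], data[a + 1] = r0 + r2, r0 - r2
        data[a + half], data[a + 1 + half] = r1 + r3, r1 - r3
    return data
```

If the sets with `a == r` were skipped, the butterflies fed by those locations would never be computed, which is the failure described above.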
- an exemplary embodiment may modify a “stand-alone” IPBR bit reversed address pair generator for integration into an FFT stage and assumes the FFT's bit reversed mapping precedes a first radix-2 stage with a one element “skip” in each butterfly 40 .
- the method first extends the 2-D plotted path to hit the self-reversed addresses 42 .
- This extended single address pair generator's first address is designated as “AR 1 .”
- After each AR1 move, a four to eight element set of addresses and their corresponding bit reversed complements is determined 44.
- three addresses are generated.
- a second address is a self-reversed address of the first and the third is the bit-reversed complement of the address that is not self-reversed.
- the exemplary combined method processes four pairs of addresses and their bit reversed complements such that no conflicting addresses will be produced using the in-place memory.
- the preferred address pairs include AR1, AR1+2^M, AR1+2^(log2N−1+M), AR1+2^M+2^(log2N−1+M) and their four bit reversed complements.
- Each iteration of AR 1 advances according to extended stand-alone IPBR methods described above until a move lands in a set that has not already been covered 46 . Then, each new address set forms the output of a new address generator for one iteration 48 .
- This procedure splits the single address pair generator into a quadruple bit reversed address pair generator that typically makes eight non-self-reversed addresses each iteration. For some iterations only four addresses are generated and two of these addresses are self-reversed.
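- The address sets named above can be enumerated with a short Python sketch. It assumes M=0 and S_in=0, and it visits base addresses in linear order instead of following the extended IPBR path of the patent; `address_set` and `all_sets` are hypothetical names. The sketch illustrates that the sets do not overlap, that they cover the whole array, and that a set built on a self-reversed base holds only four addresses, two of them self-reversed.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def address_set(ar1, log2n):
    """AR1, AR1+1, AR1+N/2, AR1+1+N/2 and their bit reversed complements (M = 0)."""
    half = 1 << (log2n - 1)
    firsts = [ar1, ar1 + 1, ar1 + half, ar1 + 1 + half]
    return sorted(set(firsts) | {bit_rev(a, log2n) for a in firsts})


def all_sets(log2n):
    """One set per even base with the top bit clear, skipping already covered sets."""
    half = 1 << (log2n - 1)
    return [address_set(a, log2n)
            for a in range(0, half, 2)
            if a <= bit_rev(a, log2n)]
```

For log2N=4 this gives the sets {0, 1, 8, 9}, {2, 3, 4, 5, 10, 11, 12, 13} and {6, 7, 14, 15}.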
- the C program segment in Table XVI processes the real parts for the 8th iteration of the quadruple address pair generator.
- Table XVI initializes AR 0 through AR 7 to the values the address generator yields in the 8th iteration. Each element address is suffixed by 0 for real, 1 for imaginary.
- the values of A, B, C, and D are essential to be saved into memory for the procedure. If the processor only has three registers, then D can be placed into an external memory since it is only used once. Some processors only have two registers, but perform parallel processing.
- Table XVII illustrates a basic butterfly computation for the eighth iteration of the Method One combination. Note that each address AR0 through AR7 is equal to the bit reversed complement of a different address (e.g., AR0 is equal to the bit reversed complement of AR2 and AR2 is equal to the bit reversed complement of AR0). TABLE XVII Butterfly for the Eighth Iteration Real Numbers for the Method 1 Combination
- FIGS. 16 and 17 illustrate Address Generators for the combination of new “Method 1 C” IPBR and one radix-2 FFT stage. Numbers in the plots indicate the iteration where an address is generated.
- FIG. 16 is a similar plot of Method 1 shown in FIG. 4 with bit reversed LSBs on the y-axis and MSBs on the x-axis. The arrows in FIG. 16 track the path of AR 2 to the 8th iteration.
- the generated addresses in FIG. 16 's plot stop at 10 because at that point the plot becomes full.
- FIG. 4 shows how Method 1 for generating bit reversed address pairs steps through the plot such that every square (unique address) is generated only once with no self-reversed addresses generated.
- the squares on a diagonal line between the path on the left and the right sides of the plot were not hit.
- the same diagonal of self-reversed addresses must be hit in order to provide all input to the FFT stage.
- FIG. 17 of Method 4 C using the preferred combined procedure corresponds to the plot in FIG. 10 for Method 4 .
- the arrows in FIG. 17 track the path of AR 2 through the 10th address when the plot is filled in.
- Method 1 requires auxiliary address registers to periodically reset the primary address registers.
- the use of multi-tasking address registers to cover more than one of eight new addresses for each iteration may also be implemented. Further, the use of one of the modified methods of the alternative embodiments above can implement the preferred combined method with fewer address registers.
Abstract
Reducing the amount of required memory and instruction cycles when implementing Fast Fourier Transforms (FFTs) on a computer system is described. The invention optimizes FFT software using in-place bit reversal (IPBR) implemented on a processor capable of bit reversed incrementation. It enables the design of address generators that combine IPBR and one FFT stage in parallel, and it increases efficiency by removing instructions to store output from a stand-alone IPBR mapping and then fetch the same data as input for the FFT stage.
Description
- This application is a continuation-in-part of U.S. patent application Ser. No. 10/097,407, entitled “ADDRESS GENERATORS FOR MAPPING ARRAYS IN BIT REVERSED ORDER,” filed on Mar. 15, 2002.
- The present invention is a method and apparatus to reduce the amount of required memory and instruction cycles when implementing Fast Fourier Transforms (FFTs) on a computer system. More particularly, the preferred embodiment of the present invention optimizes FFT software using in-place bit reversal (IPBR) implemented on a processor capable of bit reversed incrementation.
- Algorithms that perform discrete transforms such as Fast Fourier Transforms (FFTs) are well known. The Fourier transform is a mathematical operator for converting a signal from a time-domain representation to a frequency-domain representation. The inverse Fourier transform is an operator for converting a signal from a frequency-domain representation to a time-domain representation. The Discrete Fourier Transform (DFT) may be viewed as a special case of the continuous form of the Fourier transform. The DFT determines a set of spectrum amplitudes and phases or coefficients from a time-varying signal defined by samples taken at discrete time intervals.
- As is well known, in the mid-1960's techniques were developed for more rapid computation of the discrete Fourier transform. These techniques became known as the fast Fourier transform (FFT), first described in a paper by J. W. Cooley and J. W. Tukey, entitled “An Algorithm for the Machine Calculation of Complex Fourier Series,” Mathematics of Computation (1965), Vol. 19, No. 90, pp. 297-301. Some patents in the field of processing FFTs include U.S. Pat. No. 3,673,399 to Hancke et al for FFT PROCESSOR WITH UNIQUE ADDRESSING; U.S. Pat. No. 6,035,313 to Marchant for a MEMORY ADDRESS GENERATOR FOR AN FFT; U.S. Pat. No. 6,247,034 B1 to Nakai et al for a FAST FOURIER TRANSFORMING APPARATUS AND METHOD, VARIABLE BIT REVERSE CIRCUIT, INVERSE FAST FOURIER TRANSFORMING APPARATUS AND METHOD, AND OFDM RECEIVER AND TRANSMITTER; U.S. Pat. No. 4,823,297 to Evans for a DIGIT-REVERSAL METHOD AND APPARATUS FOR COMPUTER TRANSFORMS; U.S. Pat. No. 5,329,474 to Yamada for an ELEMENT REARRANGEMENT METHOD FOR FAST FOURIER TRANSFORM; U.S. Pat. No. 5,473,556 to Aguilar et al for DIGIT REVERSE FOR MIXED RADIX FFT; and U.S. Pat. No. 4,977,533 to Miyabayashi et al for a METHOD FOR OPERATING AN FFT PROCESSOR.
- In performing a fast Fourier transform of the type known as a radix-two decimation-in-time FFT, the size of the transform is successively halved at each stage. In the illustrative circuit described in FIG. 2, a 32-point FFT is split into a pair of 16-point FFTs, which are in turn split into four 8-point FFTs, then eight 4-point FFTs, and finally sixteen 2-point FFTs. The resulting computation for a 32-point FFT is shown in the signal flow graph of FIG. 2. The quantities on the left hand side of the signal flow graph, ranging from x(0) to x(31), are the sampled inputs to the FFT, while the signals appearing at the right-hand side of the signal flow graph and numbered 0 through 31 are the resulting FFT coefficients. The signal flow graph illustrates that there are five passes or phases of operation, derived from the relationship that the number 32 is two to the fifth power.
- The convention used in the signal flow graph is that an arrowhead represents multiplication by the complex quantity W^k adjacent to the arrowhead. The small circles represent addition or subtraction as indicated in FIG. 2a. If the inputs to each of the butterfly computational modules shown in FIG. 2a are indicated by signal names A and B, and the outputs are indicated by signal names C and D, then the computations performed in the butterfly module are: C=A+BW and D=A−BW. The W values are usually referred to as “twiddle factors” and represent phasors of unit length and an angular orientation which is an integral multiple of 2π/32.
- An aspect of FFT computation is that the results of each butterfly computation may be stored back in memory in the same location from which the inputs to the butterfly were obtained. More specifically, the C and D outputs of each butterfly may be stored back in the same locations as the A and B inputs of the same butterfly. This FFT computation is referred to as an “in-place” algorithm. Most discrete transforms are executed “in-place” to conserve memory, which in turn reduces system size, power consumption, and cost, and frees memory for other tasks. For such “in-place” FFTs, the reordering required to counteract the effect of the transform decompositions is achieved by a particular permutation of the elements of the data sequence.
- Bit-reversed address mapping is commonly used in performing radix-2 FFTs. When the radix-2 FFT is computed, data must be rearranged in bit-reversed order. The FFT process uses an algorithm to pre-place data in memory in bit-reversed order, typically prior to executing the butterfly computations.
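- To make the preceding description concrete, the following Python fragment is a minimal sketch of an in-place radix-2 decimation-in-time FFT: the data is first placed in bit reversed order and each butterfly then overwrites its own inputs with C=A+BW and D=A−BW. The sketch assumes the array length is a power of two and the twiddle convention W=exp(−2πik/N); both the convention and the function names are illustrative assumptions, and the bit reversal shown is the simple conditional swap rather than any method of the present invention.

```python
import cmath


def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fft_in_place(x):
    """Radix-2 decimation-in-time FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    log2n = n.bit_length() - 1
    # Pre-place the data in bit reversed order.
    for i in range(n):
        j = bit_rev(i, log2n)
        if i < j:
            x[i], x[j] = x[j], x[i]
    # log2n passes of butterflies; each butterfly spans `span` elements.
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                a = x[start + k]
                bw = x[start + k + span] * w
                x[start + k] = a + bw          # C = A + BW, stored over A
                x[start + k + span] = a - bw   # D = A - BW, stored over B
        span *= 2
    return x
```

No buffer beyond the input array is used, which is the memory saving attributed above to in-place algorithms.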
- Obtaining FFT efficiency is a high priority in the computer processor industry. The FFT algorithm has high intrinsic value and is widely used. The instruction cycle requirement of custom optimized FFT software is the accepted benchmark standard for measuring a processor's computational efficiency. For a specific type of FFT (e.g., in-place, using relocatable data memory, single precision, radix 2, complex, 256 point, unconditional ½ scaling per butterfly, etc.) the number of FFTs/sec executed is a more accurate relative measure of a processor's computational power than MIPS (millions of instructions per second). FFT software requiring fewer resources enhances both the real and projected capabilities of the processor.
- Because an optimized FFT computation includes bit reversed addressing, many DSPs (Digital Signal Processors) include customized instructions to facilitate an efficient implementation of bit reversed addressing. Typically, this is done by special instructions that allow address registers to be incremented so that carry (or borrow) bits propagate toward less significant bits (backward). For normal addition, carry bits must propagate toward more significant bits. The present invention is primarily intended to optimize FFT software implemented on a processor capable of bit-reversed address register incrementing in the described manner. However, the invention also has applications on processors that lack this capability.
- Reference is made to Table I, listing a binary address, contents of memory before bit reversed ordering, the corresponding bit reversed binary addresses, and contents of memory after bit reversed ordering. Assume an input array is stored in 2^(log2N+M) contiguous words of memory, beginning at start address S_in. The array has 2^log2N elements and each element is stored in 2^M contiguous words of data memory. For example, four words of contiguous memory would accommodate two words of precision for both the real and imaginary part of complex input data elements. An arbitrary address for data memory containing the input array can be expressed in the form
address = S_in + 2^M × (B_0×2^0 + B_1×2^1 + . . . + B_(log2N−1)×2^(log2N−1)) + P
(each binary B_k coefficient can be zero or one, and P=0,1,2, . . . (2^M)−1). The corresponding bit reversed address is obtained by reversing the order of the B_k values:
bit_rev(address) = S_out + 2^M × (B_(log2N−1)×2^0 + B_(log2N−2)×2^1 + . . . + B_0×2^(log2N−1)) + P
- An array has been “bit reversed” after all input data is copied from its original location at address AR1 to its new location at address AR2=bit_rev(AR1). Sequential output array elements are rearranged in bit reversed order relative to the input array. Table I illustrates a bit reversed array for the case log2N=3, M=S_in=S_out=0. The sequential addresses in the bit reversed address column are obtained by incrementing the prior address with 100 binary, and propagating any carry bit that results backwards. Self-reversed addresses occur when AR1=bit_rev(AR1). For typical processors and software optimized to reduce cycles, the output buffer must be “aligned”, i.e., S_out or S_in must be a multiple of 2^(log2N+M) for bit reversed address register incrementation to work properly.
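- The bit reversed increment described above, adding 100 binary and propagating the carry backwards, can be sketched in Python as follows. The sketch assumes M=0 and a zero start address, and the function names are hypothetical; a DSP with bit reversed addressing performs the same step in a single address register update.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_increment(addr, bits):
    """Add one at the most significant bit; carries move toward the LSB."""
    mask = 1 << (bits - 1)
    while mask and (addr & mask):
        addr ^= mask       # clear this bit and carry into the next lower bit
        mask >>= 1
    return addr | mask     # set the first clear bit (mask is 0 on wrap-around)


def bit_reversed_sequence(bits):
    """Addresses visited by repeated reverse-carry increments starting at zero."""
    addr, out = 0, []
    for _ in range(1 << bits):
        out.append(addr)
        addr = reverse_carry_increment(addr, bits)
    return out
```

For three bits the sequence is 000, 100, 010, 110, 001, 101, 011, 111, which is the bit reversed address column of Table I.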
TABLE I Bit Reversed Mapping of an Exemplary Array

| Binary address (memory before mapping) | Contents of memory before bit reversed ordering | Bit reversed binary address (destination of data move) | Binary address (memory after mapping) | Contents of memory after bit reversed ordering |
|---|---|---|---|---|
| 000 | 56 | 000 | 000 | 56 |
| 001 | 13 | 100 | 001 | 18 |
| 010 | −4 | 010 | 010 | −4 |
| 011 | 23 | 110 | 011 | −24 |
| 100 | 18 | 001 | 100 | 13 |
| 101 | 9 | 101 | 101 | 9 |
| 110 | −24 | 011 | 110 | 23 |
| 111 | 66 | 111 | 111 | 66 |

- Out of place bit reversal (OOPBR) refers to the technique of bit reversing an input data array so that the output data array falls elsewhere in data memory, i.e., S_in≠S_out, whereas in place bit reversal (IPBR) refers to the technique of re-ordering elements of an input data array in bit reversed order so that the output array overwrites the input array, i.e., S_in=S_out. For some applications, OOPBR may be advantageous if input data is located in slower, hence cheaper, memory, and faster “scratch” or “volatile” memory is available to generate the bit reversed output array. The subsequent FFT operations on the bit reversed array exploit the faster memory. For this case the cycles required may exceed the benchmark OOPBR FFT cycles, because the digital signal processor (DSP) manufacturer will measure the benchmark case with both the input and output OOPBR array in the fastest memory. An FFT using OOPBR may have a hidden cycle penalty beyond the bit reversal itself, when the output is eventually copied back to the location of the input array. Computational processes that use more of the available scratch memory than necessary can lead to future problems when converting to an operating system that permits multiple computational processes to interrupt each other.
- For other applications, the input data for the FFT is already located in fast data memory. For example, the input data may be arrived at as the result of many computations, and for optimal reduction of required cycles, the FFT input array may already be in fast memory. In that event, OOPBR increases the amount of fast data memory required by the entire FFT by a factor of two. This is the case because the rest of the FFT embodies an intrinsically in place algorithm, requiring no additional data memory other than the input array itself. In the event that the cycles required for IPBR can be made more competitive relative to OOPBR, for many applications the additional data memory requirement of OOPBR cannot be justified.
- The second and third columns of Table II illustrate the same sequence of address pairs given in columns one and three of Table I. The conventional IPBR address generator yields these address pairs for N=8. The fourth column indicates which address pairs are needed for IPBR, i.e., unique address pairs referencing data that needs to be swapped. The fourth column of Table II also illustrates that for an array of eight elements, the address pair generator conventionally used for IPBR produces useful address pairs for address pair numbers two and four, which is only two out of eight bit reversed pairs.
TABLE II Conventional IPBR Address Pair Generator Results for an N = 8 Element Array

| Address pair number | Binary address | Bit reversed binary address | Address pair needed for IPBR mapping array in bit reversed order? |
|---|---|---|---|
| 1 | 000 | 000 | No, self-reversed |
| 2 | 001 | 100 | YES |
| 3 | 010 | 010 | No, self-reversed |
| 4 | 011 | 110 | YES |
| 5 | 100 | 001 | No, redundant with address pair 2 |
| 6 | 101 | 101 | No, self-reversed |
| 7 | 110 | 011 | No, redundant with address pair 4 |
| 8 | 111 | 111 | No, self-reversed |

- A flawed IPBR algorithm is now described to illustrate the problems encountered attempting to optimize IPBR. The first address register is initialized to S_in, and each iteration this first address register is advanced linearly to reference the next array element in natural order. A second address register is also initialized to S_in and is incremented each iteration in a bit reversed manner to obtain the corresponding bit reversed version of the first address. Thus a new pair of addresses is generated each iteration, as illustrated by columns two and three of Table II.
- The conventional IPBR algorithm in the prior art involves a modification of this flawed approach. The conventional IPBR algorithm generates address pairs in a manner identical to the described flawed algorithm. However, instead of always swapping the contents referenced by each address pair that is generated, the swap is only executed if the address generated by linear incrementing is less than the address produced by bit-reversed incrementing. Note that the criterion of the first address being less than the second identifies the first occurrences of useful address pairs for IPBR in Table II. This condition for swapping eliminates transferring data from self-reversed addresses and prevents swapping for one of the redundant pairs of non-self-reversed addresses. Implementing the conditional swapping typically requires transferring both address registers into accumulators, subtracting, and conditionally branching. For this reason, typical IPBR implementations require two to ten times as many instruction cycles as OOPBR implementations.
- The conventional IPBR method is inefficient because it relies on an address pair generator that yields extraneous address pairs.
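- The difference between the flawed and the conventional algorithm can be illustrated with a short Python sketch. It assumes M=0 and S_in=0; the function names and the `conditional` flag are hypothetical. Both variants generate all N address pairs; the flag only decides whether the swap is guarded by the comparison AR1<AR2.

```python
def reverse_carry_increment(addr, bits):
    """Add one at the most significant bit; carries move toward the LSB."""
    mask = 1 << (bits - 1)
    while mask and (addr & mask):
        addr ^= mask
        mask >>= 1
    return addr | mask


def ipbr(data, log2n, conditional=True):
    """In-place bit reversal over all N generated pairs; returns the swap count."""
    swaps = 0
    ar2 = 0
    for ar1 in range(1 << log2n):          # AR1 advances linearly
        if ar1 < ar2 or not conditional:
            data[ar1], data[ar2] = data[ar2], data[ar1]
            swaps += 1
        ar2 = reverse_carry_increment(ar2, log2n)   # AR2 = bit_rev(next AR1)
    return swaps
```

With the data of Table I, the conditional variant performs two swaps out of eight generated pairs, matching the two YES rows of Table II, while the unconditional variant swaps every redundant pair twice and returns the array to its original order.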
- In co-pending parent U.S. patent application Ser. No. 10/097,407, an IPBR process based on the method that yields every non-self-bit-reversed address in the input array only once, thereby avoiding production of extraneous address pairs, is described. To optimize IPBR, every non-self-bit-reversed address in the input array needs to be generated only once, while making simple, computationally efficient increments, or moves, away from the previous pair of bit reversed addresses. The address pair generator of the present invention independently determines, or moves, only one address in each address pair. For any address pair, bit reversal of one address uniquely defines the other address.
- The process facilitates the identification of computationally efficient patterns for sequentially generating a unique set of bit reversed address pairs. Five exemplary new IPBR methods and modifications of these methods are presented. The size of the array to be bit reversed is 2^(log2N). For use on a DSP capable of bit reversed incrementation of address registers but having only one address increment register, optimized program code implementing Method 1 requires minor changes to work for odd and even log2N. For processors with more than one address increment register available, optimized code implementing Method 1 works for all values of log2N. Method 2 further reduces cycles for odd log2N. Method 3 reduces cycles for the even log2N arrays relative to Method 1. Method 4 is similar to Method 1; however, Method 4 does not pose any problem for processors with only one address increment register. Method 1 is unique in that it reduces the alignment requirement. Method 5 extends Method 3 to work for odd log2N.
- Methods [...] Method 2m and Method 2 will be very close, if not identical. The other modified methods require fewer address registers, but increase the number of nested inner loops. Thus Methods [...]
- IPBR software that removes the typical input buffer alignment restriction for bit reversed addressing is an application for this efficient process. This application is important because the rest of an FFT can be implemented without any buffer alignment restriction. By giving up some of the cycles this invention saves, the requirement for input buffer alignment is completely removed. Efficient removal of the alignment requirement may require inner loops that always bit reverse increment the same element of the address pair. This can make Methods [...] Method 1 is unique in that even without alignment removal, its inherent alignment requirement is relaxed to 2^(log2N/2−1) for even log2N and 2^((log2N−1)/2) for odd log2N. All other methods have an inherent 2^(log2N) alignment requirement.
- The present invention improves the in-place bit reversal (IPBR) process on computer processors and systems by defining an address generator for generating address pairs used for processing an input array using IPBR in parallel with processing a stage of a Fast Fourier Transform (FFT). The method optimizes FFT software using IPBR that can be implemented on a processor.
- The present invention creates an address pair generator that is used to combine IPBR and one FFT stage. Computing the IPBR and the first FFT stage in parallel increases processing efficiency by removing instructions to store output from a stand-alone IPBR mapping and then fetch the same data as input for the FFT stage.
- Preferred embodiments of the invention are discussed hereinafter in reference to the drawings, in which:
- FIG. 1 illustrates a decisional flowchart to choose a method of IPBR;
- FIG. 2 is an illustrative signal flow graph of a fast Fourier transform in the prior art;
- FIG. 2a is an illustration of computations made in FIG. 2;
- FIG. 3 is an illustrative graph of a conventional IPBR address generation;
- FIG. 4 is an illustrative graph of Method 1 for IPBR address generation;
- FIG. 5 is an illustrative graph of Method 1 for IPBR address generation;
- FIG. 6 is an illustrative graph of Method 4 IPBR address generation scheme for N=64 addresses;
- FIG. 7 is an illustrative graph of Method 3 IPBR address generator;
- FIG. 8 is an illustrative graph of Method 1 IPBR address generation for odd log2N;
- FIG. 9 is an illustrative graph of Method 4 IPBR address generation for odd log2N;
- FIG. 10 is an illustrative graph of Method 2 IPBR address generation for odd log2N;
- FIG. 11 is an illustrative graph of Method 5 IPBR address generation;
- FIG. 12 is an illustrative graph of Method 2m IPBR address generation;
- FIG. 13 is an illustrative graph of Method 1m IPBR address generation;
- FIG. 14 is an illustrative graph of Method 4m IPBR address generation;
- FIG. 15 is a flowchart of an exemplary embodiment that combines IPBR and one Fast Fourier Transform stage;
- FIG. 16 is an illustrative graph of the combined FFT stage for IPBR address generation according to Method 1C;
- FIG. 17 is an illustrative graph of the combined FFT stage for IPBR address generation according to Method 4C.
- The preferred and alternative exemplary embodiments of the present invention include methods of in place bit reversal (IPBR) that are computationally efficient patterns to generate sequential address pairs for computing fast Fourier transforms (FFTs) in parallel with the address pair generation, in a processor. To decide which IPBR method is most efficient for a specific application, reference is made to the decisional flowchart of FIG. 1. Assume an input array 10 is stored in 2^(log2N+M) contiguous words of memory, beginning at start address S_in. The array has 2^log2N elements and each element is stored in 2^M contiguous words of data memory. For example, four words of contiguous memory would accommodate two words of precision for both the real and imaginary part of complex input data elements.
- Five new IPBR address generators for mapping arrays in bit reversed order are disclosed. Methods [...]
- The methods and devices organize array addresses into three sets, A, B, and C, to facilitate the creation of more optimal IPBR address pair generation. Every address in set A has a corresponding bit reversed address in set B. Set C contains all the self-reversed addresses. Once these sets are defined, the new address pair generator systematically advances through every element of set A to define the first address of each address pair. Since only one address of each pair is independently defined, by using the appropriate complementary bit reversed advance, the second address increment is also defined. The three sets of addresses are defined so that simple and efficient means exist for systematically stepping through every address in set A.
- The method for dividing addresses into sets is as follows. For an array of length 2^(log2N), let Q equal the truncated integral quotient of log2N/2. Each array element address in binary form is divided into its Q most significant bits (MSBs), denoted by “x”, and its Q least significant bits (LSBs), denoted by “y”. For even log2N, there are two ways to uniquely define the sets A, B, and C. One way is to divide up the addresses with the bit reversed Q LSBs greater than, less than, or equal to the Q MSBs. The second way is to divide up addresses with the Q LSBs greater than, less than, or equal to the bit reversed Q MSBs.
- For odd log2N, there are three ways to divide the addresses into three sets, listed in Table III. After discarding the middle (Q+1)th bit, the first two ways are the same as the even log2N case. The third way is to reverse the inequality in the relationship defining a set, according to whether the middle (Q+1)th bit of the address is zero or one. For the purposes of graphically visualizing all array element addresses and recognizing an easy way to step through set A, appropriately append the middle (Q+1)th bit to either the x or y axis data. Here it is prefixed to the vertical axis data. For odd log2N, IPBR Method 1 uses the first way for defining the three sets, Method 3 uses the second way, and Method [...]
- For Q=log2N>>1 let x be the Q MSBs and y be the Q LSBs. Let z be the middle (Q+1)th bit for odd log2N. For Table III, the bit_rev( ) operator reverses Q bits.

TABLE III Methods for Dividing Addresses Into Three Sets

| log2N value | Set division “way” | Set A or Set B | Set B or Set A | Set C |
|---|---|---|---|---|
| Even and odd log2N | First way | bit_rev(y)>x | bit_rev(y)<x | bit_rev(y)=x |
| Even and odd log2N | Second way | y>bit_rev(x) | y<bit_rev(x) | y=bit_rev(x) |
| Odd log2N only | Third way | bit_rev(y)>x if z=0; bit_rev(y)<x if z=1 | bit_rev(y)<x if z=0; bit_rev(y)>x if z=1 | bit_rev(y)=x |
| Odd log2N only | Third way | y>bit_rev(x) if z=0; y<bit_rev(x) if z=1 | y<bit_rev(x) if z=0; y>bit_rev(x) if z=1 | y=bit_rev(x) |

- The “filtered” conventional IPBR address pair generator, defined as the conventional IPBR address generator after extraneous pair removal, is segregated using the first way. The “filter” accepts only address pairs with first address, given by “a”, that satisfy a<bit_rev(a). For even log2N, define xy as the number with MSBs equal to x and LSBs equal to y. Then a<bit_rev(a) implies xy<bit_rev(xy) and thus x<bit_rev(y), so the bit reversed Q LSBs are greater than the Q MSBs. Thus, this method includes a criterion equivalent to the conventional IPBR criterion, but uses a more useful form of this criterion earlier in the conceptual process to avoid later extraneous pair removal.
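- The first way of Table III can be checked with a short Python sketch. It assumes M=0 and S_in=0, ignores the middle bit z for odd log2N as the first way prescribes, and uses the hypothetical names `classify` and `ipbr_via_set_a`. It does not reproduce the stepping order of any method in this patent; it only illustrates that set A pairs with set B under bit reversal, that set C is self-reversed, and that swapping once per element of set A completes the in-place mapping.

```python
def bit_rev(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def classify(addr, log2n):
    """Return 'A', 'B' or 'C' using the first way: compare bit_rev(y) with x."""
    q = log2n >> 1                      # Q = truncated log2N / 2
    x = addr >> (log2n - q)             # Q most significant bits
    y = addr & ((1 << q) - 1)           # Q least significant bits
    ry = bit_rev(y, q)
    if ry > x:
        return "A"
    if ry < x:
        return "B"
    return "C"


def ipbr_via_set_a(data, log2n):
    """Swap each set A element with its bit reversed partner in set B."""
    swaps = 0
    for addr in range(1 << log2n):
        if classify(addr, log2n) == "A":
            rev = bit_rev(addr, log2n)
            data[addr], data[rev] = data[rev], data[addr]
            swaps += 1
    return swaps
```

For log2N=6 the sets hold 28, 28 and 8 addresses respectively, so 28 swaps suffice where the conventional generator produces 64 address pairs.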
- An important application of this method is in IPBR address generators and methods that remove the typical input buffer alignment restriction for bit reversed addressing. This is important because the remaining FFT process can be implemented without any buffer alignment restriction. By contributing some of the cycles that are conserved by the present invention, software may be added that completely removes the requirement for input buffer alignment. Efficient removal of the alignment requirement may require inner loops that always bit reverse increment the same element of the address pair. This can make Methods [...] Method 1 is unique in that even without being modified for alignment removal, its inherent alignment requirement is relaxed to 2^(log2N/2−1) for even log2N and 2^((log2N−1)/2) for odd log2N. All other methods have a 2^(log2N) alignment requirement.
- Referring to FIG. 1, the figure is a decisional flowchart providing selections to implement specific methods for address pair generators based upon certain information. The address generators can perform without an alignment restriction or with merely a relaxed alignment restriction. For performing address pair generation with only a reduction of the alignment constraint 10, Methods [...] alignment constraint 14 is preferred, then Methods [...]
- The address generator generates bit reversed addresses for an FFT with a size log2N input array 18 for use on a digital signal processor or other processing means capable of performing FFT operations. When only two address registers are available on a processor 20, then IPBR Methods [...] processor 24, then only Methods [...] log2N input array 28, Methods [...] log2N input array 32, Method 2 is the most efficient method in most operations 34. To optimize an even log2N input array 36, Method 3 is the most efficient method in most operations 38.
- To create the IPBR generators and their modified versions, x, y plots are used to plan the path to follow with a method prior to defining the method itself. Specific cases for IPBR methods of the present invention and the conventional method are plotted in
FIGS. 3-14. For all plots, M=S_in=S_out=0. Each IPBR method generates a sequence of address pairs. The first address of an address pair is represented by AR1 and the second address by AR2. Here AR1=bit_rev(AR2) and AR2=bit_rev(AR1). Sequential AR1 and sequential AR2 values are shown in the plots. Each square in the plots, formed by the x and y axis grid, represents the address of a unique element in the input array. In other words, on the graphs every array address is represented by one square. For log2N=6, the x axis value gives the three most significant bits (MSB) of a six bit address, and the y axis value gives the three least significant bits (LSB). Address coordinates are offset by (½, ½) to place each plotted point in the middle of a square made by the plot's grid. The address corresponds to the square's lower left corner coordinates. The first addresses of each bit reversed pair (the AR1s) are graphed using small circles. The second addresses of each address pair (the AR2s) are graphed using small squares. Sequential AR1 address values are connected with a dashed line connecting the circles. Sequential AR2 address values are connected with a solid line connecting the small squares. -
FIG. 3 illustrates the sequence of addresses generated using the conventional IPBR method found in the prior art. For the conventional method, the initial address pair is graphed at AR1=0=(0,0) and AR2=0=(0,0). The second address pair is at AR1=(0,1) and AR2=(4,0). For this second address pair note (b indicates binary): AR2=bit_rev(AR1)=bit_rev(1)=bit_rev[(0,1)]=bit_rev(000 001b)=100 000b=(4,0)=32. - Note that both a circle and a square symbol land on every grid square in
FIG. 3. For the conventional method, the address generation scheme “lands” on every square twice. For any array element address x, the address pair AR1=x, AR2=bit_rev(x) occurs in the sequence of address pairs, as well as AR1=bit_rev(x), AR2=x. If one swapped the contents of memory referenced in the bit reversed pair of addresses every time a new pair of addresses is generated, then the data referenced by these redundant bit-reversed pairs would be swapped twice, and data would end up back where it started. The conventional address generation scheme has three computational penalties: (1) because every non-self-bit-reversed address is generated twice, twice as many iterations are needed; (2) testing and conditional branching is required to break the degeneracy and swap only once per address; and (3) the self-bit-reversed addresses are also generated by the sequence of address pairs. For example, the address (5,5) corresponds to binary address 101 101b, which remains the same after bit reversal. Since the memory referenced by a self-bit-reversed address does not need to be exchanged with itself, additional cycles are wasted when the IPBR address generation scheme generates self-bit-reversed addresses. - The five IPBR methods presented are defined by sequential increments or “moves” of the two “bit reversed pairs” (AR1, AR2) and (AR3, AR4). For two addresses, A and B, if B=bit_rev(A) then it follows that A=bit_rev(B) and (A,B) form a “bit reversed pair”. The array size is 2^log2N. Variable “Q” is defined as the truncated integral quotient of log2N/2, i.e., Q=(log2N−1)/2 for odd log2N and Q=log2N/2 for even log2N; variable “R” is defined as the remainder of log2N/2. Address increments are I0=2^(log2N−1), I1=2^(log2N−Q−1), I2=2^Q, I3=2^(log2N−Q), I4=2^(Q−1), I5=2^(log2N−2). 
The address increments form four bit reversed pairs, i.e., (1,I0), (I1,I2), (I3, I4), and (2,I5). Bit reversed increments are indicated by a suffix of B. For the bit_rev operator that reverses the order of bits:
ARx=ARx+IyB=bit_rev[bit_rev(ARx)+bit_rev(Iy)]. -
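The bit reversed increment defined above can be sketched in Python as follows. This is a minimal illustration rather than part of the disclosure: the helper names `bit_rev`, `bit_rev_add` and `increments`, the explicit `nbits` width, and the masking of the intermediate sum to `nbits` bits (modeling a carry that falls off the end of the address field) are assumptions made for the example.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_rev_add(addr, inc, nbits):
    """ARx + IyB: reverse both operands, add, wrap to nbits, reverse back."""
    mask = (1 << nbits) - 1
    return bit_rev((bit_rev(addr, nbits) + bit_rev(inc, nbits)) & mask, nbits)


def increments(log2n):
    """Return [I0, I1, I2, I3, I4, I5] as defined in the text (log2N >= 2)."""
    q = log2n // 2
    return [
        1 << (log2n - 1),      # I0
        1 << (log2n - q - 1),  # I1
        1 << q,                # I2
        1 << (log2n - q),      # I3
        1 << (q - 1),          # I4
        1 << (log2n - 2),      # I5
    ]


# Repeatedly adding I0 bit reversed walks the array in bit reversed order.
log2n = 3
i0 = increments(log2n)[0]
addr, order = 0, []
for _ in range(1 << log2n):
    order.append(addr)
    addr = bit_rev_add(addr, i0, log2n)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

With these helpers, `bit_rev(1, n)`, `bit_rev(I1, n)`, `bit_rev(I3, n)` and `bit_rev(2, n)` return I0, I2, I4 and I5 respectively, which is the sense in which the increments form bit reversed pairs.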
Method 1 may be implemented for both odd and even log2N input array sizes. This address generation scheme generates only unique address pairs referencing data that needs to be swapped for IPBR, thereby eliminating the testing and conditional branching found in methods of the prior art and eliminating the waste of additional instruction cycles due to IPBR address generation for redundant and self-reversed addresses. -
FIG. 4 illustrates the result of Method 1 for generating bit reversed address pairs. The first pair of bit reversed addresses is AR1=(x1,y1)=(1,0)=[001 000b]=8 and AR2=(x2,y2)=(0,4)=[000 100b]=4. Thus in FIG. 4, (1,0) initiates the sequence of first addresses in each sequential address pair generated, and (0,4) initiates the sequence of second addresses. For each address pair, the second address gives the first address bit-reversed. Note that every square (unique address) is generated only once, and no self-bit-reversed addresses are generated. For example, the address generation scheme never lands on the (5,5) square of address 101 101b, which thus has no circle or square symbol in FIG. 4. Because the address generation scheme generates only unique address pairs referencing data that needs to be swapped for IPBR, the testing and conditional branching is eliminated. - To understand the concept behind
Method 1 and subsequent methods, it is helpful to bit reverse the y-axis data of FIG. 4, which is illustrated in FIG. 5. After this mapping, self-reversed addresses all lie on a diagonal line. The plot is split by an imaginary line running diagonally through the graph from (0,0) to (8,8). This divide splits the graph area into two triangles: a top and a bottom triangle. The bit reversed address of every square in the upper triangle is located in the lower triangle. By keeping AR1 in the lower triangle, AR2 in the upper triangle, and systematically stepping through each square (or address), Method 1 avoids all redundant pairs and self-reversed addresses. - All the IPBR Methods presented can be modified in three different ways by replacing part or all of the address pair sequence with a “topologically similar” sequence. Variations of the IPBR Methods include 1) x and y axis inversions of the original sequence, 2) reversing the order of the original subsequences, and 3) replacing an (A,B) address pair with a (B,A) address pair for arbitrary numbers of terms in sequences.
-
Method 1 uses the first way of defining three sets, so the y axis data is bit reversed. Set A contains all the array element addresses with bit_rev(y)<x, Set B contains addresses with bit_rev(y)>x, and for Set C, bit_rev(y)=x. Methods FIG. 5 for Method 1. For Method 1, Set A is the lower triangle, Set B the upper triangle, and Set C elements lie along the diagonal. - Any method can be altered by interchanging the order of the first and second addresses of an address pair, which is a third way of defining sets for bit reversal. Such exchanges may be favorable for reducing program code or cycles but should not be thought of as producing a different address pair generator that is not included in this invention. The only difference is that in alternating subsequences, the choice of first and second address is exchanged. Such an exchange does not result in a new address pair, and is therefore an IPBR address pair generator within the scope of the present invention. There are many other methods, not explicitly defined herein, for systematically stepping through set A. For example, the generator could proceed through set A using horizontal lines instead of vertical lines as in
Method 1, which advances along vertical lines whenever possible in the lower triangle of FIG. 5. - To perform in place bit reversal,
Method 1 uses three “moves” defined in Table IV. For odd log2N, I2=I1. For even log2N, I2=I1+I1. This results in different optimized code for even and odd log2N cases on processors with only one address increment register.
TABLE IV Moves for Method 1
Move 1          Move 2      Move 3
AR1=AR1+I1B     AR1=AR3     AR3=AR3+I3
AR2=AR2+I2      AR2=AR4     AR4=AR4+I4B
-
Method 1 is implemented with the following steps:
Initialize AR3=S_in, AR4=S_in.
Iterate from k=(R+1) to (R+1)*((2^Q)−1) in steps of (R+1)
  Move 3
  Move 2
  Iterate from j=1 to k−1 in steps of 1
    Move 1
  End of j loop
End of k loop
- Therefore, to implement the operations of
Method 1, addresses AR3=S_in, AR4=S_in are initialized. Method 1 iterates from k=(R+1) to (R+1)*((2^Q)−1) in steps of (R+1); performs Move 3; performs Move 2; iterates from j=1 to k−1 in steps of 1; performs Move 1; and then ends iterations of the j loop and then ends iterations of the k loop. The address pair sequence generated for Method 1 is defined by all the values that AR1, AR2 take on after moves that affect these values (not Move 3). -
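The Method 1 steps above can be exercised with a short Python simulation. This is only a sketch under stated assumptions: S_in is taken as 0 (as in the plots), log2N is at least 2, and the names `bit_rev`, `rev_add` and `method1_pairs` are illustrative helpers, not identifiers from the disclosure.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def method1_pairs(log2n):
    """List (AR1, AR2) after every Move 2 and Move 1 of Method 1, S_in = 0."""
    q, r = divmod(log2n, 2)
    mask = (1 << log2n) - 1
    i1, i2 = 1 << (log2n - q - 1), 1 << q
    i3, i4 = 1 << (log2n - q), 1 << (q - 1)

    def rev_add(addr, inc):
        # Bit reversed increment: ARx + IyB
        total = bit_rev(addr, log2n) + bit_rev(inc, log2n)
        return bit_rev(total & mask, log2n)

    pairs = []
    ar3 = ar4 = 0                                   # Initialize AR3 = AR4 = S_in
    for k in range(r + 1, (r + 1) * ((1 << q) - 1) + 1, r + 1):
        ar3, ar4 = ar3 + i3, rev_add(ar4, i4)       # Move 3
        ar1, ar2 = ar3, ar4                         # Move 2
        pairs.append((ar1, ar2))
        for _ in range(1, k):                       # j = 1 .. k-1
            ar1, ar2 = rev_add(ar1, i1), ar2 + i2   # Move 1
            pairs.append((ar1, ar2))
    return pairs


pairs = method1_pairs(6)
print(pairs[0], len(pairs))  # (8, 4) 28
```

For log2N=6 the first pair is (8, 4), matching FIG. 4, and the 28 pairs touch each of the 56 non-self-reversed addresses exactly once.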
Method 4, illustrated in the graph in FIG. 6, has initial pair AR1=1, AR2=32. Note x axis data is bit reversed, unlike Method 1 in FIG. 4. FIG. 7 illustrates Method 3. The first address pair is AR1=(0, 1) and AR2=(4,0). For even log2N, this varies from Method 4 by using a zig-zag pattern to step through the same sets, instead of advancing horizontally or vertically when possible. - For the segregation into three sets for odd log2N in
FIG. 8, set A is in the lower triangle, which is systematically covered by the first address of the Method 1 IPBR address pair generator. The self reversed set, C, forms a line with slope 2. The vertical axis data, referred to as zy, has been prefixed by the middle bit z. For the new zy vertical axis in FIG. 8, set A is defined by (bit_rev(zy)>>1)<x. This is equivalent to bit_rev(y)<x. Thus bit_rev(y)<x defines set A here. - The “second way” described for segregating three sets of addresses for odd log2N is used by
Method 4, as illustrated by FIG. 9. Next, the “third way” to segregate the addresses into three sets for odd log2N is illustrated in FIG. 10 for Method 2 and FIG. 11 for Method 5. Method 2 is a special case where Set A is defined as the union of two sets with reversed inequalities, depending on whether the (Q+1)st bit is zero or one. This definition of Set A facilitates a special technique exploited by Method 2 for continuing to use the same address advance increment scheme even when faced with the Set A zy-axis vertical boundary. The address sequence “wraps around” while continuing to use the same increment with no special treatment required for handling Set A's boundary. This technique used by Method 2 cannot easily be extended to the even log2N case, so Method 2 only works for odd log2N. Method 3 can be extended to work for odd log2N. This is done by Method 5, which reduces to Method 3 for even log2N. Combining even and odd log2N capability in Method 5 is awkward, however. For some applications branching to Method Method 5. - Processor cycles are further reduced in FFTs with an odd log2N input array with
Method 2. Method 2 considers (log2N−1)/2 LSBs and (log2N−1)/2 MSBs for odd log2N to define sets A, B, and C. Also, the ((log2N+1)/2)th (middle) bit of each binary array element address, z, is considered. Method 2 defines set A as the union of the set of elements that have z=1 and LSBs<bit_rev(MSBs) with the set of elements that have z=0 and LSBs>bit_rev(MSBs). The inequality criteria for Sets A and B are reversed according to whether the middle bit value is one or zero. While Method 2 reduces processor cycles over Method 1, Method 2 also has a 2^(log2N) alignment requirement not found in Method 1. To perform Method 2, three moves are implemented as defined in Table V. The third move combines the operations of the first two moves.
TABLE V Moves for Method 2
Move 1          Move 2          Move 3
AR1=AR1−I0B     AR1=AR1−1       AR1=AR1−1−I0B
AR2=AR2−1       AR2=AR2−I0B     AR2=AR2−I0B−1
-
Method 2 is implemented with the following steps:
Initialize AR1=S_in+1, AR2=S_in+I0.
Iterate from k=1 to 2^(log2N−2)−2^Q in steps of 1
  Move 1
  Move 2
End of k loop
Move 1
Iterate from j=1 to (2^Q)−2 in steps of 1
  Move 3
End of j loop
- Therefore, to implement the operations of
Method 2, address registers AR1=S_in+1, AR2=S_in+I0 are initialized. Method 2 iterates from k=1 to 2^(log2N−2)−2^Q in steps of 1; performs Move 1; performs Move 2; and ends the k loop. Method 2 then performs Move 1; iterates from j=1 to (2^Q)−2 in steps of 1; performs Move 3; and then ends the j loop. The values of AR1, AR2 after initialization and all moves define the Method 2 address pair sequence. - Processor cycles may be further reduced over
Method 1 for input arrays of an even log2N size by implementing Method 3. Method 3 considers log2N/2 LSBs and MSBs to define the sets A, B, and C for input array elements. Method 3 defines input array element set A by those addresses that have address LSBs>bit_rev(address MSBs). Method 3 may or may not reduce processor cycles over Method 1, depending on the processor. Method 3 also has a 2^(log2N) alignment requirement not found in Method 1. To perform Method 3, four moves are implemented as defined in Table VI.
TABLE VI Moves for Method 3
Move 1          Move 2          Move 3      Move 4
AR1=AR1+1       AR1=AR1+I0B     AR1=AR3     AR3=AR3+2
AR2=AR2+I0B     AR2=AR2+1       AR2=AR4     AR4=AR4+I5B
-
Method 3 is implemented with the following steps:
Initialize AR3=S_in+1, AR4=S_in+I0
Iterate for k=(2^Q)−2 to 0 in steps of −2
  Move 3
  Move 4
  Iterate for j=1 to k in steps of 1
    Move 1
    Move 2
  End of j loop
End of k loop
- The sequence of address pairs generated by
Method 3 is defined by the AR1, AR2 values after initialization and after all moves except Move 4. Method 4 is similar to Method 1 in that it can be implemented for both odd and even log2N input arrays. Differences between the two include implementation for different processor capabilities and how the methods define input sets of array elements. Method 4 may operate on processors with only one address increment register, whereas Method 1 requires more than one such register. Method 4 considers LSBs and MSBs from Q bits to define the sets A, B, and C for input array elements and defines input array element set A by those addresses that have address LSBs>bit_rev(address MSBs). However, Method 4 does not reduce the alignment requirement. - To perform
Method 4, three moves are implemented as defined in Table VII.
TABLE VII Moves for Method 4
Move 1          Move 2      Move 3
AR1=AR1+I0B     AR1=AR3     AR3=AR3+1
AR2=AR2+1       AR2=AR4     AR4=AR4+I0B
- To implement the operations of
Method 4, the following steps are performed:
Initialize AR3=S_in, AR4=S_in.
Iterate for m=0 to m=R in steps of 1
  Iterate from k=1 to (2^Q)−1 in steps of 1
    Move 3
    Move 2
    Iterate from j=1 to k−1 in steps of 1
      Move 1
    End of j loop
  End of k loop
  Move 3
End of m loop
- The address pair sequence for
Method 1 is defined by the AR1, AR2 values after Moves Method 4 is also defined by the AR1, AR2 values after Moves -
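The Method 4 steps listed above can be checked the same way. The following Python sketch assumes S_in=0 and log2N of at least 2, and records AR1, AR2 after every Move 2 and Move 1; `bit_rev`, `rev_add` and `method4_pairs` are hypothetical helper names introduced for the example.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def method4_pairs(log2n):
    """List (AR1, AR2) after every Move 2 and Move 1 of Method 4, S_in = 0."""
    q, r = divmod(log2n, 2)
    mask = (1 << log2n) - 1
    i0 = 1 << (log2n - 1)

    def rev_add(addr, inc):
        # Bit reversed increment: ARx + IyB
        total = bit_rev(addr, log2n) + bit_rev(inc, log2n)
        return bit_rev(total & mask, log2n)

    pairs = []
    ar3 = ar4 = 0                                    # Initialize AR3 = AR4 = S_in
    for _m in range(r + 1):                          # m = 0 .. R
        for k in range(1, 1 << q):                   # k = 1 .. 2^Q - 1
            ar3, ar4 = ar3 + 1, rev_add(ar4, i0)     # Move 3
            ar1, ar2 = ar3, ar4                      # Move 2
            pairs.append((ar1, ar2))
            for _j in range(1, k):                   # j = 1 .. k-1
                ar1, ar2 = rev_add(ar1, i0), ar2 + 1 # Move 1
                pairs.append((ar1, ar2))
        ar3, ar4 = ar3 + 1, rev_add(ar4, i0)         # Move 3 (no pair recorded)
    return pairs


pairs = method4_pairs(6)
print(pairs[0], len(pairs))  # (1, 32) 28
```

Only the single increment I0 is used, and for even log2N every generated AR1 satisfies LSBs>bit_rev(MSBs), in line with the set A definition given for Method 4.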
Method 5 extends Method 3 to work for odd log2N. Referring to FIG. 1, processor cycles may be further reduced over Method 1 for input arrays of an odd log2N size by implementing Method 5. To perform Method 5, five moves are implemented as defined in Table VIII.
TABLE VIII Moves for Method 5
Move 1          Move 2          Move 3      Move 4          Move 5
AR1=AR1+1       AR1=AR1+I0B     AR1=AR3     AR3=AR3+2       AR1=AR1+1+I0B
AR2=AR2+I0B     AR2=AR2+1       AR2=AR4     AR4=AR4+I5B     AR2=AR2+1+I0B
- To implement the operations of
Method 5, the following steps are performed:
Initialize AR3=S_in+1, AR4=S_in+I0
If(log2N>1)
count=2^Q−1
Iterate for k=(2^Q)−2 to 2 in steps of −2
  Move 3
  Move 4
  If(R==0) count=k; End of if;
  Iterate for j=1 to count in steps of 1
    Move 1
    Move 2
  End of j loop
  If(R==1) Move 1 End of if;
End of k loop
Move 3
If(R==1)
  Iterate for j=1 to count in steps of 1
    Move 5
  End of j loop
End of if
- The sequence of address pairs generated by
Method 5 is defined by the AR1, AR2 values after initialization and after all moves except Move 4. - All the IPBR Methods of the present invention can be modified by replacing part or all of the address pair sequence with a “topologically similar” sequence. Variations of the IPBR Methods include 1) reversing the order of the original subsequences, 2) x and y axis inversions of the original sequence, and 3) replacing an (A, B) address pair with a (B,A) address pair for arbitrary numbers of terms in sequences. By reversing the order of alternating subsequences in
Method Method Method Method Method 2m does not alter the cycle count, but is exemplary of an x and y axis inversion. Method 2m “inverts” the entire address pair sequence of Method 2. - The address generation scheme for
Method 2 uses an address increment of AR0=2^(log2N−1). One can modify Method 2 first by changing the starting address pair from AR1=1 and AR2=AR0 to the same address pair after x and y axis inversion, AR1=2*(AR0−1) and AR2=AR0−1. Next, change the sign of all address increments in the address generation scheme. For the original Method 2 all increments (linear and bit reversed) are subtracted; for Method 2m, all increments are added. This results in a valid IPBR address generator for all odd log2N, and the N=32 address pair sequence is given by FIG. 12. - Note the described modification of
Method 2 generates a sequence of address pairs that is topologically similar to the original Method 2 shown previously in FIG. 10. The data along both the x and y axis has been inverted. Placing an upside down graph of FIG. 12 on top of FIG. 10 results in a match. Method 2 is preferable to Method 2m only because of a simpler initialization of the address pair sequence. This invention is inclusive of topologically equivalent address generation schemes and all address generation schemes that vary in some simple or obvious manner from Method Method 1m keeps the same subsequences shown on horizontal and vertical lines in FIG. 5 for Method 1, but connects these subsequences in a different way. - A similar modification could be performed on
Method 4. For variety, however, Method 4m is formed by reconnecting the horizontal and vertical lines in a different manner. Note that none of the alternatives given in Table IV are satisfied by the entire Method 4m address pair sequence given in FIG. 13. However, all of the individual sub-sequences do satisfy Table IV. - Any method can be altered by interchanging the order of the first and second addresses of an address pair. Such exchanges may be favorable for reducing program code or cycles but should not be thought of as producing a different address pair generator that is not included in this invention. An example of two address pair generators that give an identical address pair sequence, and vary only in the order of the first and second address for an arbitrary number of address pairs, can be illustrated by plots of
Method 1m (FIG. 13) and Method 4m (FIG. 14). In FIG. 13, changing the bit reversed axis from the y-axis to the x-axis results in a sequence of address pairs that is identical to that of FIG. 14. The only difference is that in alternating subsequences, the choice of first and second address is exchanged. Such an exchange does not result in a new address pair, and thus a new IPBR address pair generator, outside the scope of the present invention. - To perform IPBR, modified
Method 1m uses eight “moves” to generate a new AR1, AR2 address pair as defined in Table IX. For moves seven and eight, a new bit reversed pair of address increments is defined: I6=2^(Q−2) and I7=2^(log2N−Q+1).
TABLE IX Moves for Method 1m
Move 1     Move 2     Move 3     Move 4     Move 5     Move 6     Move 7     Move 8
AR1+=I1B   AR1+=I2    AR1−=I1B   AR1−=I2    AR1+=I3    AR1+=I4B   AR1+=I6B   AR1+=I7
AR2+=I2    AR2+=I1B   AR2−=I2    AR2−=I1B   AR2+=I4B   AR2+=I3    AR2+=I7    AR2+=I6B
-
Method 1m is implemented with the following steps:
Initialize AR1=S_in+I3, AR2=S_in+I4.
stop=(R+1)*((2^Q)−2)−1
k=−1
while(2^Q>1)
  k=k+(R+1)
  Iterate from j=1 to k in steps of 1
    Move 1
  End of j loop
  Move 7
  If(log2N<4) go to Exit
  k=k+(R+1)
  Iterate from j=1 to k in steps of 1
    Move 4
  End of j loop
  Move 6
  k=k+(R+1)
  Iterate from j=1 to k in steps of 1
    Move 2
  End of j loop
  if(k>stop) go to Exit
  Move 8
  k=k+(R+1)
  Iterate from j=1 to k in steps of 1
    Move 3
  End of j loop
  Move 5
End of while loop
Exit:
The values of AR1, AR2 after initialization and all moves define the Method 1m address pair sequence. - To perform
Method 2m, three moves are implemented as defined in Table X. The third move combines the operations of the first two moves. Relative to the unmodified Method 2, Method 2m is an example of x and y axis inversion.
TABLE X Moves for Method 2m
Move 1          Move 2          Move 3
AR1=AR1+I0B     AR1=AR1+1       AR1=AR1+1+I0B
AR2=AR2+1       AR2=AR2+I0B     AR2=AR2+I0B+1
-
Method 2m is implemented with the following steps:
Initialize AR1=S_in−2+2^log2N, AR2=S_in−1+2^(log2N−1).
Iterate from k=1 to 2^(log2N−2)−2^Q in steps of 1
  Move 1
  Move 2
End of k loop
Move 1
Iterate from j=1 to (2^Q)−2 in steps of 1
  Move 3
End of j loop
- Similar to
Method 3, Method 3m is implemented to reduce processor cycles for input arrays of even log2N size. Method 3m requires only two address registers to operate. To perform Method 3m, six moves are implemented as defined in Table XI. In Table XI, ARx+=Iy represents ARx=ARx+Iy.
TABLE XI Moves for Method 3m
Move 1      Move 2      Move 3      Move 4      Move 5      Move 6
AR1+=1      AR1+=I0B    AR1−=1      AR1−=I0B    AR1+=2      AR1−=I5B
AR2+=I0B    AR2+=1      AR2−=I0B    AR2−=1      AR2+=I5B    AR2−=2
- Method 3m is implemented with the following steps:
Initialize AR1=−1, AR2=−1+2^log2N
If (log2N>2)
  Iterate for k=(2^Q)−2 to 0 in steps of −4
    Move 5
    Iterate for j=1 to k in steps of 1
      Move 1
      Move 2
    End of j loop
    Move 6
    Iterate for j=1 to k−2 in steps of 1
      Move 4
      Move 3
    End of j loop
  End of k loop
End of if
- Excluding initialization, all values of AR1, AR2 after moves define the Method 3m address pair sequence.
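As a cross-check of the Method 3m listing, the moves of Table XI can be simulated in Python. The sketch below makes several assumptions that are not stated in the text: addresses are taken modulo 2^log2N (so the initial AR1=−1 wraps to the last address), S_in is 0, a bit reversed subtraction is modeled as the reversed operands subtracted and wrapped, and only even log2N of 4 or more is considered. The helper names are illustrative.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def method3m_pairs(log2n):
    """List (AR1, AR2) after every move of Method 3m (even log2N >= 4)."""
    q = log2n // 2
    mask = (1 << log2n) - 1
    i0, i5 = 1 << (log2n - 1), 1 << (log2n - 2)

    def lin(addr, step):
        # Linear increment or decrement, wrapped to the array size
        return (addr + step) & mask

    def rev(addr, inc, sign):
        # Bit reversed add (sign=+1) or subtract (sign=-1) of `inc`
        total = bit_rev(addr, log2n) + sign * bit_rev(inc, log2n)
        return bit_rev(total & mask, log2n)

    moves = {
        1: lambda a1, a2: (lin(a1, 1), rev(a2, i0, 1)),
        2: lambda a1, a2: (rev(a1, i0, 1), lin(a2, 1)),
        3: lambda a1, a2: (lin(a1, -1), rev(a2, i0, -1)),
        4: lambda a1, a2: (rev(a1, i0, -1), lin(a2, -1)),
        5: lambda a1, a2: (lin(a1, 2), rev(a2, i5, 1)),
        6: lambda a1, a2: (rev(a1, i5, -1), lin(a2, -2)),
    }
    state = [mask, mask]   # AR1 = -1 and AR2 = -1 + 2^log2N, modulo 2^log2N
    pairs = []

    def do(number):
        state[0], state[1] = moves[number](state[0], state[1])
        pairs.append((state[0], state[1]))

    if log2n > 2:
        for k in range((1 << q) - 2, -1, -4):
            do(5)
            for _ in range(k):              # j = 1 .. k
                do(1)
                do(2)
            do(6)
            for _ in range(max(k - 2, 0)):  # j = 1 .. k-2
                do(4)
                do(3)
    return pairs


print(method3m_pairs(4))
# [(1, 8), (2, 4), (10, 5), (11, 13), (7, 14), (3, 12)]
```

Under these assumptions the generator needs only AR1 and AR2, and for log2N=4 and log2N=6 it visits every non-self-reversed address exactly once.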
-
Method 4m illustrates a scheme different from Method 1m for reconnecting subsequences of address pairs. To perform Method 4m, five moves are implemented as defined in Table XII. For processors with only one address increment register, note that after two Move 1's the final resulting AR1, AR2 changes are equivalent to Move 5.
TABLE XII Moves for Method 4m
Move 1          Move 2          Move 3            Move 4          Move 5
AR1=AR1+1       AR1=AR1+I0B     AR1=AR1+1+I0B     AR1=AR1−I0B     AR1=AR1+2
AR2=AR2+I0B     AR2=AR2+1       AR2=AR2+I0B+1     AR2=AR2−1       AR2=AR2+I5B
-
Method 4m is implemented with the following steps:
Initialize AR1=S_in+1, AR2=S_in+I0.
Iterate from m=0 to R in steps of 1
  Iterate from k=1 to (2^Q)−2 in steps of 2
    Move 1
    Iterate from j=1 to k in steps of 1
      Move 2
    End of j loop
    Move 3
    Iterate from j=1 to k+1 in steps of 1
      Move 4
    End of j loop
  End of k loop
  Move 5
End of m loop
- The values of AR1, AR2 after initialization and all moves define the
Method 4m address pair sequence. - In
Methods -
- 1) Subtract the buffer's start address from ARx. (After subtraction, the effective start address reference is zero, and thus the alignment restriction is satisfied);
- 2) Perform the bit reversed incrementation on ARx; and
- 3) Add the start address back to ARx.
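The three steps above can be illustrated with a small Python model. The model is an assumption made for the example, not a description of any particular processor: `hw_bit_rev_add` stands in for a hardware bit reversed increment that acts only on the low log2N bits of the absolute address, and the start address 0x1003 is an arbitrary unaligned value.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def hw_bit_rev_add(addr, inc, nbits):
    """Simplified hardware model: bit reversed add on the low nbits only."""
    mask = (1 << nbits) - 1
    low = bit_rev((bit_rev(addr & mask, nbits) + bit_rev(inc, nbits)) & mask, nbits)
    return (addr & ~mask) | low


def unaligned_bit_rev_add(addr, start, inc, nbits):
    """Bit reversed increment for a buffer at an arbitrary start address."""
    offset = addr - start                        # 1) subtract the start address
    offset = hw_bit_rev_add(offset, inc, nbits)  # 2) bit reversed incrementation
    return offset + start                        # 3) add the start address back


LOG2N, START = 3, 0x1003
I0 = 1 << (LOG2N - 1)
addr, offsets = START, []
for _ in range(1 << LOG2N):
    offsets.append(addr - START)
    addr = unaligned_bit_rev_add(addr, START, I0, LOG2N)
print(offsets)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Calling `hw_bit_rev_add` directly on the unaligned addresses instead produces offsets that leave the buffer, which is the alignment restriction the three steps work around.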
- For the above approach, ideally an address increment register can be reserved for the start address that is subtracted and added to ARx. On the TI C54x, only one address increment register, AR0, is available for use by the
IPBR Methods 1 through 4 presented herein. An alternative procedure is to create a “shadow” address register, ARy, for each address register used by the IPBR method. Keep ARx=ARy+start address, so the shadow register references a zero start address and satisfies the alignment restriction. For each iteration the address pair is advanced, the address that is bit reversed is incremented according to the following steps: -
- 1) If ARy is not already up to date, force ARy=ARx−start address;
- 2) Perform the bit reverse incrementation on ARy; and
- 3) Add the start address and store in ARx, i.e., ARx=ARy+start address.
- If the inner loop always applies the bit reversed increment to the same address of the address pair, then step one can be removed from the inner loop.
- An alternative method generates addresses using out of place bit reversal (OOPBR) to reduce cycles for processors that do not support bit reversed address register incrementation and consequently require many cycles to generate a bit reversed address. The conventional OOPBR approach is to generate one address pair per data move. With the present invention, about half as many bit reversed address offsets are generated by using bit reversed offsets twice. First, one of this invention's IPBR methods is used to generate address pair offsets [AR1, bit_rev(AR1)] as if S_in=0. The OOPBR algorithm copies data from the contents referenced by S_in+AR1 into S_out+bit_rev(AR1) and copies data referenced by S_in+bit_rev(AR1) into S_out+AR1. Finally, a second address generator is used to generate all self-reversed offsets, and for each self reversed offset only one data transfer is made. This OOPBR method removes the start address offset from the address pair sequence generated, and consequently this OOPBR method need not impose any alignment constraints on the input or output buffer.
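The OOPBR procedure described above can be sketched end to end in Python. This is a sketch under assumptions: the pair offsets come from the Method 1 sequence given earlier (with its loop bound (R+1)*((2^Q)−1) so that every non-self-reversed pair is visited), the self-reversed offsets are produced by a linear increment of one followed by a bit reversed increment of I0, S_in and S_out are represented by two Python lists, and log2N is at least 2. Function and variable names are illustrative.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def oopbr(src, log2n):
    """Return a bit reversed copy of src (length 2^log2N, log2N >= 2)."""
    q, r = divmod(log2n, 2)
    mask = (1 << log2n) - 1
    i0 = 1 << (log2n - 1)
    i1, i2 = 1 << (log2n - q - 1), 1 << q
    i3, i4 = 1 << (log2n - q), 1 << (q - 1)

    def rev_add(offset, inc):
        total = bit_rev(offset, log2n) + bit_rev(inc, log2n)
        return bit_rev(total & mask, log2n)

    dst = [None] * len(src)

    def transfer_pair(a, b):
        # Two data transfers per bit reversed offset computation
        dst[b] = src[a]
        dst[a] = src[b]

    # Pass 1: offsets [AR1, bit_rev(AR1)] from the Method 1 sequence.
    ar3 = ar4 = 0
    for k in range(r + 1, (r + 1) * ((1 << q) - 1) + 1, r + 1):
        ar3, ar4 = ar3 + i3, rev_add(ar4, i4)      # Move 3
        ar1, ar2 = ar3, ar4                        # Move 2
        transfer_pair(ar1, ar2)
        for _ in range(1, k):
            ar1, ar2 = rev_add(ar1, i1), ar2 + i2  # Move 1
            transfer_pair(ar1, ar2)

    # Pass 2: self-reversed offsets, one data transfer each.
    ar3 = 0
    for _ in range(r + 1):
        ar1 = ar3                                  # start of a run of self-reversed offsets
        ar3 += i2
        dst[ar1] = src[ar1]
        for _ in range(1, 1 << q):
            ar1 = rev_add(ar1 + 1, i0)             # +1 linear, then +I0 bit reversed
            dst[ar1] = src[ar1]
    return dst


print("".join(oopbr(list("abcdefgh"), 3)))  # aecgbfdh
```

Because only offsets are generated and the buffer bases are applied at the point of transfer, neither list needs any particular alignment in this model.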
- To implement the OOPBR Method,
IPBR Method 1 is extended for OOPBR applications. Address pairs AR3 and AR4 are initialized to zero instead of S_in, because for OOPBR, relative address offsets are generated, not actual addresses. Beyond the moves for the chosen IPBR method, three additional moves are needed. These additional moves only affect one address register. To perform the OOPBR Method, six moves are implemented as defined in Table XIII.
TABLE XIII Moves for OOPBR Method
Move 1          Move 2      Move 3          Move OOPBR.1      Move OOPBR.2    Move OOPBR.3
AR1=AR1+I1B     AR1=AR3     AR3=AR3+I3      AR1=AR1+1+I0B     AR1=AR3         AR3=AR3+I2
AR2=AR2+I2      AR2=AR4     AR4=AR4+I4B
- To implement the operations of the OOPBR Method, the following steps are performed:
Initialize AR3=0, AR4=0.
Iterate from k=(R+1) to (R+1)*((2^Q)−1) in steps of (R+1)
  Move 3
  Move 2
  Iterate from j=1 to k−1 in steps of 1
    Move 1
  End of j loop
End of k loop
- For all of the above moves (that affect AR1, AR2) transfer data from address S_in+AR1 to S_out+AR2, and transfer data from S_in+AR2 to S_out+AR1. When it is costly in cycles to calculate the result of bit reversed address incrementation, this is helpful because two data transfers are made for each bit reversed address computation. Note that making the two indicated data transfers for the address pair sequence given above will not complete the OOPBR operation because all the self-reversed addresses are omitted.
- The operations of the OOPBR Method are continued with the following steps:
Initialize AR3=0
Iterate for k=0 to R in steps of 1
  Move OOPBR.2
  Move OOPBR.3
  Iterate for j=1 to (2^Q)−1
    Move OOPBR.1
  End of j loop
End of k loop
For all of the preceding moves for OOPBR that affect AR1, only transfer data from S_in+AR1 into S_out+AR1. - The present methods can be efficiently implemented even when data elements are represented by multiple contiguous words. For each of the methods disclosed, the initial address pair(s) and all increment registers are multiplied by 2^M when data elements are represented by 2^M contiguous words. For a linear increment of one, however, scaling up by 2^M is normally not needed as demonstrated below. Also, methods for dealing with the lowest M bits of the address that needs a bit reversed increment are described below. For some FFTs, each sequential data element may require two or four words of memory. For example, double precision complex FFT data can be in the format R_MSW(1), R_LSW(1), I_MSW(1), I_LSW(1), R_MSW(2), R_LSW(2), I_MSW(2), I_LSW(2), . . . where R_MSW=signed real most significant word; R_LSW=unsigned real least significant word; I_MSW=signed imaginary most significant word; I_LSW=unsigned imaginary least significant word.
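The handling of the lowest M bits can be illustrated for four-word elements (M=2) with a short Python sketch. It uses the four-word increment AR0=2+4*2^(log2N−1) listed in Table XIV and assumes that the bit reversed addition acts on a field of log2N+M address bits with the carry out of that field discarded; that width, and the function name `four_word_ar2_sequence`, are assumptions made for the example.

```python
def bit_rev(value, nbits):
    """Reverse the order of the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def four_word_ar2_sequence(log2n):
    """AR2 at the start of each loop pass for 4-word elements (M = 2)."""
    m = 2
    width = log2n + m                    # assumed width of the reverse-carry add
    mask = (1 << width) - 1
    ar0 = 2 + 4 * (1 << (log2n - 1))     # bit reversed increment for M = 2
    ar2, starts = 0, []
    for _ in range(1 << log2n):
        starts.append(ar2)
        ar2 += 3                         # three linear +1 steps within the element
        # Bit reversed add of AR0: advances the element index in bit reversed
        # order and clears the word offset of three in the same operation.
        ar2 = bit_rev((bit_rev(ar2, width) + bit_rev(ar0, width)) & mask, width)
    return starts


print(four_word_ar2_sequence(3))  # [0, 16, 8, 24, 4, 20, 12, 28]
```

Each value is four times a bit reversed element index, so AR2 always returns to the first word of the next element without a separate subtraction of three.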
- Assume for a particular IPBR swap and move, the goal is to advance AR1 linearly to the next element and advance AR2 bit reversed. Table XIV illustrates IPBR processing of single and four word elements.
TABLE XIV IPBR Processing of Single and Four Word Elements
Single precision real, M=0 (1 word of contiguous memory per element in array):
AR0 = 2^(log2N−1); address increment
Start of Loop
  Swap (AR1, AR2) data, AR1=AR1+1
  AR2=bitrev_add(AR2, AR0)
End of Loop
Double precision complex, M=2 (4 words of contiguous memory per array element):
AR0 = 2 + 4*2^(log2N−1)
Start of Loop
  Swap (AR1, AR2) data, AR1=AR1+1, AR2=AR2+1
  Swap (AR1, AR2) data, AR1=AR1+1, AR2=AR2+1
  Swap (AR1, AR2) data, AR1=AR1+1, AR2=AR2+1
  Swap (AR1, AR2) data, AR1=AR1+1
  AR2=bitrev_add(AR2, AR0)
End of Loop
- For the exemplary double precision complex FFT, Table XIV adds two to the bit-reversed increment, AR0, relative to the single precision real case. This procedure avoids using another instruction to subtract three from AR2. Adding two to the bit-reversed increment for four words of contiguous memory clears an offset of three, since in bit reversed
addition 3+2B=0. An alternative approach is to treat alternating sequential swaps differently. The first data swap lets the two least significant bits advance to three by advancing from the R_MSW to the I_MSW. The second swap starts by swapping I_MSW and advances backwards to swap the R_MSW data last. - To estimate the number of cycles required to implement the methods of the present invention on a TI C54x processor, the number of address pairs generated is multiplied by the cycles required to generate a new address pair and perform (or decline to perform) an exchange of data referenced by the address pair. The cycle estimates presented in Table XV ignore the penalty for a limited number of outer loops when loops are nested. These results demonstrate that the new IPBR methods are competitive with OOPBR. IPBR methods reduce the number of cycles by more than 80% over the conventional method, in most cases. Cycles per address pair are reduced from 14 or 12 cycles down to 4 or 3 cycles and the number of address pairs is cut in half. The modified methods only vary from the corresponding un-modified methods in that their preferred implementations use only two address registers.
TABLE XV Requirements of the OOPBR and IPBR Methods
Column key: (a) method of single word data element bit reversal for an array with length equal to a power of 2; (b) TI C54x cycles per address pair; (c) number of iterations address pairs are generated; (d) alignment constraint: array start address must be a multiple of; (e) nested loops used?; (f) number of address registers needed; (g) inner loop address increment registers needed.
(a)                          (b)        (c)                               (d)                (e)   (f)  (g)
OOPBR Conventional Method    3          2^(log2N−1)                       2^log2N            No    2    1
IPBR Conventional Method     14 or 12   2^(log2N)                         2^log2N            No    2    1
Method 1 (even log2N)        4          2^(log2N−1) − 2^(log2N/2)         2^(log2N/2−1)      Yes   4    2
Method 1 (odd log2N)         3          2^(log2N−1) − 2^(log2N/2+1/2)     2^((log2N−1)/2)    Yes   4    1
Method 2 (odd log2N)         3          2^(log2N−1) − 2^(log2N/2+1/2)     2^log2N            No    2    1
Method 3 (even log2N)        3          2^(log2N−1) − 2^(log2N/2)         2^log2N            Yes   4    1
Method 4 & 5 (even log2N)    3          2^(log2N−1) − 2^(log2N/2+1/2)     2^log2N            Yes   4    1
Method 4 & 5 (odd log2N)     3          2^(log2N−1) − 2^(log2N/2)         2^log2N            Yes   4    1
- The present invention of a design technique for a “stand alone” in-place bit reversal mapping can also be extended for integration of the bit reversed mapping with parallel FFT computations. This enables the design of address generators that combine IPBR and one FFT stage. Computing IPBR and the first stage in parallel increases efficiency by removing the instructions that store output from a standalone IPBR mapping and then fetch the same data as input for the FFT stage. However, in the combined method, self-reversed addresses cannot be ignored. 
If self-reversed addresses were ignored in the combined method, then the FFT stage would be missing required input from these memory locations.
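The distinction between swap pairs and self-reversed addresses can be illustrated with a minimal C sketch. The function names (`bit_reverse`, `count_self_reversed`, `count_swap_pairs`) are hypothetical and do not appear in the original description, and the brute-force enumeration is only an illustration, not one of the address generators described here. It counts how many addresses a stand-alone IPBR mapping may skip and how many exchanges remain.

```c
#include <assert.h>

/* Reverse the low `bits` bits of `addr` (hypothetical helper). */
unsigned bit_reverse(unsigned addr, unsigned bits)
{
    unsigned rev = 0;
    unsigned i;
    assert(bits > 0 && bits <= 16);
    for (i = 0; i < bits; i++) {
        rev = (rev << 1) | (addr & 1u);
        addr >>= 1;
    }
    return rev;
}

/* Count addresses that equal their own bit reversal. A stand-alone
 * IPBR mapping can skip these, but a combined IPBR/FFT stage cannot,
 * because the butterflies still need the data stored there. */
unsigned count_self_reversed(unsigned bits)
{
    unsigned n = 1u << bits;
    unsigned count = 0;
    unsigned a;
    for (a = 0; a < n; a++) {
        if (bit_reverse(a, bits) == a)
            count++;
    }
    return count;
}

/* Count pairs (a, rev(a)) with a < rev(a): each such pair is one
 * data exchange in a stand-alone in-place bit reversal. */
unsigned count_swap_pairs(unsigned bits)
{
    unsigned n = 1u << bits;
    unsigned count = 0;
    unsigned a;
    for (a = 0; a < n; a++) {
        if (a < bit_reverse(a, bits))
            count++;
    }
    return count;
}
```

For a 64-element array (log2N = 6), this enumeration finds 8 self-reversed addresses and 28 swap pairs, so a combined method has 8 memory locations that are never exchanged but must still be read as FFT input.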
- Referring to FIG. 15, an exemplary embodiment may modify a “stand-alone” IPBR bit reversed address pair generator for integration into an FFT stage. It assumes the FFT's bit reversed mapping precedes a first radix-2 stage with a one element “skip” in each butterfly 40. The method first extends the 2-D plotted path to hit the self-reversed addresses 42. The first address of this extended single address pair generator is designated “AR1.” After each AR1 move, a four to eight element set of addresses and their corresponding bit reversed complements is determined 44. In an alternative case, three addresses are generated: the second address is a self-reversed address of the first, and the third is the bit reversed complement of the address that is not self-reversed. Unlike the method described in FIG. 4, which uses a single pair of addresses and their bit reversed complements, the exemplary combined method processes four pairs of addresses and their bit reversed complements such that no conflicting addresses will be produced using the in-place memory. The preferred address pairs include AR1, AR1+2^M, AR1+2^(log2N−1+M), AR1+2^M+2^(log2N−1+M) and their four bit reversed complements. Each iteration of AR1 advances according to the extended stand-alone IPBR methods described above until a move lands in a set that has not already been covered 46. Then, each new address set forms the output of a new address generator for one iteration 48. This procedure splits the single address pair generator into a quadruple bit reversed address pair generator that typically produces eight non-self-reversed addresses each iteration. For some iterations only four addresses are generated, and two of these addresses are self-reversed.
- For a complex M=1 combination of the Method 1 IPBR and Stage 1, the C program segment in Table XVI processes the real parts for the 8th iteration of the quadruple address pair generator. Table XVI initializes AR0 through AR7 to the values the address generator yields in the 8th iteration. Each element address is suffixed by 0 for real, 1 for imaginary. In the computation, the values of A, B, C, and D must be preserved between steps. If the processor has only three registers, D can be placed into external memory since it is only used once. Some processors have only two registers but perform parallel processing. Thus, in the Computation section of Table XVI, the paired calculations on the right side can be performed in parallel, so that one register serves for B and C.
TABLE XVI Processing the 8th iteration real numbers for the Method 1 combination
Address Register Values at the Eighth Iteration
AR0=001 110 0b; AR1=001 111 0b; AR2=011 100 0b; AR3=011 101 0b;
AR4=101 110 0b; AR5=101 111 0b; AR6=111 100 0b; AR7=111 101 0b;
Computation for the Eighth Iteration
A=*AR2+*AR6; B=*AR2-*AR6;
C=*AR0; *AR0=A; D=*AR1; *AR1=B;
A=C+*AR4; B=C-*AR4;
*AR2=A; C=*AR3; *AR3=B;
A=C+*AR7; B=C-*AR7;
*AR4=A; C=*AR5; *AR5=B;
A=D+C; B=D-C;
*AR6=A; *AR7=B;
- Table XVII illustrates a basic butterfly computation for the eighth iteration of the Method 1 combination. Note that each address AR0 through AR7 is equal to the bit reversed complement of a different address (e.g., AR0 is equal to the bit reversed complement of AR2 and AR2 is equal to the bit reversed complement of AR0).
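The statement order of Table XVI can be checked against a straightforward out-of-place reference. The following is a sketch under stated assumptions: it works on real parts only and uses element indices rather than the word addresses of Table XVI (so the trailing real/imaginary bit is dropped and the offsets 2^M and 2^(log2N−1+M) become 1 and N/2). The names `reference_stage`, `combined_set` and `check_eighth_iteration` are hypothetical, and the reference stage is a plain add/subtract butterfly on adjacent elements.

```c
#include <assert.h>

enum { LOG2N = 6, N = 1 << LOG2N };

/* Reverse the low `bits` bits of `index` (hypothetical helper). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned rev = 0;
    unsigned i;
    assert(bits > 0 && bits <= 16);
    for (i = 0; i < bits; i++) {
        rev = (rev << 1) | (index & 1u);
        index >>= 1;
    }
    return rev;
}

/* Reference: out-of-place bit reversal followed by one radix-2 stage
 * of add/subtract butterflies on adjacent elements. */
void reference_stage(const int *in, int *out)
{
    unsigned p;
    for (p = 0; p < (unsigned)N; p += 2) {
        int x0 = in[bit_reverse(p, LOG2N)];
        int x1 = in[bit_reverse(p + 1, LOG2N)];
        out[p] = x0 + x1;
        out[p + 1] = x0 - x1;
    }
}

/* In-place update of one eight-element set, using the statement order
 * of Table XVI. `base` is the element index held in AR0. */
void combined_set(int *x, unsigned base)
{
    int *AR0 = &x[base];
    int *AR1 = &x[base + 1];
    int *AR4 = &x[base + N / 2];
    int *AR5 = &x[base + 1 + N / 2];
    int *AR2 = &x[bit_reverse(base, LOG2N)];
    int *AR3 = &x[bit_reverse(base + N / 2, LOG2N)];
    int *AR6 = &x[bit_reverse(base + 1, LOG2N)];
    int *AR7 = &x[bit_reverse(base + 1 + N / 2, LOG2N)];
    int A, B, C, D;

    A = *AR2 + *AR6;  B = *AR2 - *AR6;
    C = *AR0;  *AR0 = A;  D = *AR1;  *AR1 = B;
    A = C + *AR4;  B = C - *AR4;
    *AR2 = A;  C = *AR3;  *AR3 = B;
    A = C + *AR7;  B = C - *AR7;
    *AR4 = A;  C = *AR5;  *AR5 = B;
    A = D + C;  B = D - C;
    *AR6 = A;  *AR7 = B;
}

/* Returns 1 if the eight updated elements match the reference. */
int check_eighth_iteration(void)
{
    static const unsigned touched[8] = {14, 15, 28, 29, 46, 47, 60, 61};
    int x[N], ref[N];
    unsigned i;
    for (i = 0; i < (unsigned)N; i++)
        x[i] = (int)(i * i + 3 * i + 1);   /* arbitrary test data */
    reference_stage(x, ref);
    combined_set(x, 14);                   /* AR0 index = 001110b */
    for (i = 0; i < 8; i++) {
        if (x[touched[i]] != ref[touched[i]])
            return 0;
    }
    return 1;
}
```

With `base` set to 001110b, the eight pointers land on the same element indices as AR0 through AR7 in Table XVI, and the set is closed: every input a butterfly needs is one of the eight elements, which is why the update can be done in place with only the temporaries A, B, C and D.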
TABLE XVII Butterfly for the Eighth Iteration Real Numbers for the Method 1 Combination
- FIGS. 16 and 17 illustrate address generators for the combination of the new “Method 1C” IPBR and one radix-2 FFT stage. Numbers in the plots indicate the iteration in which an address is generated. FIG. 16 is a plot similar to that of Method 1 shown in FIG. 4, with bit reversed LSBs on the y-axis and MSBs on the x-axis. The arrows in FIG. 16 track the path of AR2 to the 8th iteration. The generated addresses in the plot of FIG. 16 stop at 10 because at that point the plot becomes full. Recall that in FIG. 4, Method 1 for generating bit reversed address pairs stepped through the plot such that every square (unique address) was generated only once, with no self-reversed addresses generated. Thus, in FIG. 4 the squares on a diagonal line between the path on the left and the right sides of the plot were not hit. However, in FIG. 16 the same diagonal of self-reversed addresses must be hit in order to provide all input to the FFT stage.
- Similarly, the plot in FIG. 17 of Method 4C using the preferred combined procedure corresponds to the plot in FIG. 10 for Method 4. The arrows in FIG. 17 track the path of AR2 through the 10th address, when the plot is filled in.
- Note that MSBs=011b, post-bit reversal LSBs=100b, and the real part AR2=011 100 0b appears in Table XVI.
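The plot geometry described for FIG. 16 can be sketched in C for the even log2N case. This is an illustration under assumptions: the type `PlotPoint` and the functions `plot_point`, `is_self_reversed` and `complement_is_mirror` are hypothetical names, and the axis assignment (MSBs on x, bit reversed LSBs on y) is taken from the description of FIG. 16 rather than from the figure itself. Under that assignment, a self-reversed address lies on the diagonal x = y, and an address and its bit reversed complement are mirror images across that diagonal.

```c
#include <assert.h>

/* Reverse the low `bits` bits of `v` (hypothetical helper). */
unsigned bit_reverse(unsigned v, unsigned bits)
{
    unsigned rev = 0;
    unsigned i;
    assert(bits > 0 && bits <= 16);
    for (i = 0; i < bits; i++) {
        rev = (rev << 1) | (v & 1u);
        v >>= 1;
    }
    return rev;
}

/* Plot coordinates of an element address with 2*q index bits. */
typedef struct {
    unsigned x;   /* q most significant bits */
    unsigned y;   /* q least significant bits, bit reversed */
} PlotPoint;

PlotPoint plot_point(unsigned addr, unsigned q)
{
    PlotPoint p;
    unsigned mask;
    assert(q > 0 && q <= 8);
    mask = (1u << q) - 1u;
    p.x = (addr >> q) & mask;
    p.y = bit_reverse(addr & mask, q);
    return p;
}

/* An address is self-reversed exactly when it sits on the diagonal. */
int is_self_reversed(unsigned addr, unsigned q)
{
    PlotPoint p = plot_point(addr, q);
    return p.x == p.y;
}

/* The bit reversed complement is the mirror image across the diagonal. */
int complement_is_mirror(unsigned addr, unsigned q)
{
    PlotPoint a = plot_point(addr, q);
    PlotPoint b = plot_point(bit_reverse(addr, 2 * q), q);
    return a.x == b.y && a.y == b.x;
}
```

For example, with q = 3 the element index 001 110b maps to the point (x = 001b, y = 011b), and its bit reversed complement 011 100b maps to the mirrored point (x = 011b, y = 001b). A stand-alone generator only needs to walk one side of the diagonal, while the combined method must also visit the diagonal itself.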
Method 1 requires auxiliary address registers to periodically reset the primary address registers. Multi-tasking address registers that cover more than one of the eight new addresses for each iteration may also be used. Further, one of the modified methods of the alternative embodiments above can implement the preferred combined method with fewer address registers.
- Because many varying and different embodiments may be made within the scope of the inventive concept herein taught, and because many modifications may be made in the embodiments herein detailed in accordance with the descriptive requirements of the law, it is to be understood that the details herein are to be interpreted as illustrative and not in a limiting sense.
Claims (16)
1. A method for reordering the elements of a 2^(log2N) length input array in bit reversed order to generate address pairs using a Fast Fourier Transform in a computer system processor, comprising:
generating a sequence of address pairs in said processor from said input array by processing said array; and
calculating a Fast Fourier Transform (FFT) using said sequence of generated address pairs and self-reversed addresses of said address pairs as input to said Fast Fourier Transform.
2. The method of claim 1, wherein said generating a sequence of address pairs further comprises excluding one of said address pairs and bit reversed complements to said address pairs in a previous generation process from a subsequent generation process.
3. The method of claim 1, wherein said generating comprises generating a four element set of values of said generated address pairs after each said generation step.
4. The method of claim 1, wherein said generating comprises generating an eight element set of values of said generated address pairs after each said generation step.
5. The method of claim 1, wherein said generating further comprises plotting said generation of said sequence on a graph, a first axis of said graph is scaled to measure most significant bits in said array and a second axis of said graph is scaled to measure least significant bits in said array.
6. The method of claim 5, wherein said generation comprises iterating through an organized path of said plotted address pairs.
7. A method for reordering the elements of a 2^(log2N) length input array in bit reversed order to generate address pairs using a Fast Fourier Transform in a computer system processor, comprising:
providing a plot of address pairs of said array, wherein each address pair has a first address and a second address and the second address in each address pair is a bit reversed complement of the first address,
wherein a first axis of said plot represents a scale of most significant bits and a second axis of said plot represents a scale of least significant bits for each address of said address pair values;
defining a path in said plot for processing a plurality of said plotted address pairs;
generating a set of output address pairs and self-reversed addresses by processing through said path; and
calculating a Fast Fourier Transform (FFT) using said set of output address pairs and self-reversed addresses of said address pairs as an input to said FFT.
8. The method of claim 7, wherein said generating said set of output address pairs further comprises excluding a set of address pairs and bit reversed complements to said excluded address pairs of a previous processing from a subsequent processing.
9. The method of claim 7, wherein said generating comprises generating a four to eight element set of address pair values and bit reversed complements to said address pair values.
10. The method of claim 7, wherein said processing comprises processing through an organized said path of said plot.
11. The method of claim 7, wherein said generating comprises performing a discrete set of processing steps to advance each of said address pairs such that each address pair remains mutually bit reversed after each said move.
12. The method of claim 7, wherein said generating comprises generating a sequence of address pairs from said input array to produce a set of address pairs that are not self-reversible.
13. The method of claim 7, wherein said providing a plot comprises, for said array that is an odd log2N array, providing said first axis with Q+1 least significant bits and said second axis with the bit reversed Q most significant bits.
14. The method of claim 7, wherein said providing a plot comprises, for said array that is an even log2N array, providing said first axis with Q least significant bits and said second axis with the bit reversed Q most significant bits.
15. The method of claim 7, wherein said providing a plot comprises, for said array that is an odd log2N array, providing said first axis with bit reversed Q+1 least significant bits and said second axis with the bit reversed Q most significant bits.
16. The method of claim 7, wherein said providing a plot comprises, for said array that is an even log2N array, providing said first axis with bit reversed Q least significant bits and said second axis with Q most significant bits.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/187,673 US20050256917A1 (en) | 2002-03-15 | 2005-07-22 | Address generators integrated with parallel FFT for mapping arrays in bit reversed order |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/097,407 US7047268B2 (en) | 2002-03-15 | 2002-03-15 | Address generators for mapping arrays in bit reversed order |
US11/187,673 US20050256917A1 (en) | 2002-03-15 | 2005-07-22 | Address generators integrated with parallel FFT for mapping arrays in bit reversed order |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/097,407 Continuation-In-Part US7047268B2 (en) | 2002-03-15 | 2002-03-15 | Address generators for mapping arrays in bit reversed order |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050256917A1 true US20050256917A1 (en) | 2005-11-17 |
Family
ID=29214389
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/097,407 Expired - Lifetime US7047268B2 (en) | 2002-03-15 | 2002-03-15 | Address generators for mapping arrays in bit reversed order |
US11/187,673 Abandoned US20050256917A1 (en) | 2002-03-15 | 2005-07-22 | Address generators integrated with parallel FFT for mapping arrays in bit reversed order |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/097,407 Expired - Lifetime US7047268B2 (en) | 2002-03-15 | 2002-03-15 | Address generators for mapping arrays in bit reversed order |
Country Status (2)
Country | Link |
---|---|
US (2) | US7047268B2 (en) |
EP (2) | EP2755128A1 (en) |
Also Published As
Publication number | Publication date |
---|---|
EP1378823A3 (en) | 2007-11-14 |
US20030200414A1 (en) | 2003-10-23 |
EP2755128A1 (en) | 2014-07-16 |
EP1378823A2 (en) | 2004-01-07 |
US7047268B2 (en) | 2006-05-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: TELOGY NETWORKS, INC., MARYLAND; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HARLEY, THOMAS RANDALL; REEL/FRAME: 016607/0969; Effective date: 20050713 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |