WO2004015558A1 - Apparatus and method for computing a reciprocal of a complex number - Google Patents
Publication number: WO2004015558A1 (international application PCT/US2003/014504)
Authority: WO, WIPO (PCT)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
 G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
 G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
 G06F7/4806—Computations with complex numbers
Description
APPARATUS AND METHOD FOR COMPUTING A RECIPROCAL OF A COMPLEX NUMBER
PRIORITY CLAIM
This application claims the benefit of United States Provisional Patent Application No. 60/380,620, filed August 7, 2002, entitled "SIMPLIFIED HARDWARE DIVIDER CIRCUIT," which is incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates to the field of digital data processing, and more particularly, to an apparatus and method for computing a reciprocal of a complex number.
BACKGROUND OF THE INVENTION
Memory sizes, integrated circuit ("IC") sizes, and computational speeds are increasingly valuable commodities in digital processing systems. Some processing applications require computing reciprocals (i.e., multiplicative inverses) of complex numbers. That is, for a number which may be expressed as (a + jb), where "a" indicates a real magnitude and "b" indicates an imaginary magnitude, some applications require computation of 1/(a + jb). However, divider apparatuses are notoriously difficult to implement in integrated circuits because of their large size requirements. For dividing small operands (less than 5 bits each, for example), a lookup table ("LUT") may be used to reduce the space requirements of the computational circuitry. For larger operands, however, historical LUT approaches have required impractical amounts of memory. Operands of 10 bits each, for example, would typically require an LUT of 512 × 512 entries. Iterative techniques have also been used to reduce the sizes and memory requirements of some divider circuits, but iterative processes are typically slow.
The present invention is directed to overcoming some of the drawbacks of conventional approaches for computing reciprocals of complex numbers.
SUMMARY OF THE INVENTION
An apparatus for computing a reciprocal of a complex number which may be expressed as a real part and an imaginary part includes a first multiplier (110) configured to provide a first result corresponding to a multiplication of the real part by itself, a second multiplier (144) configured to provide a second result corresponding to a multiplication of the imaginary part by itself, and a summer (180) coupled to the first multiplier (110) to receive the first result therefrom and coupled to the second multiplier (144) to receive the second result therefrom. The summer (180) is configured to provide a third result corresponding to an addition of the first result and the second result. The apparatus further includes a barrel shifter (220) coupled to the summer (180) to receive the third result therefrom. The barrel shifter (220) is configured to provide a fourth result corresponding to the third result shifted by a first number of bits. The apparatus also includes a converter (260, 4010) coupled to the barrel shifter (220) to receive the fourth result therefrom. The converter (260, 4010) includes a lookup table ("LUT"), and the converter (260, 4010) is configured to provide a fifth result based on the fourth result and the LUT. The fifth result approximates a reciprocal of the fourth result.
A method for computing a reciprocal of a complex number which may be expressed as a real part and an imaginary part includes providing a first result corresponding to a multiplication of the real part by itself, providing a second result corresponding to a multiplication of the imaginary part by itself, providing a third result corresponding to an addition of the first result and the second result, providing a fourth result corresponding to the third result shifted by a first number of bits, and providing a fifth result based on the fourth result and a lookup table. The fifth result approximates a reciprocal of the fourth result.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings:
FIG. 1 is a block diagram of an exemplary apparatus for computing a reciprocal of a complex number according to the present invention;
FIG. 2 is a depiction of an exemplary lookup table for the exemplary LUT block of the exemplary apparatus of FIG. 1;
FIG. 3A is a table of exemplary operational data (including lookup table data from FIG. 2) for the exemplary apparatus of FIG. 1;
FIG. 3B is a continuation of FIG. 3A;
FIG. 4 is a block diagram of an exemplary alternative apparatus for computing a reciprocal of a complex number according to the present invention;
FIG. 5 is a block diagram of another exemplary alternative apparatus for computing a reciprocal of a complex number according to the present invention;
FIG. 6 is a depiction of an exemplary lookup table for the exemplary apparatus of FIG. 5;
FIG. 7A is a table of exemplary operational data (including lookup table data from FIG. 6) for the exemplary apparatus of FIG. 5;
FIG. 7B is a continuation of FIG. 7A; and
FIG. 8 is a block diagram of yet another exemplary alternative apparatus for computing a reciprocal of a complex number according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The characteristics and advantages of the present invention will become more apparent from the following description, given by way of example.
FIG. 1 is a block diagram of an exemplary apparatus 100 for computing a reciprocal of a complex number according to the present invention. In general, exemplary apparatus 100 is configured to provide a hardware approximation of a scaled inverse of a complex number (i.e., $K/(a_i + jb_i)$). Here the complex number $a_i + jb_i$ has a 10-bit, 2's complement real part (i.e., $a_i$) and a 10-bit, 2's complement imaginary part (i.e., $b_i$). K represents a rough or approximate overall system gain (i.e., cumulative scaling factor) that results from a multiplicative product of various scaling factors ($k_1, k_2, \ldots, k_n$) introduced by computational operations, as discussed further below. It should be appreciated, then, that the scaled inverse of $a_i + jb_i$ (i.e., $K/(a_i + jb_i)$) may be mathematically expressed as follows:

$$\frac{K}{a_i + jb_i} = \frac{K(a_i - jb_i)}{a_i^2 + b_i^2} = a_0 + jb_0,$$

where $a_0$ is the real portion of the scaled inverse, $b_0$ is the imaginary portion of the scaled inverse, $a_0 = K a_i/(a_i^2 + b_i^2)$, $b_0 = -K b_i/(a_i^2 + b_i^2)$, $(a_i^2 + b_i^2)$ is the magnitude squared of the input number, and $K = k_1 \cdot k_2 \cdots k_n$.
Further, it should be appreciated that because $a_0$ and $b_0$ (above) have a common denominator (i.e., $a_i^2 + b_i^2$), this common denominator needs to be computed only once to calculate the inverse. On the other hand, in exemplary apparatus 100 this denominator becomes a 21-bit quantity. Although large dividers are typically not easily implemented in IC hardware, the present invention provides significant hardware simplifications while staying within about 1-2% of the ideal performance.
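As a floating-point reference for what the hardware approximates, the sketch below evaluates the common denominator once and reuses it for both output parts. The function name `scaled_inverse` is hypothetical. The default K = 2**18 is an assumption borrowed from the approximate overall gain derived later in this description.

```python
def scaled_inverse(a: int, b: int, K: float = 2**18) -> tuple[float, float]:
    """Return (a0, b0) such that a0 + j*b0 = K / (a + j*b), in floating point."""
    m = a * a + b * b  # common denominator |a + jb|^2, computed only once
    if m == 0:
        raise ZeroDivisionError("1/(0 + j0) is undefined")
    return K * a / m, -K * b / m  # real part, imaginary part


a0, b0 = scaled_inverse(3, 4, K=25)
print(a0, b0)  # 3.0 -4.0
```

Multiplying the result by the input recovers K, since (3 - 4j)(3 + 4j) = 25.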
As shown in FIG. 1, apparatus 100 includes a multiplier 110. Multiplier 110 includes a 10-bit, 2's complement input 120, a 10-bit, 2's complement input 130, and a 21-bit, 2's complement output 140. Multiplier 110 is configured to receive $a_i$ (which is discussed above) through input 120 and through input 130, and to provide through output 140 a result that corresponds to $a_i^2$ (i.e., $a_i \times a_i$).
Apparatus 100 further includes a multiplier 144. Multiplier 144 includes a 10-bit, 2's complement input 150, a 10-bit, 2's complement input 160, and a 21-bit, 2's complement output 170. Multiplier 144 is configured to receive $b_i$ (discussed above) through input 150 and through input 160, and to provide through output 170 a result that corresponds to $b_i^2$ (i.e., $b_i \times b_i$).
Apparatus 100 further includes a summer 180 including a 21-bit, 2's complement input 190, a 21-bit, 2's complement input 200, and a 21-bit, 2's complement output 210. Summer 180 is configured to receive a 21-bit, 2's complement number through input 190, to receive a 21-bit, 2's complement number through input 200, and to provide through output 210 a 21-bit, 2's complement number corresponding to an addition of the numbers received through its inputs.
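A quick range check shows why the summer needs 21 bits. With 10-bit 2's complement operands (-512..511), the largest square is (-512)^2 = 2^18, which needs 20 bits signed. The sum of two such squares reaches 2^19, which needs 21 bits. The helper `signed_width` below is hypothetical and is used only for this check.

```python
def signed_width(v: int) -> int:
    """Smallest two's-complement word width (in bits) that can represent v."""
    # For v < 0, ~v == -v - 1 carries the magnitude bits of v's two's-complement form.
    return (v if v >= 0 else ~v).bit_length() + 1


# 10-bit two's-complement operands span -512..511.
lo, hi = -512, 511
max_square = max(lo * lo, hi * hi)  # (-512)**2 = 262144 is the largest square
max_sum = 2 * max_square            # reached when a_i = b_i = -512

print(signed_width(max_square))  # 20
print(signed_width(max_sum))     # 21
```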
Apparatus 100 further includes a barrel shifter 220. Barrel shifter 220 includes a 21-bit, 2's complement input 230, a 6-bit output 240, and a 4-bit shift control output 250. In general, barrel shifter 220 is configured to extract from the 21-bit input the 6 bits used for computing the approximation of the scaled inverse, and to discard the remaining bits. Accordingly, barrel shifter 220 is configured to:
- receive a 21-bit, 2's complement input word through input 230;
- left shift the input word until reaching the first "1" in the data portion of the input word (i.e., until reaching the most significant "1" other than the sign bit), discarding the previous most significant bit ("MSB") and tacking on a trailing "0" as the new least significant bit ("LSB") with each shift;
- provide through output 240 the next 6 MSB after the first "1" (or simply the 6 LSB in the event that the first 14 MSB of the data portion are all "0"); and
- provide through shift control output 250 a 4-bit shift control word indicating the number of left shifts of the input word required to reach the first "1" (or indicating 14 shifts in the event that the first 14 MSB of the data are all "0").
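The leading-one search can be modeled in a few lines. The function name `barrel_shift_220` is hypothetical, and so is one detail of the counting convention: the text does not say whether the sign bit is counted. This sketch assumes the shift count is the number of left shifts that bring the leading "1" into the MSB (bit 20) of the 21-bit word, i.e. 21 minus the bit length, saturating at 14.

```python
def barrel_shift_220(m: int) -> tuple[int, int]:
    """Model barrel shifter 220 for a nonnegative word m = a_i**2 + b_i**2.

    Returns (lut_in, shift): the 6 bits that follow the leading "1", and the
    4-bit shift control word. Assumed convention: shift = 21 - m.bit_length(),
    i.e. left shifts needed to move the leading "1" into bit 20.
    """
    if not 0 <= m < 2**20:
        raise ValueError("expected a nonnegative value within the 20 data bits")
    if m < 64:
        # First 14 data MSBs are all "0": pass the 6 LSBs through and report 14 shifts.
        return m & 0x3F, 14
    shift = 21 - m.bit_length()                   # 1..14 for m in [64, 2**20)
    lut_in = (m >> (m.bit_length() - 7)) & 0x3F   # 6 bits after the leading "1"
    return lut_in, shift


print(barrel_shift_220(200))  # (36, 13)
```

Under this convention, m lies in [(64 + lut_in) · 2^(14 - shift), (65 + lut_in) · 2^(14 - shift)). The gain factor k_1 = 2^(20 - shift) stated later in this description then holds up to truncation of the discarded bits.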
Apparatus 100 further includes a converter arrangement 260. Converter arrangement 260 includes a lookup table ("LUT") block 270. LUT block 270 includes a 6-bit input 280 and a 6-bit output 290. LUT block 270 is configured to receive a 6-bit input word through input 280 and to provide a 6-bit output word through output 290. The output corresponds to a 64-entry LUT (see FIG. 2) that maps possible input word states to a scaled and offset inverse of those states. FIG. 2 is a depiction of an exemplary lookup table 300 for exemplary LUT block 270 of exemplary apparatus 100. The "LUT input data" is assigned to cover the possible states for the input word received through input 280 (see FIG. 1). The "LUT output data" is assigned according to the following formula:
$$\text{LUT output data (decimal)} = \mathrm{INT}\left(64\left(\frac{2}{(\text{LUT input data})/64 + 1} - 1\right) + 0.5\right)$$
Referring again to FIG. 1, converter arrangement 260 further includes a prepend block 310. Prepend block 310 includes a 6-bit input 320 and an 8-bit, 2's complement output 330. Prepend block 310 is configured to receive a 6-bit input word through input 320 and to provide through output 330 an 8-bit, 2's complement output word corresponding to the 6-bit input word with "01" prepended (i.e., with a leading "01" added or tacked on as MSB). Converter arrangement 260 also includes a 6-bit data bus 340 that suitably couples output 290 of LUT block 270 to input 320 of prepend block 310.
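Converter arrangement 260 (LUT block 270 followed by prepend block 310) can be sketched as follows. The names `lut_entry` and `prepend_01` are hypothetical. The cap at 63 reflects the later note on FIG. 3A that 64 cannot be expressed in 6 bits. Read as a fixed-point number with six fractional bits, the prepended word 01.yyyyyy approximates 2/(1 + x/64).

```python
def lut_entry(x: int) -> int:
    """LUT block 270: INT(64*(2*(1/((x/64) + 1)) - 1) + 0.5), capped at 63 (6 bits)."""
    return min(63, int(64 * (2 * (1 / ((x / 64) + 1)) - 1) + 0.5))


def prepend_01(y: int) -> int:
    """Prepend block 310: tack "01" onto the 6-bit word y, giving an 8-bit value."""
    return (0b01 << 6) | (y & 0x3F)


LUT = [lut_entry(x) for x in range(64)]
print(LUT[0], LUT[32], LUT[63])            # 63 21 1
print(format(prepend_01(LUT[32]), "08b"))  # 01010101
```

The table decreases monotonically from 63 to 1, matching the output range described for FIG. 2.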
Apparatus 100 further includes a multiplier arrangement 350. Multiplier arrangement 350 includes a 10-bit, 2's complement input 360 and a 10-bit, 2's complement output 370. Multiplier arrangement 350 is configured to receive a 10-bit, 2's complement input word through input 360 and to provide through output 370 a 10-bit, 2's complement output word corresponding to the magnitude of the input word with a sign bit opposite that of the input word (i.e., a multiplication by negative one).
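Multiplying by -1 within a fixed 10-bit 2's complement word has one edge case: +512 is not representable, so negating -512 wraps back to -512. The text does not say how multiplier arrangement 350 treats this input. The sketch below (hypothetical `negate_10bit`) simply shows plain wrap-around behavior.

```python
def negate_10bit(v: int) -> int:
    """Two's-complement negation wrapped to a 10-bit word (range -512..511)."""
    r = (-v) & 0x3FF                      # keep the low 10 bits of -v
    return r - 0x400 if r & 0x200 else r  # reinterpret bit 9 as the sign bit


print(negate_10bit(5), negate_10bit(-511), negate_10bit(-512))  # -5 511 -512
```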
Apparatus 100 further includes a multiplier 380. Multiplier 380 includes a 10-bit, 2's complement input 390, an 8-bit, 2's complement input 400, and an 18-bit, 2's complement output 410. Multiplier 380 is configured to receive a 10-bit, 2's complement input word through input 390, to receive an 8-bit, 2's complement input word through input 400, and to provide through output 410 a result that corresponds to the multiplicative product of the two input words.
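The 18-bit output width can be sanity-checked. Input 400 is fed from prepend block 310, so its 8-bit word always begins with "01" and lies in 64..127 (we assume no other source drives it). A product of two intervals reaches its extremes at the interval corners, so checking the four corner products bounds the result. The helper `signed_width` is hypothetical. The same bound applies to multiplier 420.

```python
def signed_width(v: int) -> int:
    """Smallest two's-complement word width (in bits) that can represent v."""
    return (v if v >= 0 else ~v).bit_length() + 1


# Input 390: a 10-bit two's-complement operand (-512..511).
# Input 400: prepend-block output "01yyyyyy", i.e. 64..127 (assumed always positive).
corners = [a * p for a in (-512, 511) for p in (64, 127)]
print(min(corners), max(corners))             # -65024 64897
print(max(signed_width(v) for v in corners))  # 17
```

Under these assumptions every product fits in 17 bits, so the specified 18-bit output leaves one bit of headroom.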
Apparatus 100 further includes a multiplier 420. Multiplier 420 includes a 10-bit, 2's complement input 430, an 8-bit, 2's complement input 440, and an 18-bit, 2's complement output 450. Multiplier 420 is configured to receive a 10-bit, 2's complement input word through input 430, to receive an 8-bit, 2's complement input word through input 440, and to provide through output 450 a result that corresponds to the multiplicative product of the two input words.

Apparatus 100 further includes a dual inverse barrel shifter 460. Dual inverse barrel shifter 460 includes an 18-bit, 2's complement input 470, an 18-bit, 2's complement input 480, a shift control input 490, a 31-bit, 2's complement output 500, and a 31-bit, 2's complement output 510. In general, dual inverse barrel shifter 460 is configured to scale the magnitudes of the outputs of multiplier 380 and multiplier 420 by applying to each of these quantities a bit shift that depends on the number of LSB discarded by barrel shifter 220. Accordingly, dual inverse barrel shifter 460 is configured to receive a first 18-bit, 2's complement input data word through input 470, to receive a second 18-bit, 2's complement input data word through input 480, and to receive the shift control word (discussed above) through input 490. In exemplary apparatus 100, dual inverse barrel shifter 460 is configured to shift each of the first input data word and second input data word according to the following inverse shifting formula No. 1:
Number of inverse shifts (decimal) = [(shift control word) - 9]; where a Number of inverse shifts = 0 results in no shifting, a Number of inverse shifts > 0 results in right shifting by the Number of inverse shifts, and a Number of inverse shifts < 0 results in left shifting by the magnitude of the Number of inverse shifts.
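A direct transcription of formula No. 1 follows. The name `inverse_shift` is hypothetical, and words are modeled as unbounded Python integers, so the 31-bit output framing is not modeled. The text mentions only "0" fill. This sketch assumes that right shifts sign-extend (an arithmetic shift), so that negative words stay negative.

```python
def inverse_shift(word: int, shift_control: int) -> int:
    """Apply inverse shifting formula No. 1 to a two's-complement word.

    n = shift_control - 9: n == 0 leaves the word unchanged, n > 0 shifts
    right by n, and n < 0 shifts left by |n| (vacated LSBs filled with "0").
    """
    n = shift_control - 9
    if n > 0:
        return word >> n   # Python's >> on ints is arithmetic (sign-extending)
    if n < 0:
        return word << -n
    return word


print(inverse_shift(1000, 12), inverse_shift(-1000, 12), inverse_shift(3, 5))  # 125 -125 48
```

With a 4-bit shift control word limited to 0..14, n ranges from -9 (left shift by 9) to 5 (right shift by 5).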
Dual inverse barrel shifter 460 is further configured to add leading and/or trailing "0"s during and/or after shifting the first input data word and second input data word to fill out or form a corresponding first 31-bit, 2's complement output data word and a corresponding second 31-bit, 2's complement output data word. Further, dual inverse barrel shifter 460 is configured to provide the first output data word through output 500 and to provide the second output data word through output 510.
Apparatus 100 further includes a limiter 520. Limiter 520 includes a 31-bit, 2's complement input 530 and a 20-bit, 2's complement output 540. Limiter 520 is configured to receive a 31-bit, 2's complement input word through input 530 and to provide a corresponding scaled and offset 20-bit, 2's complement output word through output 540.
Apparatus 100 further includes a limiter 550. Limiter 550 includes a 31-bit, 2's complement input 560 and a 20-bit, 2's complement output 570. Limiter 550 is configured to receive a 31-bit, 2's complement input word through input 560 and to provide a corresponding scaled and offset 20-bit, 2's complement output word through output 570.
Apparatus 100 further includes a delay 600. Delay 600 includes a 10-bit, 2's complement input 610 and a 10-bit, 2's complement output 620. Delay 600 is configured to receive a 10-bit, 2's complement input word through input 610 and to provide an equivalent (but delayed) 10-bit, 2's complement output word through output 620. Accordingly, it should be appreciated that delay 600 does not materially affect computational values. That is, delay 600 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 630. Delay 630 includes a 21-bit, 2's complement input 640 and a 21-bit, 2's complement output 650. Delay 630 is configured to receive a 21-bit, 2's complement input word through input 640 and to provide an equivalent (but delayed) 21-bit, 2's complement output word through output 650. Accordingly, it should be appreciated that delay 630 does not materially affect computational values. That is, delay 630 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 660. Delay 660 includes a 10-bit, 2's complement input 670 and a 10-bit, 2's complement output 680. Delay 660 is configured to receive a 10-bit, 2's complement input word through input 670 and to provide an equivalent (but delayed) 10-bit, 2's complement output word through output 680. Accordingly, it should be appreciated that delay 660 does not materially affect computational values. That is, delay 660 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 690. Delay 690 includes an 18-bit, 2's complement input 700 and an 18-bit, 2's complement output 710. Delay 690 is configured to receive an 18-bit, 2's complement input word through input 700 and to provide an equivalent (but delayed) 18-bit, 2's complement output word through output 710. Accordingly, it should be appreciated that delay 690 does not materially affect computational values. That is, delay 690 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 720. Delay 720 includes a 4-bit input 730 and a 4-bit output 740. Delay 720 is configured to receive a 4-bit input word (the shift control word) through input 730 and to provide an equivalent (but delayed) 4-bit output word through output 740. Accordingly, it should be appreciated that delay 720 does not materially affect computational values. That is, delay 720 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 750. Delay 750 includes an 18-bit, 2's complement input 760 and an 18-bit, 2's complement output 770. Delay 750 is configured to receive an 18-bit, 2's complement input word through input 760 and to provide an equivalent (but delayed) 18-bit, 2's complement output word through output 770. Accordingly, it should be appreciated that delay 750 does not materially affect computational values. That is, delay 750 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 780. Delay 780 includes a 20-bit, 2's complement input 790 and a 20-bit, 2's complement output 800. Delay 780 is configured to receive a 20-bit, 2's complement input word through input 790 and to provide an equivalent (but delayed) 20-bit, 2's complement output word through output 800. Accordingly, it should be appreciated that delay 780 does not materially affect computational values. That is, delay 780 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 further includes a delay 810. Delay 810 includes a 20-bit, 2's complement input 820 and a 20-bit, 2's complement output 830. Delay 810 is configured to receive a 20-bit, 2's complement input word through input 820 and to provide an equivalent (but delayed) 20-bit, 2's complement output word through output 830. Accordingly, it should be appreciated that delay 810 does not materially affect computational values. That is, delay 810 is merely a pipeline register for facilitating synchronous data flow during operation, as discussed further below.
Apparatus 100 also includes: a 10-bit data bus 850 that suitably couples input 120 of multiplier 110 to input 130 of multiplier 110 and to input 610 of delay 600; a 10-bit data bus 860 that suitably couples input 150 of multiplier 144 to input 160 of multiplier 144 and to input 670 of delay 660; a 21-bit data bus 870 that suitably couples output 140 of multiplier 110 to input 190 of summer 180; a 21-bit data bus 880 that suitably couples output 170 of multiplier 144 to input 200 of summer 180; a 21-bit data bus 890 that suitably couples output 210 of summer 180 to input 640 of delay 630; a 21-bit data bus 900 that suitably couples output 650 of delay 630 to input 230 of barrel shifter 220; a 6-bit data bus 910 that suitably couples output 240 of barrel shifter 220 to input 280 of LUT block 270; an 8-bit data bus 920 that suitably couples output 330 of prepend block 310 to input 400 of multiplier 380 and to input 440 of multiplier 420; a 10-bit data bus 930 that suitably couples output 620 of delay 600 to input 390 of multiplier 380; a 4-bit data bus 940 that suitably couples output 250 of barrel shifter 220 to input 730 of delay 720; a 10-bit data bus 950 that suitably couples output 680 of delay 660 to input 360 of multiplier arrangement 350; a 10-bit data bus 960 that suitably couples output 370 of multiplier arrangement 350 to input 430 of multiplier 420; an 18-bit data bus 970 that suitably couples output 410 of multiplier 380 to input 700 of delay 690; an 18-bit data bus 980 that suitably couples output 450 of multiplier 420 to input 760 of delay 750; a 4-bit data bus 990 that suitably couples output 740 of delay 720 to input 490 of dual inverse barrel shifter 460; an 18-bit data bus 1000 that suitably couples output 710 of delay 690 to input 470 of dual inverse barrel shifter 460; an 18-bit data bus 1010 that suitably couples output 770 of delay 750 to input 480 of dual inverse barrel shifter 460; a 31-bit data bus 1020 that suitably couples output 500 of dual inverse barrel shifter 460 to input 530 of limiter 520; a 31-bit data bus 1030 that suitably couples output 510 of dual inverse barrel shifter 460 to input 560 of limiter 550; a 20-bit data bus 1040 that suitably couples output 540 of limiter 520 to input 790 of delay 780; a 20-bit data bus 1050 that suitably couples output 570 of limiter 550 to input 820 of delay 810; a 20-bit data bus 1060 suitably extending from output 800 of delay 780; and a 20-bit data bus 1070 suitably extending from output 830 of delay 810.
In operation, exemplary apparatus 100 (see FIG. 1) receives $a_i$ through input 120 and input 130 of multiplier 110. Further, apparatus 100 receives $b_i$ through input 150 and input 160 of multiplier 144. Through output 140, multiplier 110 provides the quantity $a_i^2$ to input 190 of summer 180. Through output 170, multiplier 144 provides the quantity $b_i^2$ to input 200 of summer 180. Through output 210, summer 180 provides the quantity $(a_i^2 + b_i^2)$ to input 640 of delay 630. Through output 650, delay 630 provides the quantity $(a_i^2 + b_i^2)$ to input 230 of barrel shifter 220.
Barrel shifter 220 searches for the most significant "1" in the data portion of the quantity $(a_i^2 + b_i^2)$. After barrel shifter 220 finds the most significant "1," it provides (through output 240) the next 6 bits to input 280 of LUT block 270. It also provides, through shift control output 250, the 4-bit shift control word indicating the number of left shifts of the input word required to reach the first "1" (or indicating 14 shifts in the event that the first 14 MSB of the data are all "0"). Through output 290, LUT block 270 provides a scaled and offset inverse of the 6-bit quantity (received from barrel shifter 220) to input 320 of prepend block 310 (see "LUT output" data of FIG. 2). At only 64 entries by 6 bits per entry, the LUT (see FIG. 2) is quite compact, while providing accuracy better than 1%. Through output 330, prepend block 310 provides an 8-bit, 2's complement output word corresponding to the 6-bit quantity (received from LUT block 270) with "01" prepended (i.e., with "01" added as MSB).
Thus, apparatus 100 provides efficiency by using only the 7 most significant bits of the quantity $(a_i^2 + b_i^2)$, starting from its leading "1". However, it should be appreciated that in alternative embodiments the accuracy of apparatus 100 can be made arbitrarily higher or lower by increasing or decreasing, respectively, the size of the LUT (and the corresponding number of bits used for the approximation), as desired.
Meanwhile, delay 600 receives $a_i$ through input 610 and delay 660 receives $b_i$ through input 670. Further, through output 620 delay 600 provides $a_i$ to input 390 of multiplier 380, and through output 680 delay 660 provides $b_i$ to input 360 of multiplier arrangement 350. Through output 370, multiplier arrangement 350 provides the quantity $(-b_i)$ to input 430 of multiplier 420.
Through output 410, multiplier 380 provides a bit-shifted approximation of the quantity $a_i/(a_i^2 + b_i^2)$ to input 700 of delay 690. Further, through output 450, multiplier 420 provides a bit-shifted approximation of the quantity $-b_i/(a_i^2 + b_i^2)$ to input 760 of delay 750.
Meanwhile, through output 710 delay 690 provides the bit-shifted approximation of the quantity $a_i/(a_i^2 + b_i^2)$ to input 470 of dual inverse barrel shifter 460, and through output 770 delay 750 provides the bit-shifted approximation of the quantity $-b_i/(a_i^2 + b_i^2)$ to input 480 of dual inverse barrel shifter 460. Additionally, through input 730 delay 720 receives the shift control word from output 250 of barrel shifter 220, and through output 740 delay 720 provides the shift control word to input 490 of dual inverse barrel shifter 460.
Dual inverse barrel shifter 460 receives the bit-shifted approximation of the quantity $a_i/(a_i^2 + b_i^2)$ through input 470, receives the bit-shifted approximation of the quantity $-b_i/(a_i^2 + b_i^2)$ through input 480, and receives the shift control word through input 490. Further, dual inverse barrel shifter 460 shifts the bit-shifted approximation of the quantity $a_i/(a_i^2 + b_i^2)$ by the "Number of inverse shifts" as discussed above, and through output 500 it provides the corresponding 31-bit, 2's complement output data word to input 530 of limiter 520. Similarly, dual inverse barrel shifter 460 shifts the bit-shifted approximation of the quantity $-b_i/(a_i^2 + b_i^2)$ by the "Number of inverse shifts" as discussed above, and through output 510 it provides the corresponding 31-bit, 2's complement output data word to input 560 of limiter 550.
Through input 530, limiter 520 receives the 31-bit, 2's complement word from output 500 of dual inverse barrel shifter 460. Through output 540, limiter 520 provides a corresponding scaled and offset 20-bit, 2's complement output word to input 790 of delay 780, and delay 780 provides $a_0$ through output 800. Further, through input 560 limiter 550 receives the 31-bit, 2's complement word from output 510 of dual inverse barrel shifter 460. Through output 570, limiter 550 provides a corresponding scaled and offset 20-bit, 2's complement output word to input 820 of delay 810, and delay 810 provides $b_0$ through output 830.
The delays (600, 630, 660, 690, 720, 750, 780, 810) facilitate synchronous data flow by respectively delaying the data in their paths so that the data remains properly aligned for the various computations during operation. The net result for exemplary apparatus 100 is an overall "latency" of 3 clock cycles. In other words, apparatus 100 provides $a_0$ and $b_0$ three clock cycles after receiving $a_i$ and $b_i$.
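The 3-cycle latency follows from the register stages implied by the bus list:
- stage 1: delays 600 and 660 (operands) and 630 ($a_i^2 + b_i^2$);
- stage 2: delays 690 and 750 (products) and 720 (shift control word);
- stage 3: delays 780 and 810 ($a_0$, $b_0$).

The hypothetical cycle-level sketch below keeps that stage structure. It replaces all combinational logic with exact arithmetic and assumes K = 2**18, so it illustrates timing only, not the bit-level datapath. A sample presented before the first clock edge is latched by the output registers on the third edge (index 2 of the returned trace).

```python
K = 2**18  # assumed overall gain


def run_pipeline(samples):
    """Clock (a, b) samples through three register stages; return stage-3 contents per cycle."""
    s1 = s2 = s3 = None  # None models a register that does not yet hold valid data
    trace = []
    for sample in list(samples) + [None] * 3:  # three extra cycles flush the pipeline
        # Next-state values are computed from the current register contents ...
        n1 = None if sample is None else (sample[0], sample[1], sample[0] ** 2 + sample[1] ** 2)
        n2 = None if s1 is None else (K * s1[0] / s1[2], -K * s1[1] / s1[2])
        n3 = s2  # scaling by shifter 460 and the limiters is folded into n2 here
        # ... then every register captures its input on the same clock edge.
        s1, s2, s3 = n1, n2, n3
        trace.append(s3)
    return trace
```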
FIG. 3A is a table 2000 of exemplary operational data (including lookup table data from FIG. 2) for exemplary apparatus 100 of FIG. 1, and FIG. 3B is a continuation of FIG. 3A. The "Nonzero MSB (BIN)" data (column 2010) represents the possible binary results from operation of barrel shifter 220. The "1" to the left of the decimal point indicates the first "1" found by barrel shifter 220, while the 6 bits to the right of the decimal point indicate the possible values for the next 6 bits after the first "1." For clarity of exposition, the "Nonzero MSB (DEC)" data (column 2020) and the "Nonzero MSB (Fractional)" data (column 2030) show the decimal (i.e., base 10) and the fractional equivalents of the "Nonzero MSB (BIN)" data, respectively.
The "LUT input (BIN)" data (column 2040) represents the possible outputs of barrel shifter 220 (i.e., the possible inputs to LUT block 270; see also FIG. 2). For clarity of exposition, the "LUT IN (DEC)" data (column 2050) shows the decimal equivalent of the "LUT input (BIN)" data.
Meanwhile, the "LUT output (BIN)" data (column 2060) represents the possible outputs of LUT block 270 (i.e., the possible inputs to prepend block 310; see also FIG. 2). For clarity of exposition, the "LUT output (DEC)" data (column 2070) shows the decimal equivalent of the "LUT output (BIN)" data.
The "2 / Nonzero MSB (Fractional)" data (column 2080) shows 2 times the ideal inverse of the "Nonzero MSB (Fractional)" data. For the most part, the "LUT output (DEC)" data corresponds to the numerator (over 64) of the fractional portion of 2 times the inverse of the 7 MSB (i.e., the first "1" plus the next 6 bits) found by barrel shifter 220. It is noted here, however, that because 64 decimal cannot be expressed in only 6 bits, the maximum value for "LUT output (BIN)" is assigned "111111" (i.e., 63 decimal). For clarity of exposition, the "2 / Nonzero MSB (DEC)" data (column 2084) shows the decimal equivalent of the "2 / Nonzero MSB (Fractional)" data.
The "Pre-inverse shift data (BIN)" data (column 2090) shows the possible outputs from prepend block 310. Here, the "01"s to the left of the decimal points represent the "01" that is prepended by prepend block 310, and the data to the right of the decimal points corresponds to the outputs from LUT block 270 (see column 2060). For clarity of exposition, the "Pre-inverse shift data (DEC)" data (column 2100) shows the decimal equivalent of the "Pre-inverse shift data (BIN)" data. The "% error" data (column 2110) shows that the errors between the "2 / Nonzero MSB (DEC)" data and the "Pre-inverse shift data (DEC)" data are all well under 3%. Indeed, the errors are approximately 1/2 an LSB, so with 6 bits the errors should be less than 1/126 (i.e., less than 0.8%). However, it is noted that in alternative embodiments the accuracy of apparatus 100 can be made arbitrarily higher or lower by increasing or decreasing, respectively, the size of the LUT, as desired.
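The error column, and the claim that accuracy scales with LUT size, can be checked numerically. In the sketch below, `max_relative_error` is a hypothetical helper. Generalizing the FIG. 2 formula to n bits by replacing 64 with 2^n is an illustrative assumption; n = 6 reproduces the apparatus. The worst case is the capped first entry, whose relative error is exactly 2^-(n+1): 1/128 (about 0.78%) for 6 bits, halving with each added bit.

```python
def max_relative_error(n_bits: int = 6) -> float:
    """Worst-case relative error of 01.yyyyyy versus 2 / Nonzero MSB for an n_bits LUT.

    Generalizing the 6-bit LUT formula to n_bits is an assumption for illustration.
    """
    scale = 2 ** n_bits
    worst = 0.0
    for x in range(scale):
        ideal = 2 / (1 + x / scale)                         # "2 / Nonzero MSB (Fractional)"
        y = min(scale - 1, int(scale * (ideal - 1) + 0.5))  # LUT entry, capped at all ones
        approx = 1 + y / scale                              # prepended "01" -> 01.yyyyyy
        worst = max(worst, abs(approx - ideal) / ideal)
    return worst


for n in (4, 5, 6, 7, 8):
    print(n, f"{100 * max_relative_error(n):.4f}%")
```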
Thus, it should be appreciated that apparatus 100 implements a modified "exponent/mantissa" type of computation, where the scaled mantissa is stored in LUT block 270. In other words, the input to LUT block 270 may be thought of as $1X_0X_1X_2X_3X_4X_5 \times 2^{-6}$ (i.e., the quantity $1.X_0X_1X_2X_3X_4X_5$, or $1 + X_0X_1X_2X_3X_4X_5/64$). The output from prepend block 310 (i.e., the significant bits of the quantity $2^1 \times 1/(1.X_0X_1X_2X_3X_4X_5)$) may be thought of as $01.Y_0Y_1Y_2Y_3Y_4Y_5$, and the corresponding values for $Y_0Y_1Y_2Y_3Y_4Y_5$ are stored in LUT block 270. Further, operations of apparatus 100 may be thought of as carrying an understood "01" (exponent) around LUT block 270 and prepending it back to the output of LUT block 270 via prepend block 310. As a result, the possible inputs to LUT block 270 range from 000000 binary (i.e., 0 decimal) to 111111 binary (i.e., 63 decimal), while the possible outputs correspondingly range from 111111 binary (i.e., 63 decimal) to 000001 binary (i.e., 1 decimal). It is noted that one of the factors of 2 (i.e., $2^1$) in the output from prepend block 310 merely ensures a range over all 63 possible outputs from LUT block 270. Without this factor of $2^1$, LUT block 270 would use only half of the 63 possible values, which would limit precision. It should be further appreciated, then, that exemplary apparatus 100 provides a rough or approximate overall system gain or cumulative scaling factor, K (noted above), such that:

$$K = k_1 \cdot k_2 \cdot k_3,$$
where barrel shifter 220 introduces the gain factor, $k_1$, as follows:
$$k_1 = 2^{f_1},$$
$$f_1 = [\text{total bit width of the 2's complement quantity } (a^2 + b^2)] - [\text{total bit width of the approximation}] - [\text{number of left shifts of the input word}] + [\text{LUT input exponent/mantissa format contribution}];$$
which, in operation of exemplary apparatus 100, simplifies as:
$$f_1 = 21 - 7 - [\text{shift control word}] + 6 = 20 - [\text{shift control word}],$$
$$k_1 = 2^{\,20 - [\text{shift control word}]}.$$
And where LUT block 270 and prepend block 310 in combination introduce the approximate gain factor, $k_2$, as follows:
$$k_2 = 2^{f_2}; \qquad 2^{f_2} = 2^1 \cdot 2^6 = 2^7, \text{ so } f_2 = 7 \text{ and } k_2 \approx 2^7.$$
And where dual inverse barrel shifter 460 introduces the gain factor, $k_3$, as follows:
$$k_3 = 2^{f_3}; \qquad f_3 = [\text{shift control word}] - 9; \qquad k_3 = 2^{[\text{shift control word}] - 9}.$$
So that:
$$K = 2^{\,20 - [\text{shift control word}]} \cdot 2^{7} \cdot 2^{[\text{shift control word}] - 9} = 2^{18} \text{ (approximately)}.$$
Because the shift control word enters $k_1$ and $k_3$ with opposite signs, it cancels, so the overall gain K does not depend on the magnitude of the input.
Next, FIG. 4 is a block diagram of an exemplary alternative apparatus 3000 for computing a reciprocal of a complex number according to the present invention. Apparatus 3000 is configured and operates in a like manner to exemplary apparatus 100 (discussed above), except that apparatus 3000 provides additional hardware reduction by applying the barrel-shift/inverse-barrel-shift concept to the real and imaginary input operands.
Accordingly, apparatus 3000 includes a barrel shifter 3010, which is configured to left shift $a_i$ until reaching the most significant "1" in the data portion, to provide this first "1" and the next 3 MSB (or simply the 4 LSB in the event that the first 5 MSB of the $a_i$ data are all "0"), and to provide a 3-bit shift control word indicating the number of left shifts of the input word required to reach the first "1" (or indicating 5 shifts in the event that the first 6 MSB of the $a_i$ data are all "0"). Similarly, barrel shifter 3020 is configured to left shift $b_i$ until reaching the most significant "1", to provide this first "1" and the next 3 MSB (or simply the 4 LSB in the event that the first 5 MSB of the $b_i$ data are all "0"), and to provide a 3-bit shift control word indicating the number of left shifts of the input word required to reach the first "1" (or indicating 5 shifts in the event that the first 6 MSB of the $b_i$ data are all "0").
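A minimal Python sketch of the normalization performed by barrel shifters 3010 and 3020 follows. It assumes a non-negative 9-bit data field (4 output bits plus at most 5 shifts, matching the saturation at 5 shifts described above); the function name and the treatment of the field as an unsigned magnitude are illustrative assumptions, since the handling of the sign bit is not described in this passage.

```python
from typing import Tuple


def barrel_shift_4(data: int, width: int = 9) -> Tuple[int, int]:
    """Normalize a non-negative `width`-bit data field to its leading 4 bits.

    Returns (four bits starting at the most significant "1", shift control word).
    The shift saturates at width - 4 (5 when width = 9), in which case the
    four LSB are returned.
    """
    max_shift = width - 4
    shift = 0
    # Count left shifts until the most significant "1" reaches the top position.
    while shift < max_shift and ((data >> (width - 1 - shift)) & 1) == 0:
        shift += 1
    top4 = (data >> (max_shift - shift)) & 0xF
    return top4, shift


print(barrel_shift_4(0b000100101))   # (9, 3): bits "1001" after three left shifts
print(barrel_shift_4(0b000001011))   # (11, 5): saturated, so the 4 LSB "1011"
```

Shifting the 4-bit result back left by (5 - shift) recovers the input with only its low-order bits cleared, which is the accuracy given up in exchange for the narrower multiplier inputs.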
Apparatus 3000 also includes a multiplier 3012 and a multiplier 3014. Multiplier 3012 is configured to receive the left-shifted $a_i$ and the prepended output from converter arrangement 260 and to provide a result corresponding to the multiplicative product of these inputs, while multiplier 3014 is configured to receive the left-shifted $-b_i$ and the prepended output from converter arrangement 260 and to provide a result corresponding to the multiplicative product of these inputs.
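The sign of the imaginary operand follows from the identity $1/(a + jb) = (a - jb)/(a^2 + b^2)$: both parts share the single reciprocal $1/(a^2 + b^2)$, and the imaginary part is negated. The following minimal sketch checks that identity against Python's built-in complex division; it models only the arithmetic, not the shifted fixed-point operands, and the function name is hypothetical.

```python
import math


def reciprocal_parts(a: float, b: float) -> tuple:
    """Real and imaginary parts of 1/(a + jb) using the conjugate form."""
    inv_d = 1.0 / (a * a + b * b)   # the one reciprocal needed (LUT-based in hardware)
    return a * inv_d, -b * inv_d    # real product and negated imaginary product


real, imag = reciprocal_parts(3.0, 4.0)
print(f"{real:.4f} {imag:.4f}")                                   # 0.1200 -0.1600
z = 1 / complex(3.0, 4.0)                                         # built-in complex division
print(math.isclose(real, z.real) and math.isclose(imag, z.imag))  # True
```

This is why only one reciprocal circuit is needed: the two multipliers reuse the same $1/(a^2 + b^2)$ approximation.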
Apparatus 3000 further includes a dual inverse barrel shifter 3030, which is configured to receive the 12-bit, 2's complement, bit-shifted approximation of the quantity $a_i/(a_i^2 + b_i^2)$, to receive the 12-bit, 2's complement, bit-shifted approximation of the quantity $-b_i/(a_i^2 + b_i^2)$, and to receive shift control input word numbers 0, 1, and 2. Dual inverse barrel shifter 3030 is further configured to shift the 12-bit approximation of the quantity $a_i/(a_i^2 + b_i^2)$ according to the following inverse shifting formula No. 2:
$$\text{Number of inverse shifts (decimal)} = [(\text{shift control input number } 0) - 9] + [(\text{shift control input number } 1) - 6],$$
where a Number of inverse shifts equal to 0 results in no shifting, a Number of inverse shifts > 0 results in right shifting by the Number of inverse shifts, and a Number of inverse shifts < 0 results in left shifting by the magnitude of the Number of inverse shifts.
Similarly, dual inverse barrel shifter 3030 is further configured to shift the 12-bit approximation of the quantity $-b_i/(a_i^2 + b_i^2)$ according to the following inverse shifting formula No. 3:
$$\text{Number of inverse shifts (decimal)} = [(\text{shift control input number } 0) - 9] + [(\text{shift control input number } 2) - 6],$$
where a Number of inverse shifts equal to 0 results in no shifting, a Number of inverse shifts > 0 results in right shifting by the Number of inverse shifts, and a Number of inverse shifts < 0 results in left shifting by the magnitude of the Number of inverse shifts.
Dual inverse barrel shifter 3030 is also configured to add leading or trailing "0"s to each of the shifted quantities to form a corresponding first 24-bit, 2's complement output data word and a corresponding second 24-bit, 2's complement output data word.
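The two inverse shifting formulas can be modeled as below. This is a hypothetical sketch: the function names are illustrative, the mapping of shift control input numbers 0, 1, and 2 to the shifters for $a_i^2 + b_i^2$, $a_i$, and $b_i$ is an assumption, and the padding of the results to 24-bit output words is not modeled. Python integers stand in for the 2's complement data words.

```python
from typing import Tuple


def inverse_shift(value: int, num_shifts: int) -> int:
    """Apply one inverse shift: > 0 shifts right, < 0 shifts left by |num_shifts|, 0 leaves it.

    Python's >> on negative ints is an arithmetic shift, so 2's complement
    signs are preserved (low-order bits shifted out are simply dropped).
    """
    if num_shifts >= 0:
        return value >> num_shifts
    return value << -num_shifts


def dual_inverse_barrel_shift(re_val: int, im_val: int,
                              s0: int, s1: int, s2: int) -> Tuple[int, int]:
    """Hypothetical model of dual inverse barrel shifter 3030.

    s0, s1, s2 stand for shift control input numbers 0, 1 and 2 (assumed to
    come from the shifters for a_i^2 + b_i^2, a_i and b_i, respectively).
    """
    n_re = (s0 - 9) + (s1 - 6)                # inverse shifting formula No. 2
    n_im = (s0 - 9) + (s2 - 6)                # inverse shifting formula No. 3
    return inverse_shift(re_val, n_re), inverse_shift(im_val, n_im)


print(dual_inverse_barrel_shift(1200, -1200, 12, 6, 2))   # (150, -2400)
```

Because each shift count combines the shift applied to $a_i^2 + b_i^2$ with the shift applied to the corresponding operand, the real and imaginary results are generally shifted by different amounts, which is why two separate formulas are needed.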
Thus, if a slight reduction in divider accuracy is acceptable, apparatus 3000 provides reduced bit widths for the inputs to the multipliers. This should facilitate smaller circuit areas for the multipliers and, consequently, smaller overall circuit size and increased computational speed. It should be appreciated that the bit widths shown are merely exemplary, however, and may be varied in alternative embodiments according to the desired computational accuracy. To this end, corresponding variations of the constants for inverse shifting formula No.2 and inverse shifting formula No.3 should be readily appreciated by comparison to the basic form of inverse shifting formula No.1 (above).
FIG. 5 is a block diagram of another exemplary alternative apparatus 4000 for computing a reciprocal of a complex number according to the present invention. Apparatus 4000 is configured and operates in a like manner to exemplary apparatus 100 (discussed above), except that converter arrangement 4010 includes a LUT block 4020 that operates from an LUT that differs from that of LUT block 270, and includes a prepend block 4030 that prepends "00" rather than the "01" of prepend block 310. FIG. 6 is a depiction of an exemplary lookup table 5000 for exemplary apparatus 4000 of FIG. 5. In apparatus 4000, the "LUT output data" is assigned according to the following formula:
$$\text{LUT output data (decimal)} = \mathrm{INT}\left(64 \cdot \frac{1}{(\text{LUT input data}/64) + 1} + 0.5\right)$$
FIG. 7A is a table 6000 of exemplary operational data (including lookup table data from FIG. 6) for exemplary apparatus 4000 of FIG. 5, and FIG. 7B is a continuation of FIG. 7A. It should be appreciated that in this embodiment the LUT provides only about 1/2 of the possible 6-bit codes, and some codes are repeated. Similar to apparatus 100, the errors for apparatus 4000 are approximately 1/2 an LSB. However, because the values represented by the "00"-prepended outputs of apparatus 4000 lie between about 1/2 and 1 rather than between about 1 and 2, the same 1/2-LSB error is relatively larger, resulting in maximum errors on the order of 1/63 (i.e., 1.6%). So, although the LUT for apparatus 4000 is somewhat less complex than that of apparatus 100, the resulting errors are also on the order of twice those of apparatus 100. On the other hand, it should be appreciated that in alternative embodiments the accuracy of apparatus 4000 can be made arbitrarily higher or lower by increasing or decreasing the size of the LUT, respectively, as desired.
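The apparatus 4000 lookup rule can be tabulated directly from the formula above. The Python sketch below builds the 64-entry table, counts the distinct codes, and measures the relative error of each output, interpreted (as an assumption) as the value $Y/64$ after the "00" prefix, against the ideal $1/(1 + x/64)$. The function name is illustrative.

```python
def lut_4000(x: int) -> int:
    """LUT output per the apparatus 4000 formula: INT(64 * (1 / ((x / 64) + 1)) + 0.5)."""
    return int(64 * (1 / ((x / 64) + 1)) + 0.5)   # int() truncates; values are positive


table = [lut_4000(x) for x in range(64)]
codes = sorted(set(table))
print(min(table), max(table), len(codes))          # 32 64 33

# Relative error of Y/64 against the ideal 1 / (1 + x/64).
worst = max(abs(y / 64 - 1 / (1 + x / 64)) * (1 + x / 64) for x, y in enumerate(table))
print(f"worst error: {worst:.2%}")                 # below about 1.6%
```

The table uses 33 distinct codes (32 through 64) for 64 inputs, consistent with the statement that the LUT provides only about half of the possible 6-bit codes with some codes repeated. Note that the formula as written yields 64 for an input of 0, a value that does not fit in six bits; this section does not state how that entry is stored.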
FIG. 8 is a block diagram of yet another exemplary alternative apparatus 7000 for computing a reciprocal of a complex number according to the present invention. It should be readily appreciated that apparatus 7000 is configured and operates in a like manner to the alternative embodiments discussed above, except that apparatus 7000 employs dual inverse barrel shifter 3030 (to apply the barrel-shift/inverse-barrel-shift concept to the real and imaginary input operands) in conjunction with converter arrangement 4010 (for the less complex LUT) (see FIG. 4, FIG. 5, FIG. 6, FIG. 7A, and FIG. 7B).
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title
US38062002P | 2002-08-07 | 2002-08-07 |
US60/380,620 | 2002-08-07 | |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
AU2003233499A (AU2003233499A1) | 2002-08-07 | 2003-05-08 | Apparatus and method for computing a reciprocal of a complex number
Publications (1)
Publication Number | Publication Date
WO2004015558A1 | 2004-02-19
Family ID: 31715636
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
PCT/US2003/014504 (WO2004015558A1) | 2002-08-07 | 2003-05-08 | Apparatus and method for computing a reciprocal of a complex number
Country Status (2)
Country | Link
AU (1) | AU2003233499A1
WO (1) | WO2004015558A1
2003
2003-05-08 | WO | PCT/US2003/014504 (WO2004015558A1) | Application Discontinuation
2003-05-08 | AU | AU2003233499A (AU2003233499A1) | Abandoned
Also Published As
Publication Number | Publication Date
AU2003233499A1 | 2004-02-25
Legal Events
Code | Title | Description
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
122 | EP: PCT application non-entry in European phase |
NENP | Non-entry into the national phase in: | Ref country code: JP
WWW | WIPO information: withdrawn in national office | Country of ref document: JP