WO1994018632A1

WO1994018632A1 - Low latency function generating apparatus and method

Info

Publication number: WO1994018632A1
Application number: PCT/US1993/001242
Authority: WO
Inventors: Lester Caryl Pickett
Original assignee: Lester Caryl Pickett
Priority date: 1993-02-01
Filing date: 1993-02-01
Publication date: 1994-08-18
Also published as: AU3664293A

Abstract

General purpose digital generation of mathematical and engineering functions in connection with a computer processor (94) enabling computation of piece-wise differentiable functions with particularly short delay as well as at high throughput speed. Applicable piece-wise differentiable functions include portions of exponential, logarithmic, hyperbolic and trigonometric functions, special purpose engineering functions, or concatenations thereof. The invention employs compact look-up tables (98, 99) accessed by look-up techniques and value correction comprising interpolation or extrapolation to determine values not found in the tables. In specific embodiments, specific techniques are employed for function value selection and value correction to attain results at desired levels of latency, throughput speed and precision.

Description

LOW LATENCY FUNCTION GENERATING APPARATUS AND METHOD

Reference to Related Applications

Reference is made to U. S. patent application Serial No. 07/623,238 filed January 30, 1991 in the name of the present inventor and entitled "Method and Apparatus for Generating Mathematical Functions," which patent application is a continuation of U. S. patent application Serial No. 07/366,376, now abandoned, filed June 14, 1989 in the name of the present inventor and entitled "Method and Apparatus for Generating Mathematical Functions."

Background of the Invention

A portion of the disclosure of this patent document contains material to which a claim of copyright protection is made. The owner has no objection to the facsimile reproduction of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but reserves all other rights whatsoever.

The present invention is related to numerical processing and more particularly to a computer based apparatus and method involving the use of a new fundamental architecture using look-up tables for particularly efficient generation of general purpose mathematical transformations (i.e., mappings) of signals representing numerical data.

The present invention addresses the generation of (piece-wise) differentiable functions using compact look-up tables which are much lower in cost, but still provide nearly as short total delay (i.e., latency) from input to output as that which is possible when using direct table look-up.

Description of the Prior Art

Function Generation by Polynomial Approximation

Polynomial approximation forms, and particularly minimax polynomials, have been used most widely for generating fixed-point functions or floating-point functions of restricted range. However, such polynomial function generators usually require many multiplication operations and, therefore, cannot perform as rapidly as a basic arithmetic operation. Two exemplary references are "Near-Minimax Polynomial Approximations and Partitioning of Intervals," by W. Fraser and J. F. Hart, in the Communications of the ACM, August, 1964, 486-489 and "Floating Point Algorithm Design," in Computer Design, June, 1987, 107-114 by R. L. Cassola.

Function Generation by Direct Table Look-up

Direct table look-up is one technique for rapidly generating logical or mathematical functions. This technique has generally required the least possible execution time when generating fixed-point numerical functions. However, direct table look-up can require massive tables even for comparatively minor levels of precision.

A completely general radix-r m-digit-wide logical or mathematical function of a radix-r n-digit-wide argument requires rⁿ table entries of m digits per entry for a total of mrⁿ stored digits of table. For m = n this is nrⁿ, which grows to astronomical proportions long before reaching the levels of precision needed for general purpose computation. This well known extreme growth rate of direct look-up tables as a function of the n digits of precision is clearly shown for r = 2 by the following bit-width:byte-size pairs: 8-bits:256-bytes; 16-bits:131,072-bytes; 24-bits:50,331,648-bytes; 32-bits:17,179,869,184-bytes. The steep growth exhibited here is an absolute upper limit which is always applicable, because all possible combinations of such inputs and outputs for every such particular function can be specified completely in a table of such size, even when no special properties of the function have been identified. Thus there can be no n-digit function of an n-digit argument demanding more than the indicated direct look-up table size.
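By way of a minimal illustrative sketch, the growth described above may be checked numerically for r = 2 and m = n. The function name direct_table_bytes is hypothetical and is not part of the disclosure; the sketch merely evaluates m·2ⁿ stored bits and converts to bytes.

```python
def direct_table_bytes(n_bits, m_bits=None):
    """Bytes needed by a direct look-up table for an m-bit function of an n-bit argument.

    Radix 2 is assumed; m defaults to n as in the text.
    """
    if m_bits is None:
        m_bits = n_bits
    total_bits = m_bits * 2 ** n_bits   # m * r**n stored digits with r = 2
    return total_bits // 8


for n in (8, 16, 24, 32):
    print(f"{n}-bits: {direct_table_bytes(n):,} bytes")
```

The table size doubles for every added argument bit, in addition to the linear growth in entry width.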

A table look-up process is known to be functionally equivalent to a rule-like process implemented with logic gates. An exemplary reference is "Minimal Sets of Distinct Literals for a Logically Passive Function," authored by R. C. De Vries in the Journal of the Association for Computing Machinery, 431-443, July, 1971. For example, when a major portion of a table contains the same value, then that portion of the table may be replaced by comparatively simple logical operations. Hence the possibility remains that for particular functions the table requirements may well be but a tiny fraction of the upper limits given above.

Therefore, when the tables for direct table look-up are too large and expensive, other methods are needed to obtain the desired results. In particular, indirect table look-up techniques may be used, i.e., employing any kind of operation which can be construed to be usefully available together with some form of abbreviated look-up. (Such an operation may possibly include yet other look-ups.) In particular, a correction comprising interpolation between, or extrapolation about, comparatively sparse look-up table entry values permits a drastic reduction in the quantity of required look-up table entries when the generation of fixed-point approximations of differentiable functions is desired.

Attempts have been made to develop table look-up schemes applicable to generating mathematical functions which are not limited by the large function table requirements of direct table look-up techniques. One such technique that is limited to generating particular trigonometric functions is described in U. S. Patent No. 4,077,063 to Lind.

U. S. Patent No. 4,078,250 to Windsor et al. purports to describe a technique applicable to the generation of logarithms and exponentials. The disclosure thereof however is, when analyzed, incorrect and inoperative. While the Singer Company, assignee of the 4,078,250 patent, may well have attempted to make an operative form of such a system, no report thereof was found in a search of the technical literature by the present inventor.

Other function generators are known. Exemplary references of logarithmic and/or exponential function generation are as follows: U. S. Patent No. 3,036,774 to Brinkerhoff, U. S. Patent No. 3,099,742 to Byrne, U. S. Patent No. 3,194,951 to Schaefer, U. S. Patent No. 3,402,285 to Wang, U. S. Patent No. 3,436,533 to Moore et al., U. S. Patent No. 4,046,999 to Katsuoka et al., U. S. Patent No. 4,062,014 to Rothgordt et al., and U. S. Patent No. 4,158,889 to Monden.

Exponential function generation using vectorized floating-point multiplications in the interpolation of table look-up values is known. An exemplary reference is "Cyber 205: A Fast Vectorized Exponential Function," in Supercomputer 13, May, 1986.

Generation of other designated functions is known, e.g., an exemplary reference of arctangent function generation is U. S. Patent No. 4,164,022 to Rattlingourd et al.

Summary of the Invention

According to the invention, an apparatus and generalized method are provided for generation of differentiable mathematical functions in connection with a computer processor, enabling extremely rapid computation of such functions as the logarithmic, exponential, trigonometric and hyperbolic functions, as well as numerous special purpose engineering functions. The invention employs compact look-up tables and, in order to determine values not found in the tables, uses value correction comprising interpolation or extrapolation.

In specific embodiments, specific techniques are employed for function selection and value correction to attain results at desired levels of speed, precision and economy. Applications include alternatives to conventional hardware multipliers and numerical processors. Individual embodiments may be realized either directly in hardware or in software. Embodiments may be realized as software or hardware function libraries. Coupled multiple embodiments may be realized as general purpose hardware or software numerical processors.

The highest speeds are achieved when the invention is implemented directly in dedicated hardware for general purpose computation as well as for more specialized applications such as signal and image processing requiring particularly high speed. The hardware and its associated operations can, however, be implemented particularly simply. This simplicity permits economical firmware and software implementations of the invention to provide new levels of speed which approach that of existing hardware function generators. Such software and firmware implementations provide qualitatively new levels of combined performance, function, economy, and ease-of-use even on the simpler microprocessors possessing no instruction for direct multiplication.

This invention can be used for numerical computation, mathematical special functions, mathematical function libraries, specialized application functions, logarithmic converters and normalizers, exponential converters and denormalizers, and generally in applications related to fixed-point computation, floating-point computation, real-time sampled-data control, process control, signal processing, signal conditioning, telemetry, engine control, motor control, guidance, navigation, statistical analysis, data reduction, avionics, bionics, nucleonics, radar, sonar, microprocessors, pipelined processors, parallel processors, reduced-instruction-set computers (RISC), and stand-alone processors; and for lower life-cycle costs: more function, greater performance, greater reliability, easier design and design update, less and simpler hardware, less weight, less space, less power, fewer parts, fewer suppliers, fewer delivery schedules, easier built-in-test (BIT), and more economical repair.

Brief Description of the Drawing

Figure 1 is a block diagram of a general purpose function generator employing compact look-up tables and a correction computation that is incorporated into a direct table look-up operation.

Description of the Specific Embodiments

In order to assist in understanding the invention, the following definitions and terminology are used.

Conventions, Notation, and Terminology

Notation for Number Representations.

Binary Numbers:

1) The decimal subscript 2 may be used to indicate binary numbers. Examples are 1110₂ and 11000₂.

2) Eight (8) digits between a comma and a radix point or another comma indicates a binary number. Parentheses may be used to avoid ambiguity in some contexts. Examples are (10,01100001.00100111) and (,00101100.).

Decimal Numbers:

1) Decimal notation is the normal default. With nothing to indicate otherwise, numbers such as 10, 237.8372 and .0011 are to be considered decimal numbers.

2) Three (3) digits between a comma and a radix point or another comma indicates a decimal number.

General Terminology. The term "object" refers to any basic system construct such as a look-up table, function generator or random access memory. (A memory is referred to herein as a random access memory to distinguish it from a slower serial access memory without regard to any read-write or read-only capability.)

The basic objects used in the invention usually perform unscaled (e.g., fixed-point) numerical operations. The set of such objects includes signal detectors and modulators configured as logical gates, data registers, adders, shifters, function look-up tables and function-value correction calculators. However, the expensive fixed-point multiplier component is not required.

Function Generation by Semi-Direct Table Look-up

Introduction. Function generation by direct look-up in a random access memory table of function values is well known. In contrast, the present invention may be referred to as function generation by "semi-direct" look-up in random access memory tables as described herein.

Ultra rapid function generation by direct look-up from tables is known to require extremely large memory sizes except for very low precision. Hence such table-based generation has been regarded as much too expensive and thus virtually useless for general purpose levels of precision. Nevertheless, according to the invention, this table-based generation is used to correct values within a compact look-up table for a net gain in precision while retaining economy and particularly high speed. According to one aspect of the invention, semi-direct table look-up techniques are employed which include at least two look-ups in tables of comparatively sparse entry values and at least one addition to attain results rapidly as compared to direct computation and accurately as compared to direct look-up in tables of dense entry values.

Two or more look-up indices are assembled as arrangements of particular combinations of different "species" of digits, i.e., noncontiguous portions of the digits of the argument input value instead of making all digits of the input value into a single look-up index, as is necessary in direct look-up. An index assembled from two non-contiguous portions of digits is thus a "hybrid" index, while an index assembled from three or more non-contiguous portions of digits may be referred to as a "chimeric" index. Each hybrid or chimeric index thus assembled is used to address its respective particular corresponding hybrid or chimeric look-up table. The values retrieved from all of the tables are then simply summed to provide the output function value.
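The assembly of look-up indices from non-contiguous portions of the argument digits may be sketched as follows. This is only an illustration under stated assumptions: radix 2, three fields of four bits each, and hypothetical helper names (split_digits, contiguous_index, hybrid_index) that do not appear in the disclosure.

```python
def split_digits(x, j1, j2, m):
    """Partition a (j1 + j2 + m)-bit argument into u (MS), v (middle) and w (LS) fields."""
    w = x & ((1 << m) - 1)
    v = (x >> m) & ((1 << j2) - 1)
    u = (x >> (m + j2)) & ((1 << j1) - 1)
    return u, v, w


def contiguous_index(u, v, j2):
    """Index built from the contiguous digits u and v (addresses the primary table)."""
    return (u << j2) | v


def hybrid_index(u, w, m):
    """Hybrid index built from the non-contiguous digits u and w (skips v)."""
    return (u << m) | w


x = 0b1011_0110_1101                    # 12-bit example argument
u, v, w = split_digits(x, 4, 4, 4)      # u = 0b1011, v = 0b0110, w = 0b1101
print(bin(contiguous_index(u, v, 4)))   # 0b10110110
print(bin(hybrid_index(u, w, 4)))       # 0b10111101
```

Each index is only two fields wide, so each table has r^(2j) entries rather than the r^(3j) entries a single direct index would require.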

The relevant partitionings of argument digits and associated general functional relations for constructing the look-up tables may be derived as follows:

$$f(x) \cong f(a + b + c), \qquad |a|_{\max} \gg |b|_{\max} \gg |c|_{\max} \tag{1}$$

$$\cong f(a + b) + c\,f'(a + b) + \tfrac{1}{2}c^{2} f''(a + b) \tag{2}$$

$$\cong f(a + b) + c\,f'(d + e) + \tfrac{1}{2}c^{2} f''(a + b) \tag{3}$$

$$\cong f(a + b) + c\,\bigl(f'(d) + e\,f''(d)\bigr) + \tfrac{1}{2}c^{2} f''(g) \tag{4}$$

$$= f(a + b) + c\,f'(d) + c\,e\,f''(d) + \tfrac{1}{2}c^{2} f''(g) \tag{5}$$

wherein f(x) is a function of restricted range with f'(x) and f"(x), respectively, as first and second derivatives of restricted magnitude; a, b, c, d, e and g represent the values of individual portions of the digits of the radix-r quantity x; and ">> " means "significantly greater than." In particular, the radix r or the digits may be signed. Moreover, hardware implementations may employ multi-valued logic operating on physical signals of more than two levels. For clarity, however, the description and examples herein use quantities represented in terms of unsigned digits and an unsigned radix. The first, second, third and fourth terms of the right hand side of (5) may be referred to, respectively, as the primary function value and the primary, secondary and tertiary corrections to the primary function value.

The foregoing relations (1)-(5), as well as further ones, may be written in the following alternative form which relates more clearly to values specified in radix-r fixed-point form:

$$f(x) = f\bigl(u_1 + h_1(v_1 + h_2 w_1)\bigr) \tag{6}$$

$$= f\bigl((u_1 + h_1 v_1) + h_1 h_2 w_1\bigr) \tag{7}$$

$$\cong f(u_1 + h_1 v_1) + h_1 h_2 w_1 f'(u_1 + h_1 v_1) + \tfrac{1}{2}(h_1 h_2)^{2} w_1^{2} f''(u_1 + h_1 v_1) \tag{8}$$

$$= f(u_1 + h_1 v_1) + h_1 h_2 w_1 f'(u_2 + h_3 v_2) + \tfrac{1}{2}(h_1 h_2)^{2} w_1^{2} f''(u_1 + h_1 v_1) \tag{9}$$

$$\cong f(u_1 + h_1 v_1) + h_1 h_2 w_1 \bigl(f'(u_2) + h_3 v_2 f''(u_2)\bigr) + \tfrac{1}{2}(h_1 h_2)^{2} w_1^{2} f''(u_1 + h_1 v_1) \tag{10}$$

$$= f(u_1 + h_1 v_1) + h_1 h_2 w_1 f'(u_2) + h_1 h_2 h_3 w_1 v_2 f''(u_2) + \tfrac{1}{2}(h_1 h_2)^{2} w_1^{2} f''(u_1 + h_1 v_1) \tag{11}$$

$$\cong f(u_1 + h_1 v_1) + h_1 h_2 w_1 f'(u_2) + h_1 h_2 h_3 w_1^{*} v_2^{*} f''(u_2^{*}) + \tfrac{1}{2}(h_1 h_2)^{2} w_2^{2} f''(u_3) \tag{12}$$

$$\cong f(u_1 + h_1 v_1) + h_1 h_2 w_1 f'(u_2) + h_1 h_2 h_3 w_1^{*} v_2^{*} f''(u_2^{*}), \qquad |h_1 h_2| \ll |h_3| \tag{13}$$

wherein f(x) is a function of restricted range with f'(x) and f"(x), respectively, as first and second derivatives of restricted magnitude; x, u₁, u₂, u₃, v₁, v₂, w₁, and w₂ are values of particular portions of digits that are conveniently identified as unsigned fractions (although such identification is not strictly required), because any product of such fractions remains fractional; |h_i| = r^(-j_i) << 1 for positive integers j₁, j₂, j₃, m, and n = j₁ + j₂ + m; x is an n-digit radix-r argument partitioned between a dominant portion u₁ + h₁v₁ and an adjunct portion w₁; the dominant portion is further partitioned between a primary dominant portion u₁ and a secondary dominant portion v₁; quantities u₁, v₁, and w₁ possess, respectively, j₁, j₂, and m digits. The first, second and third terms of the right hand side of (13) may be referred to, respectively, as the primary function value and the primary and secondary corrections.

The most significant (j₁ +j₂)-digit portion of x, corresponding to u₁ + h₁v₁ , is also partitioned between the two quantities u₂ and v₂ possessing j₃ and j₁ + j₂ - j₃ digits, respectively. The starred quantities represent the most significant portions of their respective unstarred counterparts. Similarly, quantities w₂ and u₃ in the fourth term of equation (12) are merely the most significant portions of w₁ and u₁ + h₁v₁ , respectively.

The value of each term is retrieved from its own look-up table as a function of its independent variables by addressing the look-up table with an index assembled as an arrangement of the relevant portions of the digits of its independent variables. The first two terms, i.e., the primary function value together with the primary correction, typically generate nearly all of the digits of the output. Thus only these two tables and their respective addressing indices are sufficient in many applications. The third term, i.e., the secondary correction, helps provide a few guard digits with an economical look-up table of constant size when maximum precision for a given total table size is desired. Only very few significant digits of the third or fourth terms are needed when the magnitude of f"(u) is less than about two. Therefore, very few of the most significant digits of each independent variable need contribute to the addressing indices for the look-up tables for these latter two terms.

The fourth term may be required to supplement the third term in providing the guard digits unless |h₁h₂| << |h₃|. In the better designs, however, there usually is j₁ ≅ j₂ ≅ j₃ ≅ m, which eliminates the need for the fourth term. Total table costs are minimized when u₁, v₁, and w₁ all have similar digit-widths in the nominal case wherein |f'(u)| < 1 and |f"(u)| < 2.

A General Example. Referring to Figure 1, there is shown a simple machine for performing certain functions in accordance with the invention. The functions are typically limited to restricted-range, piece-wise differentiable, general functions possessing derivatives of restricted magnitude. A generalized semi-direct look-up table function generator 94 comprises an input data bus 103 coupled to a partitioning means 104. The partitioning means 104 partitions the signal lines of the input data bus 103 between the following three data busses: a most significant (MS) (primary dominant) second bus 95, a next most significant (NS) (secondary dominant) third bus 96, and a next next most significant (or LS) (adjunct) fourth bus 97, with data bus values u, v, and w, respectively.

The second bus 95 is coupled to the MS input digits of a first look-up table 98 (TABLE A) and to a (nominally MS) first set of input digits of a second look-up table 99 (TABLE B). The NS third bus 96 is coupled to the LS input digits of the first look-up table 98 (TABLE A). Finally, the LS fourth bus 97 is coupled to a (nominally LS) second set of input digits of the second look-up table 99 (TABLE B).

The output digits of the first look-up table 98 (TABLE A) and the second look-up table 99 (TABLE B) are coupled to the inputs of a dual-input adder 101 as follows: The MS, NS and LS output digits of the first table 98 (TABLE A) are directed to the respective MS, NS and LS input digits of a first port of the adder 101. The output digits of the second look-up table 99 (TABLE B) are the result of a table look-up based on an arrangement of the digits of the MS and LS input data values u and w on the second and fourth busses 95 and 97, respectively. The output digits of the second look-up table 99 (TABLE B) are directed to the LS input digits of the second port of the adder 101. The MS and NS sets of digits of the second port of the adder 101 are determined by a fixed value input of an element 100, which is typically set to the constant value zero. The resultant full width of output digits 105 and the carry-out 102 of the adder 101 constitute the output digits of the function generator 94.
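The data path of Figure 1 may be modeled in software as a minimal sketch. The following assumptions are illustrative only and are not taken from the disclosure: radix 2, fields of J = 4 bits each, two extra output bits, the example function log₂(1 + x), and table entries rounded to integers. The names TABLE_A, TABLE_B and generate are hypothetical.

```python
import math

J = 4                       # assumed digits per field (j1 = j2 = m = J), radix 2
N = 3 * J                   # argument width in bits
OUT_BITS = N + 2            # assumed output fraction bits (two extra low-order bits)
SCALE = 1 << OUT_BITS
H = 2.0 ** -J               # shift factor h = r**-j
MASK = (1 << J) - 1


def f(x):
    """Example function log2(1 + x) for 0 <= x < 1."""
    return math.log2(1.0 + x)


def f1(x):
    """Its first derivative q / (1 + x), with q = log2(e)."""
    return math.log2(math.e) / (1.0 + x)


# TABLE A (98): primary function value f(u + h*v), addressed by the digits (u, v).
TABLE_A = [round(f(idx / (1 << (2 * J))) * SCALE) for idx in range(1 << (2 * J))]

# TABLE B (99): primary correction h^2 * w * f'(u), addressed by the hybrid index (u, w).
TABLE_B = [
    round(H * H * ((idx & MASK) / (1 << J)) * f1((idx >> J) / (1 << J)) * SCALE)
    for idx in range(1 << (2 * J))
]


def generate(x_int):
    """Model of generator 94: partition the input, perform two look-ups, add once."""
    u = (x_int >> (2 * J)) & MASK       # MS bus 95
    v = (x_int >> J) & MASK             # NS bus 96
    w = x_int & MASK                    # LS bus 97
    a = TABLE_A[(u << J) | v]
    b = TABLE_B[(u << J) | w]           # upper adder inputs are the constant zero (100)
    return a + b                        # adder 101


worst = max(abs(generate(x) / SCALE - f(x / (1 << N))) for x in range(1 << N))
print(f"worst-case error: {worst:.2e}")
```

With these assumed parameters each table holds only 256 entries, whereas a direct table for the same 12-bit argument would hold 4096. The remaining error is on the order of h³, which motivates the secondary correction discussed later in the description.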

If desired, the first look-up table 98 (TABLE A) can be partitioned between two or more separate random access memory look-up tables (say tables A1, A2, and A3, which are not shown) provided that the entire look-up index, i.e., both the second and the third data busses 95 and 96, respectively, are coupled to the input of each such table. This follows from the fact that each digit of output from a look-up table can be a fully independent function of the complete input addressing index.
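The partitioning of one table between several narrower tables may be sketched as follows, assuming binary entries. The helper names and the arbitrary 12-bit entry values are hypothetical and serve only to show that addressing every partial table with the same full index reproduces the original entry.

```python
def split_table_by_output_digits(table, widths):
    """Split one table into several tables, each holding a slice of the output digits.

    widths lists the slice widths from most to least significant.
    """
    parts = []
    shift = sum(widths)
    for width in widths:
        shift -= width
        mask = (1 << width) - 1
        parts.append([(entry >> shift) & mask for entry in table])
    return parts


def lookup_split(parts, widths, index):
    """Address every partial table with the same full index and concatenate the outputs."""
    value = 0
    for part, width in zip(parts, widths):
        value = (value << width) | part[index]
    return value


table_a = [(37 * i * i + 11) & 0xFFF for i in range(256)]   # arbitrary 12-bit entries
widths = (4, 4, 4)                                          # tables A1, A2 and A3
parts = split_table_by_output_digits(table_a, widths)
print(all(lookup_split(parts, widths, i) == table_a[i] for i in range(256)))
```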

The adder 101 is diagrammed as a full-width adder for clear illustration of the principle involved. As a practical matter, the speed and economy may be improved by instead making the adder 101 only wide enough to add the LS output digits of the first look-up table 98 (TABLE A) and the output digits from the second look-up table 99 (TABLE B). Standard techniques suffice for efficiently propagating the carry-out 102 from this addition through the balance of the output digits of the first look-up table 98 (TABLE A).
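A narrow adder followed by carry propagation may be sketched as below. This is one simple software model, assuming binary words and using a plain increment of the upper digits as the carry propagation; the name narrow_add is hypothetical, and a hardware design would likely use a faster carry technique.

```python
def narrow_add(a_entry, b_entry, total_bits, low_bits):
    """Add a low_bits-wide correction to a total_bits-wide entry using a narrow adder.

    Returns (sum truncated to total_bits, carry-out).
    """
    low_mask = (1 << low_bits) - 1
    high_bits = total_bits - low_bits
    low_sum = (a_entry & low_mask) + (b_entry & low_mask)   # narrow adder
    carry = low_sum >> low_bits                             # carry out of the narrow adder
    high = (a_entry >> low_bits) + carry                    # propagate into the upper digits
    carry_out = high >> high_bits                           # overall carry-out
    high &= (1 << high_bits) - 1
    return (high << low_bits) | (low_sum & low_mask), carry_out


print(narrow_add(0xABC, 0x7, 12, 4))
```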

An important factor in relation to the present invention is an evaluation of the resultant total table cost. Total table costs decrease as j₁, j₂ and m take on similar values in the foregoing set of equations. This circumstance may be examined further by assigning j ≡ j₁ = j₂ = j₃ = m and h ≡ h₁ = h₂ = h₃. Then there is f(u + hv + h²w) ≅ f(u + hv) + h²wf'(u + hv) ≅ f(u + hv) + h²wf'(u) with shift factor h = r^-j << 1 wherein r is the radix of the number representation and j is a positive integer. The simplest such implementation takes the form

$$f(u + hv + h^{2}w) \cong \mathrm{TABLE}\bigl(f(u + hv);\, u + hv\bigr) + h^{2}\,\mathrm{TABLE}\bigl(w f'(u);\, u + hw\bigr) \tag{14}$$

wherein TABLE(g(x₁, x₂, . . .); i(x₁, x₂, . . .)) refers to a look-up-table value providing an approximation of g(x₁, x₂, . . .) as a function of an addressing index i(x₁, x₂, . . .), which index i(x₁, x₂, . . .) is assembled from relevant portions of the digits of x₁, x₂, . . ..

The error e_i in computing the interpolation value for the correction is on the order of h, while the primary look-up TABLE(f(u + hv); u + hv), when used as a function generator, has an error e_p whose magnitude cannot in general be much smaller than the effective error of its argument, which is on the order of h². The h²-term of equation (14) is a correction to the look-up value TABLE(f(u + hv); u + hv). This correction may be said to be an interpolation of order "one half" because e_i ≈ (e_p)^(1/2).

The correction value in such cases need be only about j digits wide. Thus the total stored-digit table size for an n-digit function of an n-digit argument increases only about as fast as 4j·r^(2j) = (4/3)n·r^(2n/3), which leads rapidly to much more economical table sizes than that of direct look-up, even for very modest values of n. Moreover, the most significant two-thirds of TABLE A in Figure 1 corresponds to a function generation by direct table look-up that can be replaced by a function generation by semi-direct look-up. (This replacement must be exact, which can be accomplished, for example, by making the error uniformly small enough that simple truncation provides exact replacement for all values.) Thus about half of the (4/3)n·r^(2n/3) table size can be replaced by tables with a total size of about (4/3)(2n/3)·r^(2(2n/3)/3) = (8/9)n·r^(4n/9). Therefore, the original (4/3)n·r^(2n/3) size is thereby reduced to about (2/3)n·r^(2n/3) + (8/9)n·r^(4n/9) = (2/3)n·r^(2n/3)·(1 + (4/3)r^(-2n/9)). When r = 2 and n = 24, as in the examples in the Tables in the Appendix hereinafter, this reduced total table size is about (16)2¹⁶(1 + (1/3)2^-(3+1/3)) ≅ (16)2¹⁶(1 + .033), which is only about 3% greater than half of the original (4/3)n·r^(2n/3) size. Furthermore, implementation of this reduction demands very little additional execution delay.
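The table-size estimates above may be checked numerically with the following minimal sketch for r = 2 and n = 24. The function names are hypothetical; the formulas are the estimates stated in the text, evaluated as written.

```python
def direct_digits(n, r=2):
    """Stored digits of a direct look-up table: n * r**n."""
    return n * r ** n


def semi_direct_digits(n, r=2):
    """Stored digits of the two-table form: 4*j*r**(2*j) with j = n/3."""
    j = n // 3
    return 4 * j * r ** (2 * j)


def nested_semi_direct_digits(n, r=2):
    """Estimate after replacing the MS two-thirds of TABLE A by semi-direct look-up."""
    return (2 / 3) * n * r ** (2 * n / 3) * (1 + (4 / 3) * r ** (-2 * n / 9))


n = 24
two_table_bits = semi_direct_digits(n)
print(two_table_bits // 8 // 1024, "kbytes for the two-table form")
print(direct_digits(n) // two_table_bits, "times smaller than direct look-up")
print(nested_semi_direct_digits(n) / (two_table_bits / 2), "ratio to half the two-table size")
```

For these parameters the two-table form needs 256 kbytes, and the nested estimate comes out about 3% above half of that, in agreement with the figures stated in the text.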

Implementation may also take the following somewhat more general form:

$$f(u + hv + h^{2}w) \cong \mathrm{TABLE}\bigl(f(u + hv);\, u + hv\bigr) + h^{2}\psi r^{-\lambda} w + h^{2}\,\mathrm{TABLE}\bigl(w\,(f'(u) - \psi r^{-\lambda});\, u + hw\bigr) \tag{15}$$

wherein ψ and λ are signed integer constants of small magnitude chosen in conjunction with f such that 0 ≤ f'(u) - ψr^-λ < 1. The multiplication r^-λ × w may be efficiently realized with only very few shifts. Similarly, the multiplication ψ × (r^-λw) may be efficiently realized with only very few additions or subtractions to produce a signed multiple of a shifted adjunct portion w. Parameters such as λ and ψ permit a modest adjustment of the allowed range of f'(u) without reducing the digit-width of w and thus sacrificing precision.

Hence, the total execution time costs of scaling w by r^-λ, converting u and w into u + hw, looking up the additional look-up value, and summing the component values is often a good trade for the resulting considerable reduction in table costs.
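The role of the bias ψr^-λ may be illustrated with the sketch below, which omits the common factor h². The choice ψ = 1, λ = 1 for f(x) = log₂(1 + x) with r = 2 is an assumption made here purely for illustration, since f'(u) exceeds 1 near u = 0; it is not a parameter choice taken from the disclosure. The names bias_term and biased_entry are hypothetical.

```python
import math

Q = math.log2(math.e)
J = 4                       # assumed digit width of u and w (radix 2)
PSI, LAM = 1, 1             # hypothetical bias constants: psi * r**-lam = 0.5


def f1(u):
    """First derivative of log2(1 + x); it exceeds 1 near u = 0."""
    return Q / (1.0 + u)


def bias_term(w_int):
    """psi * r**-lam * w, formed with a shift of the binary point and |psi| additions."""
    acc = 0
    for _ in range(abs(PSI)):
        acc += w_int if PSI > 0 else -w_int
    return acc / (1 << (J + LAM))       # w_int / 2**J shifted right by LAM places


def biased_entry(u_int, w_int):
    """Table entry w * (f'(u) - psi * r**-lam), addressed by the hybrid index (u, w)."""
    u, w = u_int / (1 << J), w_int / (1 << J)
    return w * (f1(u) - PSI * 2.0 ** -LAM)


u_int, w_int = 0, 15
print(bias_term(w_int) + biased_entry(u_int, w_int))    # equals w * f'(u)
```

With this bias the factor stored in the table stays within [0, 1) over the whole range of u, so the table entry never needs more integer digits than w itself.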

Guard Digits. One or two low-order guard digits may be used in order to maintain as many digits of precision in the outputs as there are digits in the input. A third look-up table of constant size can help in this regard. The expansion

$$f(u + hv + h^{2}w) \cong f(u + hv) + h^{2}w f'(u + hv) \tag{16}$$

$$\cong f(u + hv) + h^{2}w\,\bigl(f'(u) + hv f''(u)\bigr) \tag{17}$$

$$= f(u + hv) + h^{2}w f'(u) + h^{3}vw f''(u) \tag{18}$$

identifies the secondary correction term h³vwf"(u) which is desired. Fortunately, only a comparatively trivial approximation of this term is actually required. For example, if |f"(u)| is less than or on the order of 1-2, then only the MS 1-3 digits of each of u, v, and w, i.e., their respective primitive portions u*, v*, and w*, can be arranged into an addressing index for a look-up table providing 1-3 digits of the secondary correction v*w*f"(u*). Thus for a given f"(u) the table size (but not its entries) for this term remains substantially constant as the value of j changes.

The general form becomes

$$f(u + hv + h^{2}w) \cong \mathrm{TABLE}\bigl(f(u + hv);\, u + hv\bigr) + h^{2}\psi r^{-\lambda} w + h^{2}\,\mathrm{TABLE}\bigl(w\,(f'(u) - \psi r^{-\lambda});\, u + hw\bigr) + h^{3}\,\mathrm{TABLE}\bigl(v^{*}w^{*}f''(u^{*});\, u^{*} + r^{-i}v^{*} + r^{-2i}w^{*}\bigr) \tag{19}$$

wherein 1 ≤ i ≤ 3 and the table entries are constructed with sufficient digit-width that the magnitude of the error of each term is less than h³r^-i.
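The effect of the third, constant-size table may be illustrated with the following sketch of equation (19) with ψ = 0. The parameters are assumptions chosen to keep the run short (radix 2, j = 6, i = 3, f(x) = log₂(1 + x), floating-point table entries); they are not the 24-bit parameters of the Appendix. All names are hypothetical.

```python
import math

J, I = 6, 3                 # assumed field width j and starred width i (radix 2)
SIZE, S = 1 << J, 1 << I
H = 2.0 ** -J
Q = math.log2(math.e)


def f(x):
    return math.log2(1.0 + x)


def f1(x):
    return Q / (1.0 + x)


def f2(x):
    return -Q / (1.0 + x) ** 2


# Primary table, addressed by (u, v); primary correction, addressed by the hybrid index (u, w).
TABLE_A = [f(idx / SIZE ** 2) for idx in range(SIZE ** 2)]
TABLE_B = [H ** 2 * ((idx % SIZE) / SIZE) * f1((idx // SIZE) / SIZE) for idx in range(SIZE ** 2)]
# Secondary correction, addressed by the chimeric index (u*, v*, w*) of i bits each.
TABLE_C = [
    H ** 3 * (((idx >> I) % S) / S) * ((idx % S) / S) * f2((idx >> (2 * I)) / S)
    for idx in range(S ** 3)
]


def generate(x_int, use_secondary=True):
    u = x_int >> (2 * J)
    v = (x_int >> J) & (SIZE - 1)
    w = x_int & (SIZE - 1)
    y = TABLE_A[(u << J) | v] + TABLE_B[(u << J) | w]
    if use_secondary:
        us, vs, ws = u >> (J - I), v >> (J - I), w >> (J - I)   # MS i bits of each field
        y += TABLE_C[(us << (2 * I)) | (vs << I) | ws]
    return y


def max_error(use_secondary):
    total = 1 << (3 * J)
    return max(abs(generate(x, use_secondary) - f(x / total)) for x in range(total))


err_two, err_three = max_error(False), max_error(True)
print(f"two tables: {err_two:.2e}   three tables: {err_three:.2e}")
```

The third table here has only 512 entries regardless of j, yet under these assumptions it reduces the worst-case error by more than a factor of two.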

Numerical Examples. An appendix hereinafter contains TABLES 1A, 1B, 2A and 2B which show clearly how the invention can be applied to particularly high speed generation of differentiable functions. These tables contain examples for generating 24-bit values as functions of 24-bit input argument values. The required total look-up table size is about 256 kbytes for each such function generator. There exist numerous applications where the 256 kbytes of random access memory for the look-up tables represents a low cost for an ultra high speed function generator yielding such a general purpose level of precision as 24 significant bits.

TABLE 1A and TABLE 1B provide several numerical examples that illustrate generation of a first differentiable function f(x) = log₂(1 + x) which is often targeted for high speed execution. For 0 ≤ x < 1, this generation proceeds in accordance with equation (19) with h = 2^-8, r = 2, i = 3 and ψ = 0. The first and second derivatives of this function are, respectively, f'(x) = q/(1 + x) and f"(x) = -q/(1 + x)², wherein q = log₂ e ≡ 1/log_e 2 ≅ 1.442 . . ..

TABLE 2A and TABLE 2B provide further numerical examples that show generation of another differentiable function f(x) = 2^x - 1 which is often also targeted for high speed execution. For 0≤ x < 1, this generation similarly proceeds in accordance with equation (19) with h = 2^-8, r = 2, i = 3 and ψ = 0. The first and second derivatives of this function are, respectively, f'(x) = 2^x/q and f"(x) = 2^x/q².

As may be readily verified from these numerical examples, an individual look-up table may contain discontinuous sequences of values, which make such an individual look-up table correspond to a discontinuous map. However, such discontinuous sequences of look-up values are addressed by precisely corresponding discontinuous sequences of hybrid and chimeric look-up index values such that any discontinuous set of index values is discontinuously, but precisely, mapped to produce a set of function values and correction values which combine to produce continuous values as required for accurately approximating a continuous segment of a piece-wise differentiable function.

For example, in TABLE 1A, numerical examples 1c and 1d show that successive values for the input x lead to drastically different combinations of values for u, v, w, u*, v* and w* and thus drastically different hybrid and chimeric look-up indices assembled therefrom. The corresponding hybrid and chimeric look-up tables successfully map these contrasting look-up indices into correction values which combine to generate accurate successive output values.

Multiple Functions from a Single Generator. This semi-direct look-up function generator can generate differentiable functions which may well be only piece-wise continuous. The most significant digits of the input to such a function generator can thus select in real time which continuous function segment is to be generated. One such segment can, for example, be the function log₂(1 + x) for 0 < x < 1, say. Another such segment can be the inverse of this function, 2^x - 1. Segments of numerous other functions are useful when generated at very high speed, particularly including sin(πx/2), cos(πx/2), sin(2πx) or cos(2πx), etc., as well as functions associated with particular engineering applications.
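Segment selection by the most significant input digit may be sketched as follows. The layout is an assumption made for illustration: one select bit placed ahead of three 4-bit fields, with the two-table form of equation (14) used for each segment and floating-point table entries. The names SEGMENTS and generate are hypothetical.

```python
import math

J = 4                       # assumed digits per field, radix 2
SIZE = 1 << J
H = 2.0 ** -J
Q = math.log2(math.e)

# Each segment is a pair (function, first derivative) on 0 <= x < 1.
SEGMENTS = [
    (lambda x: math.log2(1.0 + x), lambda x: Q / (1.0 + x)),    # segment 0: log2(1 + x)
    (lambda x: 2.0 ** x - 1.0, lambda x: 2.0 ** x / Q),         # segment 1: 2**x - 1
]

# The tables of both segments are stored back to back; the select bit picks the bank.
TABLE_A, TABLE_B = [], []
for func, deriv in SEGMENTS:
    TABLE_A += [func(i / SIZE ** 2) for i in range(SIZE ** 2)]
    TABLE_B += [H * H * ((i % SIZE) / SIZE) * deriv((i // SIZE) / SIZE) for i in range(SIZE ** 2)]


def generate(word):
    """word = [select bit | u | v | w]; the select bit chooses the function segment."""
    s = word >> (3 * J)
    u = (word >> (2 * J)) & (SIZE - 1)
    v = (word >> J) & (SIZE - 1)
    w = word & (SIZE - 1)
    bank = (s << J) | u                 # select bit acts as an extra MS index digit
    return TABLE_A[(bank << J) | v] + TABLE_B[(bank << J) | w]


x_int = 0x800                           # x = 0.5
print(generate(x_int))                  # approximates log2(1.5)
print(generate((1 << (3 * J)) | x_int)) # approximates 2**0.5 - 1
```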

Overall Economy. This semi-direct table look-up form of function generation uses only economical shifts of fixed or narrow variable digit-width together with the fast operations of look-up, addition, and subtraction. In particular, it does not require massive variable left-right shifting or multiplication operations.

Change of Radix. The radix of the argument u + hv + h²w need not be the same as the radix of the generated function value. Thus radix conversion may be combined with the function generation at little or no additional table or execution time costs. Examples include the binary-to-BCD (binary-coded decimal) conversion and its inverse.

Computer Instructions for Generating Functions. The foregoing function generators are very efficient when implemented in machine code. However, much of the execution time is still expended in retrieving and decoding the instructions. Thus, the performance can be improved by incorporating the machine code sequence for the function-generating table look-up and value correction into a shorter code sequence employing specialized instructions or even into a single special instruction, thereby considerably increasing the speed of the function generation.

The general purpose function-generating capability is further enhanced by allowing the generation of multiple functions from look-up tables wherein the functions and the look-up table entry values may be specified at execution time.

Claims

The invention has now been explained with reference to specific embodiments. Other embodiments will be apparent to those of ordinary skill in this art. The invention is therefore not intended to be limited except as indicated by the appended claims. What is claimed is:

1. Apparatus for high speed generation of an output signal which approximates a designated differentiable function of an input argument signal, comprising:

means coupled to receive said input argument signal for partitioning between a most significant dominant portion and a next most significant adjunct portion;

means comprising random access memory coupled to receive said dominant portion for mapping into a primary function signal in accordance with a primary map comprising said designated function of said dominant portion;

means coupled to receive said dominant portion for extracting a most significant primary dominant portion;

means coupled to receive said primary dominant portion and said adjunct portion for arranging into a hybrid signal as a hybrid arrangement;

means coupled to receive said hybrid signal for mapping into a hybrid correction signal in accordance with a hybrid correction map comprising selected hybrid correction function signals of a hybrid correction function for said designated function, said hybrid correction function signal comprising a product of said adjunct portion and a first derivative of said designated function of said primary dominant portion, each said selected hybrid correction function signal corresponding to each possible ones of said hybrid signals as an input argument, to produce a hybrid correction signal as a primary correction signal; and

means coupled to receive said primary correction signal and said primary function signal for adding to produce said output signal.

2. The apparatus as set forth in claim 1 wherein said hybrid correction map comprises a product of said adjunct portion and a difference between said first derivative of said designated function of said primary dominant portion and a bias which is related to a radix of said adjunct portion and wherein said means coupled to receive said primary correction signal and said primary function signal is further coupled to receive an auxiliary correction signal, further comprising:

means coupled to receive said adjunct portion for scaling by said bias to produce said auxiliary correction signal.

3. The apparatus as set forth in claim 1 wherein said designated function is twice differentiable and wherein said means coupled to receive said primary correction signal and said primary function signal is further coupled to receive a chimeric correction signal, further comprising:

means coupled to receive said primary dominant portion for extracting a most significant primitive primary dominant portion;

means coupled to receive said secondary dominant portion for extracting a most significant primitive secondary dominant portion;

means coupled to receive said adjunct portion for extracting a most significant primitive adjunct portion;

means coupled to receive said primitive primary dominant portion, said primitive secondary dominant portion and said primitive adjunct portion for arranging into a chimeric signal as a chimeric arrangement; and

means coupled to receive said chimeric signal for mapping into a chimeric correction signal in accordance with a chimeric correction map comprising selected signals of a chimeric correction function for said designated function, said chimeric correction function comprising a product of said primitive secondary dominant portion, said primitive adjunct portion and a second derivative of said designated function of said primitive primary dominant portion, each said selected chimeric correction function signals corresponding to each possible ones of said chimeric signals as an input argument.

4. The apparatus as set forth in claim 1 wherein said designated function comprises a logarithmic function.

5. The apparatus as set forth in claim 4 wherein said logarithmic function comprises a base-two logarithmic function.

6. The apparatus as set forth in claim 1 wherein said designated function comprises an exponential function.

7. The apparatus as set forth in claim 6 wherein said exponential function comprises a base-two exponential function.

8. The apparatus as set forth in claim 1 wherein said designated function comprises a trigonometric function.

9. The apparatus as set forth in claim 1 wherein said designated function comprises an inverse trigonometric function.

10. The apparatus as set forth in claim 1 wherein said designated function comprises a hyperbolic function.

11. A method for high speed conversion of input data into output data that approximates a designated differentiable function of said input data, including the steps of:

partitioning said input data between a most significant dominant portion and a next most significant adjunct portion;

mapping said dominant portion into primary function data in accordance with a primary map comprising a random access memory containing values of said designated function of said dominant portion;

extracting a most significant primary dominant portion from said dominant portion;

arranging said primary dominant portion and said adjunct portion into a hybrid arrangement;

mapping said hybrid arrangement into hybrid correction data in accordance with a hybrid correction map comprising selected hybrid correction function data of a hybrid correction function for said designated function, said hybrid correction function data comprising a product of said adjunct portion and a first derivative of said designated function of said primary dominant portion, each said selected hybrid correction function data corresponding to each possible ones of said hybrid data as an input argument, to produce hybrid correction data as primary correction data; and

adding said primary correction data and said primary function data to produce said output data.

12. The method as set forth in claim 11 wherein said hybrid correction map comprises a product of said adjunct portion and a difference between said first derivative of said designated function of said primary dominant portion and a bias that is related to a radix of said adjunct portion and wherein said adding step further includes auxiliary data, further comprising the step of:

scaling said adjunct portion by said bias to produce said auxiliary correction data.

13. The method as set forth in claim 11 wherein said designated function is twice differentiable and wherein said adding step further includes chimeric correction data, further comprising the steps of:

extracting a most significant primitive primary dominant portion from said primary dominant portion;

extracting a most significant primitive secondary dominant portion from said secondary dominant portion;

extracting a most significant primitive adjunct portion from said adjunct portion;

arranging said primitive primary dominant portion, said primitive secondary dominant portion and said primitive adjunct portion into chimeric data as a chimeric arrangement; and

mapping said chimeric data into said chimeric correction data in accordance with a chimeric correction map comprising selected data of a chimeric correction function for said designated function, said chimeric correction function data comprising a product of said primitive secondary dominant portion, said primitive adjunct portion and a second derivative of said designated function of said primitive primary dominant portion, each said selected chimeric correction function data corresponding to each possible ones of said chimeric data as an input argument.