CN103092561A

CN103092561A - Goldschmidt division implementation method based on divisor mapping

Info

Publication number: CN103092561A
Application number: CN201310019685XA
Authority: CN
Inventors: 陈禾; 闫雯; 于文月; 谢宜壮; 曾涛; 龙腾
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2013-01-18
Filing date: 2013-01-18
Publication date: 2013-05-08
Anticipated expiration: 2033-01-18
Also published as: CN103092561B

Abstract

The invention discloses a Goldschmidt division implementation method based on divisor mapping. First, the floating-point dividend N_f and divisor D_f are normalized into the form f × 2^e, and the normalized dividend and divisor are denoted N and D. A boundary value p is calculated from the given minimum relative error E and the number of iterations M. If the normalized divisor D falls within the interval [1, p], M iterations are performed directly; if D falls within the interval [p, 2), D is first mapped into [1, p] and M iterations are then performed. The iteration starts from the initial factor F_0 = 2 - D_0. The M iterations yield the quotient of the f parts, which is finally combined with the exponent subtraction of the 2^e parts to give the final division result. The method requires no initial estimate and therefore saves a large amount of storage.

Description

A Goldschmidt division implementation method based on divisor mapping

Technical field

The invention belongs to the field of signal processing and relates to a Goldschmidt division implementation method based on divisor mapping.

Background technology

Floating-point division is used more and more widely in processor design and has become an important part of signal processing. Although dividers are used less frequently than adders and subtractors, divider performance directly affects the quality of signal processing, so speeding up the divider while reducing its hardware resources has become a key issue.

Most modern commercial processors use parallel multipliers to speed up division, realizing division through iterated multiplication; the higher the required precision, the more iterations are needed. The main multiplication-based iterative division algorithms today are the Newton-Raphson algorithm, proposed by M. J. Flynn in "On Division by Functional Iteration", and the Goldschmidt algorithm, proposed by R. E. Goldschmidt in "Applications of Division by Convergence".

The Newton-Raphson algorithm converts division into computing the reciprocal of the divisor and then obtains the quotient by multiplying by the dividend, whereas the Goldschmidt algorithm obtains the quotient directly by normalizing the divisor and dividend. Each iteration of either algorithm involves two multiplications. In the Newton-Raphson algorithm these two multiplications must be performed serially, while in the Goldschmidt algorithm they can be performed in parallel, so the Goldschmidt algorithm is faster.

The Goldschmidt algorithm computes Q = N/D, where N is the dividend, D the divisor and Q the quotient, by constructing a set of factors F_1, F_2, ..., F_M and multiplying both the numerator and the denominator by these factors. As the denominator approaches 1, the numerator approaches the desired quotient. The multiplications of the numerator and denominator by the factors are carried out in M iterations.

The iteration of the traditional Goldschmidt division can be written in the following form:

\begin{cases} F_{i+1} = 2 - D_{i+1} \\ N_{i+1} = N_{i} F_{i} \\ D_{i+1} = D_{i} F_{i} \end{cases} \qquad (1)

where i = 0, 1, ..., M-1, M is the number of iterations, N_i and D_i are the dividend and divisor after the i-th iteration, and F_i is the factor. Here N_0 and D_0 are the initial dividend and divisor, and F_0 is an estimate of the reciprocal of the divisor. The precision of this estimate is limited: if the error of the initial estimate is s, then F_0 = (1 - s)/D, and the computational error of the traditional Goldschmidt division algorithm can be derived as a function of s.

The above analysis shows that the error of the traditional Goldschmidt iteration result depends on the estimation error s of the initial value; that is, the accuracy of the initial estimate determines the size of the error in the iteration result.
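The dependence on s can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the names `goldschmidt_traditional` and `relative_error` are hypothetical, exact rational arithmetic (`fractions.Fraction`) is used so that no rounding hides the effect, and the seed F_0 is formed by an exact division only for demonstration, where hardware would read it from a table.

```python
from fractions import Fraction

def goldschmidt_traditional(n, d, s, iterations):
    """Iterate formula (1) seeded with F0 = (1 - s) / d.

    s is the relative error of the reciprocal estimate (an assumed input here).
    """
    f = (1 - s) / d              # F0: imperfect reciprocal estimate
    for _ in range(iterations):
        n, d = n * f, d * f      # N_{i+1} = N_i F_i, D_{i+1} = D_i F_i
        f = 2 - d                # F_{i+1} = 2 - D_{i+1}
    return n

def relative_error(n, d, s, iterations):
    """Relative error of the iterated numerator against the exact quotient n / d."""
    exact = n / d
    return abs(goldschmidt_traditional(n, d, s, iterations) - exact) / exact

n, d, s = Fraction(3, 2), Fraction(5, 4), Fraction(1, 16)
for k in (1, 2, 3):
    # In this sketch the error after k passes equals s ** (2 ** (k - 1)).
    print(k, relative_error(n, d, s, k))
```

With this loop convention the error is set entirely by s and the number of passes, which is why the traditional method invests in an accurate seed.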

To address this problem, D. DasSarma and D. W. Matula in "Faithful Bipartite ROM Reciprocal Tables", and M. J. Schulte and J. E. Stine in "Symmetric Bipartite Tables for Accurate Function Approximation", proposed various lookup-table methods that reduce the initial estimation error and thereby improve the precision of the iteration result.

These methods store initial values of 1/D in ROM. Improving the precision of the computation requires increasing the bit width of the lookup table: if the lookup table has n bits, it consumes n·2^n of storage, so the required resources grow exponentially with the bit width.
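The growth of n·2^n can be made concrete with a few values. The sketch below assumes one reading of the text, namely a table with n address bits and n-bit entries, so the size is counted in bits; the function name `rom_bits` is hypothetical.

```python
def rom_bits(n):
    """Size in bits of a table with 2**n entries of n bits each (assumed reading of n * 2**n)."""
    return n * 2 ** n

for n in (8, 12, 16):
    print(n, rom_bits(n))
```

Doubling the width from 8 to 16 bits multiplies the table size by 512 under this reading.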

In summary, when high precision is required, the storage needed by the lookup table increases sharply with the bit width. To improve divider performance, reducing storage resources is a problem in urgent need of a solution.

Summary of the invention

In view of this, the present invention proposes an improved Goldschmidt division implementation method to solve the problem of large ROM storage requirements. The division algorithm needs no initial estimate and can therefore save a large amount of storage.

To solve the above technical problem, the present invention is realized as follows:

A Goldschmidt division implementation method based on divisor mapping comprises the following steps:

Step 1: normalize the floating-point dividend N_f and divisor D_f into the form f × 2^e, where f is the mantissa with f ∈ [1, 2) and e is the exponent. The normalized dividend and divisor are denoted N and D. Floating-point division is thereby converted into division of the mantissas and subtraction of the exponents.

Step 2: for the mantissa division, given the minimum relative error E and the number of iterations M that minimizes the required resources while meeting the latency requirement, obtain the boundary value p using formula (I):

p = \exp\left(\frac{\ln E}{2^{M}}\right) + 1 \qquad (I)

Step 3: compare the normalized divisor D with the boundary value p. If D falls within the interval [1, p], set N_0 = N and D_0 = D and go to step 4. If D falls within the interval [p, 2), map D into [1, p] by multiplying it by a mapping coefficient, multiply the dividend N by the same mapping coefficient, assign the mapped D and N to D_0 and N_0, and then go to step 4.

Step 4: perform M iterations of formula (IV) to obtain the quotient of the mantissas:

\begin{cases} F_{i} = 2 - D_{i} \\ N_{i+1} = N_{i} F_{i} \\ D_{i+1} = D_{i} F_{i} \end{cases} \qquad (IV)

where i = 0, 1, ..., M-1, and N_i and D_i are the dividend and divisor after the i-th iteration.

Step 5: combine the mantissa quotient obtained in step 4 with the result of the exponent subtraction to obtain the final division result.

Preferably, in step 3, mapping D into the interval [1, p] by multiplying it by a mapping coefficient, and multiplying the dividend N by the same mapping coefficient, comprises:

Step 31: obtain the minimum number of mapping segments T according to formula (II), and divide [p, 2) into T segments:

T = \operatorname{ceil}(\log_{p} 2 - 1) \qquad (II)

where ceil denotes rounding up.

Step 32: obtain the boundaries and mapping coefficient of each mapping range from formula (III), find the mapping range containing the divisor D, and extract the corresponding mapping coefficient; multiply both the divisor D and the dividend N by the extracted mapping coefficient:

\begin{cases} \mathrm{Range}_{(j,\max)} = \frac{2}{p^{T-1-j}} \\ \mathrm{Range}_{(j,\min)} = \frac{2}{p^{T-j}} \\ c(j) = \frac{p^{T-j}}{2} \end{cases} \qquad (III)

Range_{(j,min)} and Range_{(j,max)} are the lower and upper limits of the j-th mapping range, and c(j) is the mapping coefficient of the j-th mapping range, j = 0, 1, ..., T-1.
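Steps 31 and 32 can be sketched as a small table builder. This is a minimal illustration in double-precision Python, not a hardware description; the name `mapping_table` and the example value p = 1.125 are assumptions made for the example.

```python
import math

def mapping_table(p):
    """Return [(range_min, range_max, c_j)] for j = 0..T-1 using formulas (II) and (III)."""
    # Formula (II). Note: if log_p(2) is an exact integer, floating-point rounding
    # could push the ceiling one step too high; this sketch ignores that corner case.
    T = math.ceil(math.log(2, p) - 1)
    table = []
    for j in range(T):
        range_max = 2 / p ** (T - 1 - j)
        range_min = 2 / p ** (T - j)
        coeff = p ** (T - j) / 2
        table.append((range_min, range_max, coeff))
    return table

for j, (lo, hi, c) in enumerate(mapping_table(1.125)):
    print(j, round(lo, 4), round(hi, 4), round(c, 4))
```

Each coefficient sends its own range onto (1, p], since c(j) times the lower limit is 1 and c(j) times the upper limit is p.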

Preferably, the mapping coefficients are CSD-coded in advance and stored; when the divisor D and the dividend N are multiplied by the corresponding mapping coefficient, CSD coding is used to perform the fixed-multiplier multiplication.

Beneficial effects:

(1) The improved Goldschmidt division algorithm proposed by the present invention needs no estimate of the initial value of F and can therefore save a large amount of storage.

(2) The error of the iteration result depends only on the divisor and the number of iterations, which makes the error of the result controllable.

(3) The method also uses a minimum-resource fixed-multiplier multiplication algorithm to reduce the resources occupied by the divisor mapping.

(4) The method lends itself to parallel hardware implementation.

Description of drawings

Fig. 1(a) and Fig. 1(b) are schematic diagrams of the divisor mapping process.

Fig. 2 is a flow chart of the improved Goldschmidt algorithm.

Embodiment

The present invention is described below with reference to the accompanying drawings and an embodiment.

First, the iteration of the division is modified and written in the following form:

\begin{cases} F_{i} = 2 - D_{i} & (2) \\ N_{i+1} = N_{i} F_{i} & (3) \\ D_{i+1} = D_{i} F_{i} & (4) \end{cases}

where i = 0, 1, ..., M-1, M is the number of iterations, and N_i and D_i are the dividend and divisor after the i-th iteration. Here N_0 = N is the initial dividend and D_0 = D is the initial divisor. The initial factor F_0 need not be estimated; it is computed directly from formula (2), which saves the ROM space otherwise needed to store the initial estimate F_0.
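The modified iteration is short enough to write out directly. The sketch below uses ordinary Python floats and a hypothetical function name `goldschmidt_mantissa`; it only illustrates formulas (2) to (4) and says nothing about word lengths or pipelining in hardware.

```python
def goldschmidt_mantissa(n, d, iterations):
    """Iterate formulas (2)-(4) from N0 = n, D0 = d, with d assumed to lie in [1, 2)."""
    for _ in range(iterations):
        f = 2 - d               # (2): the factor comes from D_i itself, no table lookup
        n, d = n * f, d * f     # (3), (4): two independent multiplications
    return n, d

n, d = goldschmidt_mantissa(1.5, 1.125, 3)
print(n, d)   # n approaches 1.5 / 1.125, d approaches 1
```

The two products in each pass do not depend on each other, which is the property that allows them to run in parallel.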

Substituting formula (2) into formula (4) and unrolling the iteration gives

\begin{aligned} D_{i+1} - 1 &= D_{i} (2 - D_{i}) - 1 \\ &= -(D_{i} - 1)^{2} \\ &= -(D - 1)^{2^{i+1}} \end{aligned} \qquad (5)

From formulas (3) and (4), N_i and D_i are multiplied by the same factors, so their ratio remains N/D, which gives the relation between D_i and N_i:

N_{i+1} = \frac{N}{D} D_{i+1} \qquad (6)

Substituting formula (5) into formula (6) gives

N_{i+1} = \frac{N}{D} \left[ 1 - (D - 1)^{2^{i+1}} \right] \qquad (7)
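Formulas (5) and (7) can be checked exactly with rational arithmetic. The following is a small verification sketch; the sample values N = 3/2 and D = 5/4 are arbitrary choices made for the example.

```python
from fractions import Fraction

N, D = Fraction(3, 2), Fraction(5, 4)
n_i, d_i = N, D
for i in range(4):
    f_i = 2 - d_i
    n_i, d_i = n_i * f_i, d_i * f_i          # n_i, d_i now hold N_{i+1}, D_{i+1}
    power = (D - 1) ** (2 ** (i + 1))
    assert d_i - 1 == -power                 # formula (5)
    assert n_i == (N / D) * (1 - power)      # formula (7)

print(d_i)   # 1 - (1/4)**16 after four iterations
```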

After M iterations the denominator D_M ≈ 1, therefore

Q_{M} = \frac{N_{M}}{D_{M}} \approx N_{M} = \frac{N}{D} \left[ 1 - (D - 1)^{2^{M}} \right] \qquad (8)

where Q_i is the quotient after the i-th iteration.

Define the relative error

\epsilon_{Q_{i}} = \left| \frac{Q_{i} - N/D}{N/D} \right| \qquad (9)

The relative error of the division result after M iterations is then

\epsilon_{Q_{M}} = (D - 1)^{2^{M}} \qquad (10)

Formula (10) shows that the relative error of the improved division result depends only on the size of the divisor and the number of iterations.
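Evaluating formula (10) for a few divisors shows how strongly the error depends on D. The helper name `relative_error_bound` and the sample divisors are assumptions made for this illustration.

```python
def relative_error_bound(d, iterations):
    """Formula (10): relative error after `iterations` steps for a divisor d in [1, 2)."""
    return (d - 1) ** (2 ** iterations)

for d in (1.1, 1.5, 1.9):
    print(d, [relative_error_bound(d, m) for m in (2, 3, 4)])
```

For a divisor near 1 a few iterations already give a very small error, while a divisor near 2 converges slowly, which is the motivation for mapping large divisors toward 1.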

For a floating-point system, N and D are expressed in scientific notation; that is, the dividend and divisor are both normalized into the form f × 2^e. For example, 2.5 normalizes to 1.25 × 2^1. Here f is the mantissa with f ∈ [1, 2) and e is the exponent. Floating-point division divides the mantissas and subtracts the exponents, so only the mantissa division needs to be considered here. After normalization, D and N lie in [1, 2). In this range, for a fixed number of iterations M, the relative error is a monotonically increasing function of D: the closer the divisor D is to 2, the larger the relative error.
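The normalization can be sketched with the standard library. `math.frexp` returns a mantissa in [0.5, 1), so the sketch rescales it to the [1, 2) convention used in the text; the function name `normalize` is hypothetical and only positive finite inputs are assumed.

```python
import math

def normalize(x):
    """Split a positive finite x into (f, e) with x == f * 2**e and f in [1, 2)."""
    m, e = math.frexp(x)        # m in [0.5, 1) and x == m * 2**e
    return m * 2, e - 1

print(normalize(2.5))           # (1.25, 1), matching the example in the text

# Division then separates into a mantissa quotient and an exponent difference.
fn, en = normalize(10.0)
fd, ed = normalize(2.5)
print(math.ldexp(fn / fd, en - ed))
```

Here the mantissa quotient is computed with Python's own division purely to show the split; in the method described it is produced by the iteration.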

Suppose the number of iterations is fixed at M and a required parameter E is given such that the relative error must not exceed E, that is,

(D - 1)^{2^{M}} \leq E \qquad (11)

It follows that

D \leq \exp\left(\frac{\ln E}{2^{M}}\right) + 1 = p \qquad (12)

Thus p is the largest divisor that satisfies the relative error requirement, and p ∈ [1, 2). In other words, the relative error requirement is met only when D ∈ [1, p].
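As a worked instance of formula (12), take the assumed values E = 2^{-24} and M = 3, chosen here only because the arithmetic comes out exactly:

```latex
\begin{aligned}
p &= \exp\left(\frac{\ln 2^{-24}}{2^{3}}\right) + 1 = 2^{-24/8} + 1 = 2^{-3} + 1 = 1.125 \\
(p - 1)^{2^{M}} &= \left(2^{-3}\right)^{8} = 2^{-24} = E
\end{aligned}
```

A divisor exactly at the boundary therefore meets the error target with equality, and any divisor in [1, p] does at least as well.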

However, not every divisor D satisfies D ∈ [1, p]. So that divisors over the whole range meet the error requirement, data with D ∈ (p, 2) are mapped into the range [1, p] to satisfy the precision requirement, as shown in Fig. 1.

When p > 3/2, a single unified mapping coefficient suffices to map every number in (p, 2) into the interval [1, p]. For this single mapping, when D ∈ (p, 2), define a mapping coefficient γ ∈ (0.5, 1) such that D' = D × γ ∈ [1, p], where D' is the mapped D. N × γ and D × γ are then used as the dividend and divisor, so the divisor lies in the range [1, p]. The relative error after M iterations is then guaranteed by formula (11) to meet the precision requirement.
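A single-coefficient mapping might look like the following sketch. The text does not fix the value of γ; the choice γ = p/2 used here is an assumption (one admissible value when p > 3/2), and the name `map_once` is hypothetical.

```python
def map_once(n, d, p):
    """Scale (n, d) by gamma = p / 2 when d > p; assumes p > 3/2 as in the text."""
    if d <= p:
        return n, d              # already inside [1, p], nothing to do
    gamma = p / 2                # assumed choice: sends 2 to p and p to p*p/2 >= 1
    return n * gamma, d * gamma

n, d = map_once(1.5, 1.9, 1.6)
print(d, n / d)                  # d now lies in [1, 1.6]; the ratio n / d is unchanged
```

Because numerator and denominator are scaled by the same factor, the quotient is unaffected and only the convergence of the iteration changes.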

When p < 3/2, however, a single mapping coefficient cannot map all numbers in (p, 2) into the interval [1, p]. The interval (p, 2) must therefore be divided into T segments, each with its own mapping coefficient c, and the mapping coefficient is chosen according to the segment in which D lies. Suppose (p, 2) is divided into T segments, where adjacent segments may overlap. Denote the mapping range of segment j by (Range_{(j,min)}, Range_{(j,max)}], j = 0, 1, ..., T-1; the mapping range of segment T-1 is (Range_{(T-1,min)}, 2). Range denotes a boundary of a mapping range, and the following properties hold:

\begin{cases} 1 \leq \mathrm{Range}_{(0,\min)} \leq p \\ \mathrm{Range}_{(j-1,\max)} \geq \mathrm{Range}_{(j,\min)} \\ \mathrm{Range}_{(T-1,\max)} = 2 \end{cases} \qquad (13)

Let C = {c(0), c(1), ..., c(T-1)} be the mapping coefficients of the mapping ranges. After mapping, the following relations hold:

\begin{cases} c(j)\, \mathrm{Range}_{(j,\max)} \leq p \\ c(j)\, \mathrm{Range}_{(j,\min)} \geq 1 \end{cases} \qquad (14)

The required number of mapping segments T is smallest when none of the mapping ranges overlap, so let

\begin{cases} c(j)\, \mathrm{Range}_{(j,\max)} = p \\ c(j)\, \mathrm{Range}_{(j,\min)} = 1 \end{cases} \qquad (15)

Formula (13) then becomes

\begin{cases} \mathrm{Range}_{(0,\min)} = p \\ \mathrm{Range}_{(j-1,\max)} = \mathrm{Range}_{(j,\min)} \\ \mathrm{Range}_{(T-1,\max)} = 2 \end{cases} \qquad (16)

Combining formulas (15) and (16), the mapping ranges and mapping coefficients are obtained as follows:

\begin{cases} \mathrm{Range}_{(j,\max)} = \frac{2}{p^{T-1-j}} \\ \mathrm{Range}_{(j,\min)} = \frac{2}{p^{T-j}} \\ c(j) = \frac{p^{T-j}}{2} \end{cases} \qquad (17)

From formula (13) it further follows that

1 \leq \mathrm{Range}_{(0,\min)} = \frac{2}{p^{T}} \leq p \qquad (18)

which gives

\log_{p} 2 - 1 \leq T \leq \log_{p} 2 \qquad (19)

The number of mapping segments T is therefore determined by (19). Since T is an integer and the smallest T is wanted, take

T = \operatorname{ceil}(\log_{p} 2 - 1) \qquad (20)

where ceil denotes rounding up.
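As a worked instance of formulas (18) to (20), take the assumed boundary p = 1.125 (numerical values rounded):

```latex
\begin{aligned}
\log_{p} 2 &= \frac{\ln 2}{\ln 1.125} \approx 5.885 \\
T &= \operatorname{ceil}(5.885 - 1) = 5 \\
\mathrm{Range}_{(0,\min)} &= \frac{2}{p^{5}} \approx 1.110 \in [1, p]
\end{aligned}
```

With only four segments the lowest limit would be 2 / p^4 ≈ 1.249, which exceeds p and violates formula (18), so five is the smallest admissible number of segments for this p.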

The above derivation shows that, for a fixed number of iterations M, multiplying the divisor and dividend before the iteration by the mapping coefficient of the range in which the divisor lies is enough for a single mapping operation to bring the results for divisors in all ranges up to the precision requirement.

For the improved Goldschmidt division algorithm with M iterations, formulas (2), (3) and (4) show that the M iterations require 2M multiplications and M additions (addition and subtraction are assumed to occupy the same resources). In a hardware floating-point system, particularly for high-precision computation, adders are negligible compared with multipliers, so only the 2M multiplications are considered here.

The divisor mapping also requires multiplication by the mapping coefficient of each mapping range. Since both the numerator and the denominator must be multiplied by the mapping coefficient, two multiplications are needed. These are multiplications by fixed coefficients, however, so CSD (Canonical Signed Digit) coding of the mapping coefficients reduces the hardware cost of the multiplication. Define β (β ∈ [0, 1]) as the ratio of the resources occupied by a fixed-multiplier multiplication to those of a general multiplication; that is, a fixed-multiplier multiplication unit occupies β times the hardware resources of a general multiplication unit.

Define R(M) as the number of multipliers the algorithm needs to achieve a relative error less than E with M iterations:

R(M) = 2M + 2\beta \qquad (21)

When CSD coding is used for the fixed-multiplier multiplication, the average value of β is 0.3. Macleod, in "Use of Minimum-Adder Multiplier Blocks in FIR Digital Filters", proposed a minimum-resource fixed-multiplier multiplication algorithm that uses 25% fewer resources than CSD multiplication; in that case the average value of β is 0.25.
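The reason CSD coding cheapens a fixed-coefficient multiplication is that it minimizes the number of nonzero digits, and each nonzero digit costs one shifted add or subtract. The sketch below recodes an integer into signed digits with no two adjacent nonzero digits; the function name `csd_digits` and the 8-bit fractional quantization of the coefficient 0.75 are assumptions made for the example.

```python
def csd_digits(value):
    """Canonical signed-digit form of a non-negative integer, least significant digit first."""
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)   # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit            # leaves a multiple of 4, so the next digit is 0
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

coeff = round(0.75 * 2 ** 8)          # 0.75 quantized to 8 fractional bits -> 192
digits = csd_digits(coeff)
print(digits)                         # two nonzero digits: 192 = 256 - 64
print(sum(1 for d in digits if d))    # nonzero digits, one shifted add/subtract each
```

In this example multiplying by 0.75 reduces to one subtraction of a shifted operand (x minus x/4), far less hardware than a general multiplier.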

Formula (21) shows that, for a given precision requirement, the minimum of the function R(M) can be found. In other words, once the error precision is specified, the algorithm derived in the present invention yields the minimum number of iterations needed to satisfy it, together with all the parameters required for minimum resource usage.

Based on the above derivation, the core idea of the improved Goldschmidt division of the present invention is as follows. First, the floating-point dividend and divisor are normalized into the form f × 2^e, and the normalized dividend and divisor are denoted N and D. The boundary value p is obtained. If D falls within [1, p], M iterations of formulas (2) to (4) are performed directly; if D falls within [p, 2), D is mapped into [1, p] by multiplying it by a mapping coefficient, the dividend N is multiplied by the same coefficient, and M iterations are then performed. The M iterations yield the quotient of the f parts, which is finally combined with the exponent subtraction of the 2^e parts to obtain the final division result.

Referring to Fig. 2, the specific implementation process is as follows:

Step 1: normalize the floating-point dividend N_f and divisor D_f into the form f × 2^e, converting the floating-point division into division of the mantissas and subtraction of the exponents. The dividend N and divisor D on which the division is performed lie in [1, 2).

\begin{aligned} N_{f} &= N \times 2^{e_1} \\ D_{f} &= D \times 2^{e_2} \end{aligned}

Step 2: specify the minimum relative error E and the number of iterations M that minimizes the required resources while meeting the latency requirement.

Step 3: obtain the boundary value p from formula (12), which divides the interval [1, 2) into [1, p] and [p, 2), and compare the normalized divisor D with p. If the normalized divisor falls within [1, p], set N_0 = N and D_0 = D and jump to step 6; if it falls within [p, 2), continue with step 4.

Step 4: obtain the minimum number of mapping segments T from formula (20), and divide [p, 2) into T segments.

Step 5: obtain the boundaries and mapping coefficient of each mapping range from formula (17), find the mapping range containing the divisor D, and extract the corresponding mapping coefficient. Multiply both D and N by the extracted mapping coefficient, assign the mapped D and N to D_0 and N_0, and go to step 6.

Step 6: perform M iterations of formulas (2) to (4) to obtain the quotient of the mantissas.

Step 7: combine the mantissa quotient obtained in step 6 with the result of the exponent subtraction to obtain the final division result.

This completes the flow.
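Steps 1 to 7 can be put together in a compact software sketch. It is a model in double-precision Python under stated assumptions, not the hardware design: the function names are hypothetical, the defaults E = 2^-24 and M = 3 are example values, only positive finite operands are handled, and the mapping coefficient is computed on the fly rather than read from pre-stored CSD-coded constants.

```python
import math

def boundary(E, M):
    """Formula (12): largest mantissa divisor that meets error E in M iterations."""
    return math.exp(math.log(E) / 2 ** M) + 1

def mapping_coefficient(d, p):
    """Steps 4-5: coefficient c(j) of the range containing d, formulas (20) and (17)."""
    T = math.ceil(math.log(2, p) - 1)
    for j in range(T):
        if d <= 2 / p ** (T - 1 - j):        # upper limit of range j
            return p ** (T - j) / 2          # c(j)
    raise ValueError("divisor mantissa must be below 2")

def divide(nf, df, E=2.0 ** -24, M=3):
    """Sketch of steps 1-7 for positive finite operands."""
    # Step 1: normalise both operands to mantissas in [1, 2).
    mn, en = math.frexp(nf)
    md, ed = math.frexp(df)
    n, d = 2 * mn, 2 * md
    exponent = en - ed                       # exponent subtraction
    # Steps 2-3: boundary value and range test.
    p = boundary(E, M)
    if d > p:
        # Steps 4-5: map the divisor into [1, p] and scale the dividend equally.
        c = mapping_coefficient(d, p)
        n, d = n * c, d * c
    # Step 6: M iterations of formulas (2)-(4).
    for _ in range(M):
        f = 2 - d
        n, d = n * f, d * f
    # Step 7: recombine mantissa quotient and exponent.
    return math.ldexp(n, exponent)

for a, b in [(10.0, 2.5), (1.0, 3.0), (7.0, 1.9)]:
    print(a, b, divide(a, b), a / b)
```

In this model the result agrees with the exact quotient to within roughly the requested relative error E, plus ordinary floating-point rounding.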

The above flow assumes that none of the mapping ranges overlap, that is, the case of formula (16), in which the required number of mapping segments T is smallest. In practice the restriction need not be so strict: it is enough to satisfy formula (13), with the mapping coefficients chosen to satisfy formula (14).

The present invention derives the relative error of the iteration result of the improved algorithm and obtains the relation between the required hardware resources and the precision requirement, so that a divider structure with minimum resource usage can be obtained within the range that meets the precision requirement.

In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A Goldschmidt division implementation method based on divisor mapping, characterized in that the method comprises the following steps:

Step 1: normalize the floating-point dividend N_f and divisor D_f into the form f × 2^e, where f is the mantissa with f ∈ [1, 2) and e is the exponent; the normalized dividend and divisor are denoted N and D; the floating-point division is converted into division of the mantissas and subtraction of the exponents;

Step 2: for the mantissa division, given the minimum relative error E and the number of iterations M that minimizes the required resources while meeting the latency requirement, obtain the boundary value p using formula (I);

p = \exp\left(\frac{\ln E}{2^{M}}\right) + 1 \qquad (I)

Step 3: compare the normalized divisor D with the boundary value p; if the normalized divisor D falls within the interval [1, p], set N_0 = N and D_0 = D and go to step 4; if the normalized divisor D falls within the interval [p, 2), map D into the interval [1, p] by multiplying it by a mapping coefficient, multiply the dividend N by the same mapping coefficient, assign the mapped D and N to D_0 and N_0, and then go to step 4;

Step 4: perform M iterations of formula (IV) to obtain the quotient of the mantissas;

\begin{cases} F_{i} = 2 - D_{i} \\ N_{i+1} = N_{i} F_{i} \\ D_{i+1} = D_{i} F_{i} \end{cases} \qquad (IV)

where i = 0, 1, ..., M-1, and N_i and D_i are the dividend and divisor after the i-th iteration;

Step 5: combine the mantissa quotient obtained in step 4 with the result of the exponent subtraction to obtain the final division result.

2. The method according to claim 1, characterized in that, in step 3, mapping D into the interval [1, p] by multiplying it by a mapping coefficient, and multiplying the dividend N by the same mapping coefficient, comprises:

Step 31: obtain the minimum number of mapping segments T according to formula (II), and divide [p, 2) into T segments;

T = \operatorname{ceil}(\log_{p} 2 - 1) \qquad (II)

where ceil denotes rounding up;

Step 32: obtain the boundaries and mapping coefficient of each mapping range from formula (III), find the mapping range containing the divisor D, and extract the corresponding mapping coefficient; multiply both the divisor D and the dividend N by the extracted mapping coefficient;

\begin{cases} \mathrm{Range}_{(j,\max)} = \frac{2}{p^{T-1-j}} \\ \mathrm{Range}_{(j,\min)} = \frac{2}{p^{T-j}} \\ c(j) = \frac{p^{T-j}}{2} \end{cases} \qquad (III)

Range_{(j,min)} and Range_{(j,max)} are the lower and upper limits of the j-th mapping range, and c(j) is the mapping coefficient of the j-th mapping range; j = 0, 1, ..., T-1.

3. The method according to claim 1 or 2, characterized in that the mapping coefficients are CSD-coded in advance and stored; when the divisor D and the dividend N are multiplied by the corresponding mapping coefficient, CSD coding is used to perform the fixed-multiplier multiplication.