Background Art
Floating-point division is used more and more widely in processor designs and has become a core part of signal processing. Although a divider is used less frequently than an adder/subtractor, its performance directly affects the quality of signal processing, so speeding up the divider while reducing its hardware resources has become a key issue.
Modern commercial processors mostly use parallel multipliers to speed up division, implementing it through iterated multiplication; the higher the required precision, the more iterations are needed. The main multiplication-based iterative division algorithms today are the Newton-Raphson algorithm, proposed by M. J. Flynn in "On division by functional iteration", and the Goldschmidt algorithm, proposed by R. E. Goldschmidt in "Applications of division by convergence".
The Newton-Raphson algorithm converts division into finding the reciprocal of the divisor and then multiplies the reciprocal by the dividend to obtain the quotient, whereas the Goldschmidt algorithm obtains the quotient directly by scaling the divisor and the dividend together. Each iteration of either algorithm involves two multiplications. In the Newton-Raphson algorithm the two multiplications depend on each other and must be executed serially; in the Goldschmidt algorithm they are independent and can be executed in parallel, so the Goldschmidt algorithm is the faster of the two.
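The difference in data dependence can be illustrated with a minimal Python sketch. The function names and the seed values `x0` and `f0` are illustrative assumptions, not part of the original text; both loops use the textbook forms of the two iterations.

```python
def newton_raphson_step(x, d):
    """One reciprocal refinement: the second multiply needs the first one's result."""
    t = d * x                # multiplication 1
    return x * (2.0 - t)     # multiplication 2, depends on t


def goldschmidt_step(n, d):
    """One Goldschmidt step: both multiplies use only f, so they can run in parallel."""
    f = 2.0 - d
    return n * f, d * f      # two independent multiplications


def divide_newton_raphson(n, d, x0, steps):
    x = x0                   # x0 is an assumed rough estimate of 1/d
    for _ in range(steps):
        x = newton_raphson_step(x, d)
    return n * x


def divide_goldschmidt(n, d, f0, steps):
    n, d = n * f0, d * f0    # f0 is an assumed rough estimate of 1/d
    for _ in range(steps):
        n, d = goldschmidt_step(n, d)
    return n


print(divide_newton_raphson(3.0, 1.6, 0.6, 4))
print(divide_goldschmidt(3.0, 1.6, 0.6, 4))
```

In hardware, the two products inside `goldschmidt_step` could be issued to two multipliers in the same cycle, which is the speed advantage the paragraph above describes.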
When the Goldschmidt algorithm computes the division Q = N/D, where N is the dividend, D is the divisor and Q is the quotient, it constructs a set of factors F_1, F_2, ..., F_M and multiplies both the numerator and the denominator by this set of factors. As the denominator approaches 1, the numerator approaches the desired quotient. The multiplication of the numerator and denominator by the factors is carried out over M iterations.
The iterative formula of the traditional Goldschmidt division can be written in the following form:
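The formula itself does not survive in this text. The standard textbook form of the Goldschmidt recurrence, given here as a reconstruction whose equation number and index range are assumptions, is:

```latex
\begin{aligned}
N_{i+1} &= N_i \, F_i \\
D_{i+1} &= D_i \, F_i \\
F_{i+1} &= 2 - D_{i+1}
\end{aligned}
\tag{1}
```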
where i = 1, 2, ..., M-1, M is the number of iterations, N_i and D_i are the dividend and the divisor after the i-th iteration, and F_i is the factor. N_0 and D_0 are the initial dividend and divisor, and F_0 is estimated from the reciprocal of the divisor. The accuracy of this estimate is limited: if the error of the initial estimate is s, so that F_0 = (1-s)/D, the computation error of the traditional Goldschmidt division algorithm can be derived as a function of s.
The analysis above shows that the error of the traditional Goldschmidt iteration result depends on the estimation error s of the initial value; that is, the accuracy of the initial estimate determines the size of the error in the iteration result.
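The dependence on s can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the seed F_0 = (1-s)/D is computed exactly rather than read from a table, and the loop uses the textbook update F = 2 - D. With this particular indexing the relative error after k passes works out to s^(2^(k-1)); the patent's own error expression may be indexed differently.

```python
def goldschmidt_seeded(n, d, s, iterations):
    """Traditional Goldschmidt division with a seed of relative error s."""
    f = (1.0 - s) / d            # F_0: reciprocal estimate with error s
    for _ in range(iterations):
        n, d = n * f, d * f      # scale numerator and denominator together
        f = 2.0 - d              # next factor
    return n


def relative_error(n, d, s, iterations):
    q = n / d
    return abs(q - goldschmidt_seeded(n, d, s, iterations)) / q


for s in (0.1, 0.01):
    print(s, relative_error(1.5, 1.7, s, 3))
```

A more accurate seed gives a smaller final error for the same number of iterations, which is why the look-up-table approaches discussed next spend storage on the seed.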
To address this problem, D. DasSarma and D. W. Matula, in "Faithful Bipartite ROM Reciprocal Tables", and M. J. Schulte and J. E. Stine, in "Symmetric Bipartite Tables for Accurate Function Approximation", proposed several look-up-table methods that reduce the error of the initial estimate and thereby improve the precision of the iteration result.
These methods store initial values of 1/D in ROM. Improving the precision of the computation requires increasing the number of bits of the look-up table: if the table has n bits, its storage is n·2^n, so the required storage grows exponentially with the number of bits of the look-up table.
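As a rough illustration of this growth, the following sketch evaluates n·2^n for a few table widths. It assumes a table with an n-bit address and an n-bit output word, which is one reading of the n·2^n figure above; `rom_bits` is a hypothetical helper name.

```python
def rom_bits(n):
    """Storage in bits of a reciprocal table with 2**n entries of n bits each."""
    return n * 2 ** n


for n in (8, 12, 16):
    print(f"n = {n:2d} bits -> {rom_bits(n)} bits of ROM")
```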
In summary, when high accuracy is required, the storage of the look-up table grows sharply with the number of bits. Reducing this storage in order to improve divider performance is therefore a problem that urgently needs to be solved.
Summary of the invention
In view of this, the present invention proposes an improved method for implementing Goldschmidt division in order to solve the problem of large ROM storage. This division algorithm needs no initial estimate and can therefore save a large amount of storage.
To solve the above technical problem, the present invention is implemented as follows:
A method for implementing Goldschmidt division based on divisor mapping, comprising the following steps:
Step 1: normalize the floating-point dividend N_f and divisor D_f to the form f × 2^e, where f is the mantissa part with f ∈ [1, 2) and e is the exponent part; denote the normalized dividend and divisor as N and D; floating-point division is thereby converted into division of the mantissa parts and subtraction of the exponent parts;
Step 2: for the division of the mantissa parts, given the minimum relative error E and the number of iterations M that requires the least resources while meeting the latency requirement, obtain the cut-off value p using formula (I);
Step 3: compare the normalized divisor D with the cut-off value p; if D falls in the interval [1, p], set N_0 = N and D_0 = D directly and go to step 4; if D falls in the interval [p, 2), map D into the interval [1, p] by multiplying it by a mapping coefficient, multiply the dividend N by the same mapping coefficient, assign the mapped D and N to D_0 and N_0, and then go to step 4;
Step 4: perform M iterations according to formula (IV) to obtain the division result of the mantissa parts;
where i = 0, 1, ..., M-1, and N_i and D_i are the dividend and the divisor after the i-th iteration;
Step 5: combine the mantissa quotient obtained in step 4 with the result of the exponent subtraction to obtain the final division result.
Preferably, in step 3, mapping D into the interval [1, p] by multiplying it by a mapping coefficient, and multiplying the dividend N by the same mapping coefficient, comprises:
Step 31: obtain the minimum number of mapping segments T according to formula (II), and divide [p, 2) into T segments;
T = ceil(log_p 2 - 1)    (II)
where ceil denotes rounding up;
Step 32: obtain the boundaries and the mapping coefficient of each mapping range according to formula (III), find the mapping range in which the divisor D lies, and extract the corresponding mapping coefficient; multiply both the divisor D and the dividend N by the extracted mapping coefficient;
Range_(j,min) and Range_(j,max) are the lower and upper limits of the j-th mapping range, and c(j) is the mapping coefficient of the j-th mapping range; j = 0, 1, ..., T-1.
Preferably, the mapping coefficients are CSD-encoded and stored in advance; when the divisor D and the dividend N are multiplied by the corresponding mapping coefficient, the CSD code is used to perform constant-coefficient multiplication.
Beneficial effects:
(1) The improved Goldschmidt division algorithm proposed by the present invention does not need an estimate of the initial value of F and can save a large amount of storage.
(2) The error of the iteration result of this method depends only on the divisor and the number of iterations, so the error of the result is controllable.
(3) The method also uses a minimum-resource constant-coefficient multiplication algorithm to reduce the resources occupied by the divisor mapping.
(4) The method lends itself to parallel implementation in hardware.
Detailed Description of the Embodiments
The present invention is described below with reference to the accompanying drawings and embodiments.
First, the iterative formula of the division is modified and written in the following form:
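The three formulas are not reproduced in this text. A reconstruction consistent with the later statements (the factor F_0 is computed directly from formula (2), and formula (2) is substituted into formula (4)) would be:

```latex
F_i = 2 - D_i \tag{2}

N_{i+1} = N_i \, F_i \tag{3}

D_{i+1} = D_i \, F_i \tag{4}
```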
where i = 0, 1, ..., M-1, M is the number of iterations, and N_i and D_i are the dividend and the divisor after the i-th iteration. Here N_0 = N is the initial dividend and D_0 = D is the initial divisor. The initial value F_0 of the factor does not need to be estimated; it is calculated directly from formula (2), which saves the ROM space otherwise needed to store the initial estimate F_0.
Substituting formula (2) into formula (4) and iterating yields formula (5).
From formulas (3) and (4), the iterative relation between D_i and N_i, formula (6), can be derived.
Substituting formula (5) into formula (6) yields formula (7).
After M iterations the denominator D_M ≈ 1, which yields formula (8),
where Q_i is the quotient after the i-th iteration.
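Formulas (5) to (8) are missing from this text. Assuming the recurrence F_i = 2 - D_i, N_{i+1} = N_i F_i, D_{i+1} = D_i F_i with D_0 = D and N_0 = N, the chain can be reconstructed as follows; the key step is that 1 - D_{i+1} = (1 - D_i)^2.

```latex
D_i = 1 - (1 - D)^{2^{i}} \tag{5}

\frac{N_i}{D_i} = \frac{N}{D}
\quad\Longleftrightarrow\quad
N_i = \frac{N}{D}\, D_i \tag{6}

N_i = \frac{N}{D}\left[\, 1 - (1 - D)^{2^{i}} \right] \tag{7}

Q_M = \frac{N_M}{D_M} \approx N_M \tag{8}
```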
The relative error is defined by formula (9).
The relative error of the division result after M iterations is then given by formula (10).
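Formulas (9) and (10) are likewise missing. Under the assumption that the iteration is F_i = 2 - D_i with no seed, so that N_M = (N/D)[1 - (1 - D)^(2^M)], a reconstruction consistent with the monotonicity stated later is given below. The symbol ε is an assumed name for the relative error.

```latex
\varepsilon = \frac{\lvert Q - Q_M \rvert}{Q} \tag{9}

\varepsilon_M = (1 - D)^{2^{M}} = (D - 1)^{2^{M}} \tag{10}
```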
Formula (10) shows that the relative error of the improved division result depends only on the size of the divisor and the number of iterations.
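This dependence can be checked numerically. The sketch below assumes the seedless iteration F = 2 - D and compares the measured relative error with the closed form (D - 1)^(2^M), which is a reconstruction rather than a formula quoted from the original; the function names are hypothetical.

```python
def improved_goldschmidt(n, d, iterations):
    """Seedless Goldschmidt iteration: the first factor is already 2 - D."""
    for _ in range(iterations):
        f = 2.0 - d
        n, d = n * f, d * f
    return n


def measured_relative_error(n, d, iterations):
    q = n / d
    return abs(q - improved_goldschmidt(n, d, iterations)) / q


def predicted_relative_error(d, iterations):
    """Assumed closed form (D - 1) ** (2 ** M)."""
    return (d - 1.0) ** (2 ** iterations)


for d in (1.1, 1.25, 1.5, 1.9):
    print(d, measured_relative_error(1.3, d, 3), predicted_relative_error(d, 3))
```

The dividend does not appear in the prediction, and the error grows as the divisor moves from 1 towards 2.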
In a floating-point system, N and D are expressed in scientific notation; that is, the dividend and the divisor are both normalized to the form f × 2^e. For example, 2.5 is normalized to 1.25 × 2^1. Here f is the mantissa part with f ∈ [1, 2) and e is the exponent part. Floating-point division divides the mantissas and subtracts the exponents, so only the division of the mantissa parts needs to be considered here. After normalization, D and N both lie in [1, 2). Within this range, for a fixed number of iterations M, the relative error is a monotonically increasing function of D: the closer the divisor D is to 2, the larger the relative error.
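The normalization step can be sketched in Python with `math.frexp`, which returns a mantissa in [0.5, 1) and therefore needs a shift by one bit to reach the [1, 2) convention used here. The helper names are illustrative, and the sketch assumes positive, finite, non-zero inputs.

```python
import math


def normalize(x):
    """Return (f, e) with x == f * 2**e and 1 <= f < 2, for x > 0."""
    m, exp = math.frexp(x)       # x == m * 2**exp with 0.5 <= m < 1
    return 2.0 * m, exp - 1


def split_division(nf, df):
    """Reduce nf / df to a mantissa division N / D and an exponent difference."""
    n, en = normalize(nf)
    d, ed = normalize(df)
    return n, d, en - ed


print(normalize(2.5))
n, d, e = split_division(2.5, 0.75)
print(n, d, e, math.ldexp(n / d, e))
```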
Suppose that, with the number of iterations fixed at M, the relative error must not exceed a given parameter E, as expressed by formula (11),
from which formula (12) follows.
p is then the maximum divisor that satisfies the relative-error requirement, and p ∈ [1, 2). That is, the relative-error requirement is satisfied only when the divisor D ∈ [1, p).
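A small sketch of the cut-off computation follows. It assumes the relative error after M iterations is (D - 1)^(2^M), so that requiring (D - 1)^(2^M) ≤ E gives D ≤ 1 + E^(1/2^M); this closed form for p is a reconstruction, and `cutoff` is a hypothetical helper name.

```python
def cutoff(E, M):
    """Largest divisor p whose error (p - 1) ** (2 ** M) does not exceed E."""
    return 1.0 + E ** (1.0 / 2 ** M)


E = 2.0 ** -24                   # example target: roughly single precision
for M in (3, 4, 5):
    p = cutoff(E, M)
    print(f"M = {M}: p = {p:.6f}, check = {(p - 1.0) ** (2 ** M):.3e}")
```

More iterations push p closer to 2, so a larger share of divisors can be handled without any mapping.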
However, not every divisor D satisfies D ∈ [1, p). So that divisors over the whole range meet the error requirement, data with D ∈ (p, 2) are mapped into the range [1, p] in order to satisfy the accuracy requirement, as shown in Fig. 1.
When p > 3/2, a single mapping coefficient suffices to map every number in (p, 2) into the interval [1, p]. For this one-step mapping, when D ∈ (p, 2), define a mapping coefficient γ ∈ (0.5, 1) such that D' = D × γ ∈ [1, p], where D' is the mapped D.
N × γ and D × γ are then used as the dividend and the divisor, so that the divisor lies in [1, p]. The relative error after M iterations is then guaranteed by formula (11) to satisfy the accuracy requirement.
However, when p < 3/2, a single mapping coefficient cannot map every number in (3/2, 2) into the interval [1, p]. The range (p, 2) is therefore divided into T segments, a mapping coefficient c is set for each segment, and the coefficient is chosen according to the segment in which D lies. Suppose (p, 2) is divided into T segments, where adjacent segments may overlap. Denote the mapping range of segment j as (Range_(j,min), Range_(j,max)], j = 0, 1, ..., T-1, with the mapping range of segment T-1 being (Range_(T-1,min), 2), where Range denotes a boundary of a mapping range. The boundaries have the property given by formula (13).
Let C = {c(0), c(1), ..., c(T-1)} be the mapping coefficients of the segments; after mapping, the relation of formula (14) holds.
The number of mapping segments T is smallest when no two mapping ranges overlap, which leads to formula (15).
Formula (13) then becomes formula (16).
Combining formulas (15) and (16) gives each mapping range and its mapping coefficient, formula (17).
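Formulas (13) to (17) are missing from this text. One set of relations consistent with the surrounding description and with the segment count T = ceil(log_p 2 - 1) is sketched below. The assignment of equation numbers is an assumption, formula (15) is not attempted, and the last segment is written as truncated at 2.

```latex
% (13): the segments cover (p, 2); neighbouring segments may overlap
\mathrm{Range}_{(0,\min)} = p, \qquad
\mathrm{Range}_{(j+1,\min)} \le \mathrm{Range}_{(j,\max)}, \qquad
\mathrm{Range}_{(T-1,\max)} = 2

% (14): every segment is mapped into [1, p]
1 \le c(j)\,\mathrm{Range}_{(j,\min)}, \qquad
c(j)\,\mathrm{Range}_{(j,\max)} \le p

% (16): no overlap between neighbouring segments
\mathrm{Range}_{(j+1,\min)} = \mathrm{Range}_{(j,\max)}

% (17): resulting boundaries and coefficients
\mathrm{Range}_{(j,\min)} = p^{\,j+1}, \qquad
\mathrm{Range}_{(j,\max)} = \min\!\left(p^{\,j+2},\, 2\right), \qquad
c(j) = p^{-(j+1)}
```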
From formula (13) one further obtains formula (18),
which gives
log_p 2 - 1 ≤ T ≤ log_p 2    (19)
The number of mapping segments T is therefore determined by (19). Since T is an integer and the smallest T is wanted, take
T = ceil(log_p 2 - 1)    (20)
where ceil denotes rounding up.
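The segment table can be sketched as follows. The segment count follows formula (20); the boundaries p^(j+1), p^(j+2) and the coefficient 1/p^(j+1) are a reconstruction for the non-overlapping case, not values quoted from the original, and the function names are hypothetical. The sketch assumes 1 < p < 2.

```python
import math


def mapping_table(p):
    """Return [(lower, upper, coefficient)] for the T segments covering (p, 2)."""
    T = math.ceil(math.log(2.0, p) - 1.0)      # formula (20)
    table = []
    for j in range(T):
        lower = p ** (j + 1)
        upper = min(p ** (j + 2), 2.0)         # last segment is cut off at 2
        table.append((lower, upper, 1.0 / lower))
    return table


def map_operands(n, d, p):
    """Scale N and D by the same coefficient so that D lands in [1, p]."""
    if d <= p:
        return n, d
    for lower, upper, c in mapping_table(p):
        if lower < d <= upper:
            return n * c, d * c
    raise ValueError("divisor must lie in [1, 2)")


p = 1.125
for row in mapping_table(p):
    print(row)
print(map_operands(1.25, 1.5, p))
```

Because the dividend and the divisor are multiplied by the same coefficient, the quotient is unchanged while the divisor moves into the range where M iterations are sufficient.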
The derivation of the divisor mapping above shows that, for a fixed number of iterations M, it is enough to multiply the divisor and the dividend before the iteration by a mapping coefficient that depends on the range in which the divisor lies; a single mapping operation then makes the results for divisors in every range meet the accuracy requirement.
For the improved Goldschmidt division algorithm with M iterations, formulas (2), (3) and (4) show that the M iterations require 2M multiplications and M additions (addition and subtraction are taken here to use the same resources). In hardware implementations of floating-point arithmetic, especially for high-precision computation, adders are negligible compared with multipliers, so only the 2M multipliers are considered here.
The divisor mapping also requires multiplication by the mapping coefficient of each mapping range. Since both the numerator and the denominator are multiplied by the mapping coefficient, two multiplications are needed. These are multiplications by constant coefficients, however, and encoding the mapping coefficients in CSD (Canonical Signed-Digit) code reduces the hardware they occupy. Define β (β ∈ [0, 1]) as the ratio of the resources used by a constant-coefficient multiplier to those used by a general multiplier; that is, one constant-coefficient multiplier unit occupies β times the hardware resources of one general multiplier unit.
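To show why CSD coding saves hardware, the sketch below converts a quantized coefficient to signed digits in {-1, 0, 1} using the standard non-adjacent-form recoding. Each non-zero digit corresponds to one shifted add or subtract in a shift-and-add multiplier. The fixed-point width `frac_bits` and the helper names are assumptions made for illustration.

```python
def csd_digits(value):
    """CSD digits of a non-negative integer, least significant digit first."""
    digits = []
    while value > 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_of_coefficient(c, frac_bits):
    """Quantize a coefficient to frac_bits fractional bits and recode it."""
    return csd_digits(round(c * (1 << frac_bits)))


def nonzero_count(digits):
    """Number of non-zero digits, i.e. shifted terms to add or subtract."""
    return sum(1 for d in digits if d != 0)


print(csd_digits(7))                         # 7 = 8 - 1
coeff = csd_of_coefficient(1.0 / 1.125, 8)
print(coeff, nonzero_count(coeff), bin(round(256 / 1.125)).count("1"))
```

For the example coefficient the CSD form has fewer non-zero digits than the plain binary form, which is the source of the saving expressed by β.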
Define R(M) as the number of multipliers that this algorithm needs in order to achieve a relative error smaller than E with M iterations:
R(M) = 2M + 2β    (21)
When CSD coding is used for the constant-coefficient multiplication, the average value of β is 0.3. Macleod, in "Use of minimum-adder multiplier blocks in FIR digital filters", proposed a minimum-resource constant-coefficient multiplication algorithm that uses 25% fewer resources than CSD multiplication; in that case the average value of β is 0.25.
Formula (21) shows that, for a given accuracy requirement, the minimum of the function R(M) can be found. In other words, once the error bound is specified, the algorithm derived here yields the minimum number of iterations that meets it, together with all the parameters needed for minimum resource usage.
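The trade-off can be tabulated with a short sketch. `multiplier_cost` implements formula (21) directly; the expressions used for p and T inside `design_points` assume the reconstructed cut-off p = 1 + E^(1/2^M) and formula (20), and both function names are hypothetical.

```python
import math


def multiplier_cost(M, beta):
    """R(M) = 2M + 2*beta, in units of one general multiplier."""
    return 2 * M + 2 * beta


def design_points(E, beta, M_values):
    """For each M, list the cut-off p, the segment count T and the cost R(M)."""
    rows = []
    for M in M_values:
        p = 1.0 + E ** (1.0 / 2 ** M)              # assumed closed form for p
        T = math.ceil(math.log(2.0, p) - 1.0)      # formula (20)
        rows.append((M, p, T, multiplier_cost(M, beta)))
    return rows


for M, p, T, R in design_points(2.0 ** -24, 0.3, (3, 4, 5)):
    print(f"M = {M}  p = {p:.4f}  T = {T}  R = {R:.1f}")
```

Fewer iterations lower R(M) but shrink p, so more mapping segments and stored coefficients are needed; the designer picks the smallest M whose latency and table size are acceptable.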
Based on the above derivation, the core idea of the improved Goldschmidt division of the present invention is as follows. First, the floating-point dividend and divisor are normalized to the form f × 2^e, and the normalized dividend and divisor are denoted N and D. The cut-off value p is then obtained. If D falls in the interval [1, p], M iterations are performed directly using formulas (2) to (4). If D falls in the interval [p, 2), D is mapped into the interval [1, p] by multiplying it by a mapping coefficient, the dividend N is multiplied by the same mapping coefficient, and M iterations are then performed. The M iterations give the quotient of the f parts; finally, this quotient is combined with the result of subtracting the exponents of the 2^e parts to obtain the final division result.
Referring to Fig. 2, the specific implementation process is as follows:
Step 1: normalize the floating-point dividend N_f and divisor D_f to the form f × 2^e. Floating-point division is thereby converted into division of the mantissa parts and subtraction of the exponent parts, and the dividend N and divisor D on which the division is to be performed both lie in [1, 2).
Step 2: specify the minimum relative error E and the number of iterations M that requires the least resources while meeting the latency requirement.
Step 3: obtain the cut-off value p according to formula (12), which divides the interval [1, 2) into [1, p] and [p, 2), and then compare the normalized divisor D with the cut-off value p. If the normalized divisor falls in the interval [1, p], set N_0 = N and D_0 = D and jump to step 6; if it falls in the interval [p, 2), continue with the next step.
Step 4: obtain the minimum number of mapping segments T according to formula (20), and divide [p, 2) into T segments.
Step 5: obtain the boundaries and the mapping coefficient of each mapping range according to formula (17), find the mapping range in which the divisor D lies, and extract the corresponding mapping coefficient. Multiply both the divisor D and the dividend N by the extracted mapping coefficient, assign the mapped D and N to D_0 and N_0, and go to step 6.
Step 6: perform M iterations according to formulas (2) to (4) to obtain the division result.
Step 7: combine the mantissa quotient obtained in step 6 with the result of the exponent subtraction to obtain the final division result.
This completes the process.
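Steps 1 to 7 can be put together in a compact Python sketch. It is an illustration under stated assumptions, not the patented hardware design: inputs are positive and finite, the cut-off is taken as p = 1 + E^(1/2^M), the segment boundaries are powers of p with coefficient 1/p^(j+1), and ordinary floating-point multiplication stands in for the CSD constant-coefficient multipliers. The function name `divide` is hypothetical.

```python
import math


def divide(nf, df, E, M):
    """Divisor-mapped Goldschmidt division of two positive floats."""
    # Step 1: normalize both operands to f * 2**e with f in [1, 2).
    mn, en = math.frexp(nf)
    md, ed = math.frexp(df)
    n, d = 2.0 * mn, 2.0 * md
    exponent = en - ed                     # the two "-1" shifts cancel

    # Steps 2-3: cut-off value for the given E and M (assumed closed form).
    p = 1.0 + E ** (1.0 / 2 ** M)

    if d > p:
        # Steps 4-5: find the segment (p**(j+1), p**(j+2)] that contains d
        # and scale both operands by its coefficient 1 / p**(j+1).
        j = 0
        while d > p ** (j + 2):
            j += 1
        c = p ** -(j + 1)
        n, d = n * c, d * c

    # Step 6: M seedless Goldschmidt iterations.
    for _ in range(M):
        f = 2.0 - d
        n, d = n * f, d * f

    # Step 7: recombine mantissa quotient and exponent difference.
    return math.ldexp(n, exponent)


E, M = 2.0 ** -24, 3
for nf, df in ((2.5, 0.75), (3.0, 1.0), (1.0, 7.0), (123.456, 0.0314)):
    q = divide(nf, df, E, M)
    print(nf, df, q, abs(q / (nf / df) - 1.0))
```

In each printed row the last column is the observed relative error, which can be compared against the requested bound E.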
The above process is based on the case in which no mapping ranges overlap, that is, the case of formula (16), for which the required number of mapping segments T is smallest. In practice the restriction need not be this strict: it is sufficient that formula (13) is satisfied, in which case the mapping coefficients are chosen as values that satisfy formula (14).
The present invention derives the relative error of the iteration result of the improved algorithm and obtains the relation between the required hardware resources and the accuracy requirement, so that the divider structure with the smallest resource usage can be obtained within the range that meets the accuracy requirement.
In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.