WO2015156660A1

WO2015156660A1 - A system and method for implementing division

Info

Publication number: WO2015156660A1
Application number: PCT/MY2015/050019
Authority: WO
Inventors: Muhamad Khairol AB RANI
Original assignee: Mimos Berhad
Priority date: 2014-04-09
Filing date: 2015-03-31
Publication date: 2015-10-15
Also published as: MY168512A

Abstract

The present invention relates to a system and a method for obtaining a quotient from a division of a dividend by a divisor. The division system includes a decoder (120) and multiplier circuits. The decoder (120) is used in place of the normally required lookup table circuit to provide an approximation of a reciprocal of a divisor input. The multiplier circuits receive the approximation and use a modified Taylor series expansion to obtain the division result.

Description

A System and Method for Implementing Division

Field of the Invention

The present invention relates to a system and a method for digitally obtaining a quotient from a division. More particularly, the present invention relates to a circuitry to perform multiplicative division from the input of dividend and divisor and a method thereof.

Background of the invention

Division circuits are used in a wide range of applications, and various methods have therefore been disclosed for performing division. The most common method is division by repeated subtraction, which requires an adder circuit without any memory block. However, the drawback of this method is that it has long latency and is not amenable to pipeline operation.

Another method commonly used for implementing division is the multiplicative method, in which the algorithms often employed are the Newton-Raphson method, the Goldschmidt method and the Taylor series method. However, the conventional multiplicative methods require a lookup table to calculate the division result. Conventionally, a larger lookup table gives a lower-latency process, whereas a smaller lookup table takes longer to complete the calculation. The latency of the division can therefore be made very small if a lookup table is used, depending on the size of the lookup table and the architecture of the circuit. However, the lookup table signifies that Read-Only Memory (ROM) or Random-Access Memory (RAM) is required in the system. This limits the fabrication process to a memory-dependent system. Memory blocks are among the densest integrated circuits that can be fabricated and therefore have the highest rate of defects. Fabrication yield will increase if the use of memory blocks is avoided. Further, the use of memory blocks in chip fabrication increases the associated costs, including the intellectual property royalty fee.

United States Patent Publication No. 2010318592 A1 discloses a conventional fixed point division by multiplicative method where a lookup table is required. The size of the lookup table determines the number of iterations, steps or the degree of latency in the multiplicative method. Several multiplier circuits and squaring circuits are used for refining the quotient obtained from the dividend and the divisor.

United States Patent No. 8176111 B1 discloses a floating point division system which requires obtaining a reciprocal of the divisor from a lookup table for performing multiplicative division. Multiply-add operation is executed based on principles of series expansion. The system therefore requires a memory means for storing estimate of the reciprocal of the divisor which likewise contributes to the increase of fabrication cost.

United States Patent No. 8301682 B2 discloses a divider which includes a subtractor for subtracting the divisor from the dividend, and the result is used for revising the dividend and the preliminary answer. The method involves iterations of the subtraction and revision of the dividend and preliminary result until a required precision is reached. This method therefore faces the problem of high latency due to the need to perform several iterations until a satisfactory result is obtained.

Therefore, there is a need for a division circuit which is amenable to pipeline operation and does not require any memory means. Further, a division circuit having a latency of less than 5 clock cycles is desired.

Summary of the invention

An objective of the present invention is to remove the fabrication limitation by providing an integrated circuit which does not require memory support.

Another objective of the present invention is to provide a division system and method which does not suffer high latency despite not using a lookup table.

A further objective of the present invention is to provide a division circuit to calculate a quotient from the division of the dividend by the divisor in the format of a single precision floating point number. Still another objective of the present invention is to provide a division circuit which employs a multiplicative method based on a modified Taylor series expansion.

The division circuit of the present invention includes a decoder and multiplier circuits. The decoder is used in place of the normally required lookup table circuit providing an approximation of a reciprocal of a divisor input.

The present invention relates to a system for implementing division from an input of a dividend and a divisor. The system comprises a preprocessing circuit for receiving the dividend and the divisor, and revising the dividend into operand Y, and the divisor into operand X, a subtractor for subtracting the operand Y from the operand X to produce an operand difference, a decoder for decoding operand Y for providing a control signal, an estimate value, and a balance value, and a multiplicative circuit for performing a series of multiplications and additions from the values obtained from the subtractor and the decoder to obtain a quotient of the division.

Further, the present invention relates to a method for implementing division from an input of a dividend and a divisor, in the format of single precision floating point. The method comprises revising the divisor into operand X, and the dividend into operand Y via a preprocessing circuit, subtracting the operand Y from the operand X by a subtractor to produce an operand difference, decoding the operand Y to obtain a control signal, an estimate value, and a balance value by a decoder, and performing a series of multiplicative operations by a multiplicative circuit.

In a preferred embodiment, the series of multiplicative operations comprises the steps of multiplying the operand difference with the estimate value by a first multiplier to produce a first multiply value, multiplying the balance value with the estimate value by a second multiplier to produce a second multiply value, and expanding the second multiply value up to 27 bits and inverting the most significant bits value thereof by a first logic circuit to produce a first logic value. The method continues by multiplying the first multiply value with the first logic value by a third multiplier to produce a third multiply value, wherein the third multiply value is taken from bits 32 to 16, squaring the second multiply value by a first squaring circuit to produce a first square value, wherein the first square value is taken from bits 32 to 16, squaring bits 16 to 5 of the first square value by a second squaring circuit to produce a second square value, wherein the second square value is taken from bits 23 to 16, adding the first square value and the second square value by a first adder to produce a first addition value, and multiplying the first addition value with the third multiply value to produce a fourth multiply value.

In a preferred embodiment, the division circuit is pipelined in four stages and includes four multiplier circuits and two squaring circuits, which enables the circuit to deliver a result every clock cycle.

Brief Description of the drawings

Figure 1 is a diagram showing the format of a single precision floating point number according to the Institute of Electrical and Electronics Engineers (IEEE) 754 standard.

Figure 2 is a block diagram showing a system for implementing division according to the present invention.

Figure 3 is a circuit diagram showing a decoder of the system for implementing division according to the present invention.

Detailed description of the invention

The present invention will now be described in more detail with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

The present invention relates to a system for implementing division in an integrated circuit. In a preferred embodiment, the division is performed on numbers in the format of single precision floating point. Figure 1 shows the format of the floating point number for single precision based on the IEEE 754 standard. The floating point numbers are packed into a binary computer datum comprising the sign bit, the exponent field, and the mantissa, from left to right. The single precision floating point number has a total of 32 bits, wherein 1 bit is for the sign, 8 bits are for the exponent field and 23 bits are for the mantissa. A bit 1 is always attached to the mantissa as its most significant bit, and the mantissa is always regarded as a positive value.
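The field layout described above can be illustrated with a minimal Python sketch. The function names `unpack_single` and `significand` are hypothetical and are not part of the disclosure; the sketch only shows how the 1-bit sign, 8-bit exponent and 23-bit mantissa are separated and how the leading bit 1 is attached for a normal number.

```python
import struct


def unpack_single(value: float) -> tuple[int, int, int]:
    """Split a number into IEEE 754 single precision sign, exponent and mantissa fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa


def significand(mantissa: int) -> int:
    """Attach the leading bit 1 to a 23-bit mantissa, giving a 24-bit value (normal numbers only)."""
    return (1 << 23) | mantissa
```

For example, `unpack_single(-1.5)` separates the sign bit 1, the biased exponent 127 and a mantissa whose top bit is set.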

The present invention is a system and method for implementing division based on a modified Taylor series expansion according to Equation (8) below. Equation (8) is derived from the Taylor series expansion of a division, which is shown below:

$$\frac{X}{Y} = \frac{X}{Y_a + Y_z} \tag{1}$$

$$\frac{X}{Y} = \frac{X}{Y_a}\left(1 - \frac{Y_z}{Y_a} + \left(\frac{Y_z}{Y_a}\right)^2 - \left(\frac{Y_z}{Y_a}\right)^3 + \cdots\right) \tag{2}$$

Modify the equation of $Y = Y_a + Y_z$:

$$Y_a = A + B + C \quad \text{and} \quad Y_z = Z \tag{3}$$

Substitute Equation (3) into (2):

$$\frac{X}{Y} = \frac{X}{A+B+C}\left(1 - \frac{Z}{A+B+C} + \left(\frac{Z}{A+B+C}\right)^2 - \left(\frac{Z}{A+B+C}\right)^3 + \cdots\right) \tag{4}$$

Assume $k$ is a constant for a fixed value of A, B, and C, then:

$$k = \frac{A}{A+B+C} \tag{5}$$

$$\frac{1}{A+B+C} = \frac{k}{A} \tag{6}$$

If A, B and C are consecutive powers of 2 with A > B > C, the value $\frac{1}{A}$ is easily calculated in digital circuits by using a binary shifter. Insert Equation (6) into Equation (4):

$$\frac{X}{Y} = \frac{kX}{A}\left(1 - \frac{kZ}{A} + \left(\frac{kZ}{A}\right)^2 - \left(\frac{kZ}{A}\right)^3 + \cdots\right) \tag{7}$$

Further expansion of the Equation (7) up to the power of 8 can be simplified as shown in Equation (8):

Equation (8) can be extended to obtain a division result of higher precision. For single precision, this equation is sufficient to obtain an accurate result without the need for a lookup table. The values of A, k and Z are generated by using a decoder circuit.
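The effect of truncating the series can be checked numerically. The following is a minimal sketch in floating point, assuming the split Y = Ya + Yz used in the derivation above; the function name `series_quotient` and the sample values are hypothetical, and the sketch does not model the fixed-point circuit.

```python
def series_quotient(x: float, ya: float, yz: float, terms: int) -> float:
    """Approximate x / (ya + yz) with the first `terms` terms of the alternating series."""
    r = yz / ya  # the ratio raised to successive powers in the expansion
    total = sum((-r) ** n for n in range(terms))
    return (x / ya) * total


# The truncation error shrinks as more terms are kept, provided |yz / ya| < 1.
exact = 1.0 / (1.5 + 0.01)
error_2_terms = abs(series_quotient(1.0, 1.5, 0.01, 2) - exact)
error_8_terms = abs(series_quotient(1.0, 1.5, 0.01, 8) - exact)
```

Because the ratio Yz/Ya is small when Ya captures the leading bits of Y, only a few terms are needed before the error falls below single precision resolution.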

Figure 2 is a block diagram showing the implementation of the division circuit according to the present invention. The present invention relates to a system for implementing division from an input of a dividend and a divisor. In a preferred embodiment, the division system comprises a preprocessing circuit and a multiplicative circuit carried out in four pipeline steps. In a preferred embodiment, the system processes the quotient of the division such that the process for obtaining the quotient thereof is by way of pipelining the system in support of multiplicative processes.

Further, the preprocessing circuit of the division system according to the present invention receives the dividend and the divisor, wherein the dividend and the divisor are provided in the format of single precision floating point and the mantissa of each floating point number is extracted for further calculation. The preprocessing circuit revises the dividend into operand Y (102), and the divisor into operand X (101). In a preferred embodiment, the preprocessing circuit attaches bit 1 to the most significant bit of the mantissa of the divisor, which forms the operand X (101). Likewise, the preprocessing circuit attaches bit 1 to the most significant bit of the mantissa of the dividend, forming the operand Y (102). In a preferred embodiment, the bit 1 is attached to the left side of the leftmost bit of the mantissa for the dividend and the divisor. In a preferred embodiment, the preprocessing circuit also compares the operand X (101) and operand Y (102), and if operand X (101) is determined to be less than operand Y (102), operand X (101) is shifted to the left by 1 bit. The exponent can be adjusted accordingly to account for the shift of operand X.
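The preprocessing step described above can be sketched as follows. This is an illustrative Python model only; the function name `preprocess` and the returned `shifted` flag are assumptions introduced here, and the exponent adjustment is left to the caller because the description does not detail it.

```python
HIDDEN_BIT = 1 << 23  # the bit 1 attached to the left of the 23-bit mantissa


def preprocess(divisor_mantissa: int, dividend_mantissa: int) -> tuple[int, int, bool]:
    """Form operand X from the divisor mantissa and operand Y from the dividend mantissa."""
    operand_x = HIDDEN_BIT | (divisor_mantissa & 0x7FFFFF)
    operand_y = HIDDEN_BIT | (dividend_mantissa & 0x7FFFFF)
    shifted = operand_x < operand_y
    if shifted:
        # Shift X left by 1 bit so that X is never smaller than Y; X is then 25 bits wide.
        operand_x <<= 1
    return operand_x, operand_y, shifted
```

After this step operand X is always greater than or equal to operand Y, so the subsequent subtraction never produces a negative operand difference.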

A subtractor (110) is provided in the system for subtracting the operand Y (102) from the operand X (101) to produce an operand difference (111). A decoder (120) in the system decodes operand Y for providing a control signal, A (121), an estimate value, k (122), and a balance value, Z (123). In a preferred embodiment, the generated control signal, A (121), estimate value, k (122), and balance value, Z (123) are stored in respective storage spaces.

Subsequently, a multiplicative circuit in the system performs a series of multiplications and additions from the values obtained from the subtractor (110) and the decoder (120) to obtain a quotient of the division. In the multiplicative circuit, a first multiplier (130) is provided for multiplying the operand difference (111) with the estimate value, k (122) for producing a first multiply value (131). In a preferred embodiment, only 26 bits from the first multiply value are stored in a storage space for further use. The bits of the first multiply value (131) which will be stored depend on the control signal, A (121). Specifically, if the control signal, A (121) bit is 1, bits 47 to 22 of the first multiply value (131) will be stored. Otherwise, bits 46 to 21 of the first multiply value (131) will be stored.
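The bit selection controlled by the control signal A can be sketched as a small helper. The names `bit_slice` and `select_first_multiply` are hypothetical, and bit 0 is assumed to be the least significant bit, which the description does not state explicitly.

```python
def bit_slice(value: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of value, with bit 0 as the least significant bit."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def select_first_multiply(product: int, control_a: int) -> int:
    """Keep 26 bits of the first multiply value: bits 47..22 if A is 1, otherwise bits 46..21."""
    if control_a == 1:
        return bit_slice(product, 47, 22)
    return bit_slice(product, 46, 21)
```

Selecting a window one bit higher when A is 1 has the same effect as dividing the product by a value of A that is twice as large, which is consistent with A being a power of 2.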

A second multiplier (140) is provided in the system for multiplying the estimate value, k (122) with the balance value, Z (123), for producing a second multiply value (141). In a preferred embodiment, the bits which will be stored depend on the control signal, A (121). In the event that the control signal, A (121) is 1, bits 43 to 21 of the second multiply value (141) will be stored in a storage space for further use. Otherwise, bits 42 to 20 of the second multiply value (141) will be stored into the storage space. The number of bits which will be stored is 23. Further, only the 18 most significant bits of the second multiply value (141) will be stored in the storage space.

A logic circuit (150) is also provided in the multiplicative circuit for converting the second multiply value (141) into a first logic value (151). In the course of the conversion, the second multiply value (141) is expanded up to 27 bits and the added spaces are filled with the most significant bits of the first logic value. The most significant bit value of the generated first logic value (151) is then inverted. A third multiplier (160) is connected to the first multiplier (130) and the logic circuit (150) for multiplying the first multiply value (131) with the first logic value (151) for producing a third multiply value (161). In a preferred embodiment, 23 bits from the third multiply value (161) will be stored into a storage space. Preferably, bits 50 to 28 of the third multiply value (161) which is generated from the third multiplier (160) will be stored.

A first squaring circuit (170) is connected to the second multiplier (140) for squaring the second multiply value (141) to produce a first square value (171). The first squaring circuit (170) stores bits 32 to 16 of the first square value (171) into a storage space.

A second squaring circuit (180) is connected to the first squaring circuit (170) for squaring the first square value (171) to produce a second square value (181). In a preferred embodiment, the second squaring circuit (180) squares bits 16 to 5 of the first square value (171) and stores 8 bits from the second square value (181), namely bits 23 to 16, into a storage space.

In a subsequent stage of the multiplicative circuit, a first adder (190) is connected to the first squaring circuit (170) and the second squaring circuit (180) for adding the first square value (171) and the second square value (181) to produce a first addition value. The first addition value is then right shifted by 2 bits.

The first adder (190) is further connected to a fourth multiplier (200) which also receives input from the third multiplier (160). The fourth multiplier (200) multiplies the third multiply value (161) with the first addition value for producing a fourth multiply value. Bits 37 to 24 of the fourth multiply value are stored.

A second adder (210) is provided for producing a second addition value by adding the third multiply value (161) from the third multiplier (160) with the fourth multiply value from the fourth multiplier (200). The second addition value is appended with bit 1 to form a final result of the mantissa division.
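The chain of multipliers, squaring circuits and adders described above can be followed in a real-number model. The sketch below is illustrative only and rests on several assumptions that are not confirmed by the description: the logic circuit (150) is taken to form 1 - r, where r is the second multiply value scaled by 1/A; the balance value Z is taken to satisfy Y = Ya + Z; and all bit selections and the 2-bit right shift are ignored. The function name `datapath_model` and its parameters are hypothetical.

```python
def datapath_model(diff: float, k_over_a: float, z: float) -> float:
    """Real-number model of the multiplier and adder chain (bit widths and shifts ignored)."""
    first = diff * k_over_a   # first multiplier (130): operand difference times k/A
    r = k_over_a * z          # second multiplier (140): balance value times k/A
    logic = 1.0 - r           # ASSUMPTION: logic circuit (150) forms 1 - r
    third = first * logic     # third multiplier (160)
    square_1 = r * r          # first squaring circuit (170): r^2
    square_2 = square_1 ** 2  # second squaring circuit (180): r^4
    addition_1 = square_1 + square_2  # first adder (190)
    fourth = third * addition_1       # fourth multiplier (200)
    return third + fourth             # second adder (210)


# Hypothetical sample: Ya = 1.5, Z = 0.003, operand difference = 0.2.
fraction = datapath_model(0.2, 1.0 / 1.5, 0.003)
mantissa_quotient = 1.0 + fraction  # the leading 1 models the bit 1 appended to the result
```

Under these assumptions the model evaluates (k/A) times the operand difference times (1 - r)(1 + r^2 + r^4), which expands to the alternating series truncated after the fifth power of r.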

With reference to Figure 3, the decoder (120) is shown to further comprise a second logic circuit (125) for generating the control signal, A (121), a third logic circuit (126) for generating the estimate value, k (122) and an intermediate variable (124), and a second subtractor (127) for subtracting the 20 least significant bits of the dividend from the intermediate variable (124) for producing the balance value (123).

The present invention also relates to a method for implementing division from an input of a dividend and a divisor. The dividend and the divisor are provided in the format of single precision floating point and only the mantissa of each floating point number will be taken for further calculation. In a preprocessing circuit, where the dividend and the divisor are received, the divisor is revised into operand X (101) by attaching a bit 1 to the most significant bit and the dividend is revised into operand Y (102) by attaching a bit 1 to the most significant bit for further computation of the divisional operation. In a preferred embodiment, the operand X (101) and the operand Y (102) are compared in terms of their values. If operand X (101) is found to be less than the operand Y (102), the operand X will be shifted 1 bit to the left before further actions are carried out.

Subsequently, the operand Y (102) is subtracted from the operand X (101) by a subtractor (110) to produce an operand difference (111). At the same time, the operand Y (102) is received by the decoder (120) to generate a control signal, A (121), an estimate value, k (122), and a balance value, Z (123). The generated values from the decoder (120) are used in Equation (8) to obtain the result for the division in a multiplicative manner.

In a preferred embodiment, the estimate values, k, which are used in the division circuit according to the present invention are listed in Table 1 below.

Table 1. A list of estimate values, k, generated by the decoder for use in the multiplicative division according to the present invention.

Parameter 25-bit value

K128 25'b0100000000000000000000000

K130 25'b0011111100000011111100000

K132 25'b0011111000001111100001000

K136 25'b0011110000111100001111000

K144 25'b0011100011100011100011100

K152 25'b0011010111100101000011011

K156 25'b0011010010000011010010000

K160 25'b0011001100110011001100110

K164 25'b0011000111110011100000110

K168 25'b0011000011000011000011000

K176 25'b0010111010001011101000110

K184 25'b0010110010000101100100001

K191 25'b0010101010101010101010101

K192 25'b0101010101010101010101011

K200 25'b0101000111101011100001010

K208 25'b0100111011000100111011001

K220 25'b0100101001111001000001001

K224 25'b0100100100100100100100101

K240 25'b0100010001000100010001001

K248 25'b0100001000010000100001000

K252 25'b0100000100000100000100001
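The entries of Table 1 can be compared against a simple formula. The relationship used below is an observation drawn from the listed values and is not stated in the description: each entry Kn appears to equal (A / n) scaled by 2^23 and rounded to the nearest integer, with A = 128 for n below 192 and A = 256 otherwise. The names `TABLE_1`, `estimate_k` and `matches_table` are hypothetical, and only a subset of the table is reproduced.

```python
# A subset of Table 1, keyed by the number in the parameter name (K128, K130, ...).
TABLE_1 = {
    128: "0100000000000000000000000",
    130: "0011111100000011111100000",
    160: "0011001100110011001100110",
    192: "0101010101010101010101011",
    224: "0100100100100100100100101",
    252: "0100000100000100000100001",
}


def estimate_k(n: int) -> int:
    """Return (A / n) * 2**23 rounded to nearest, assuming A = 128 for n < 192, else 256."""
    a = 128 if n < 192 else 256
    numerator = a << 23
    return (2 * numerator + n) // (2 * n)  # integer round-half-up


def matches_table(n: int) -> bool:
    """Check whether the assumed formula reproduces the listed 25-bit value."""
    return estimate_k(n) == int(TABLE_1[n], 2)
```

Under the same formula, the entry labelled K191 matches the value for 128/192 rather than 128/191, which suggests that it covers the range just below 192 where A is still 128.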

The method then proceeds to a series of multiplicative and addition operations. The operand difference (111) is first multiplied with the estimate value, k (122) to produce a first multiply value (131). In a preferred embodiment, only 26 bits of the first multiply value (131) will be stored and used. The selection of 26 bits from the first multiply value (131) depends on the control signal, A (121), which is also produced by the decoder (120). In the event that the control signal, A is 1, bits 47 to 22 of the first multiply value (131) will be stored. In the event that the control signal, A is 0, bits 46 to 21 of the first multiply value (131) will be stored.

At the same time, a second multiplier (140) multiplies the balance value, Z (123) with the estimate value, k (122) to produce a second multiply value (141). Likewise, only 23 bits of the second multiply value (141) are stored, and the selection of the 23 bits depends on the control signal, A (121). In the event that the control signal, A (121) is 1, bits 43 down to 21 of the second multiply value (141) will be stored. In the event that the control signal, A (121) is 0, bits 42 down to 20 of the second multiply value (141) will be stored.

The second multiply value (141) is expanded up to 27 bits and the added space is filled with the most significant bits of the second multiply value (141). The most significant bits of the second multiply value (141) are inverted by a first logic circuit (150) to produce a first logic value (151). In a subsequent stage of the divisional operation, a third multiplier (160) multiplies the first multiply value (131) with the first logic value (151) to produce a third multiply value (161). In a preferred embodiment, bits 50 down to 28 of the third multiply value (161) will be stored for further calculation.

A first squaring circuit (170) acts in parallel with the third multiplier (160) to square the second multiply value (141) into a first square value (171). From the first square value (171), bits 32 to 16 will be stored in a storage space. This is followed by the squaring of the first square value (171) by a second squaring circuit (180) to produce a second square value (181), wherein bits 23 to 16 will be stored. Then, the first square value (171) and the second square value (181) are added together by a first adder (190) to produce a first addition value. The first addition value is multiplied with the third multiply value (161) to produce a fourth multiply value. The fourth multiply value is then added to the third multiply value (161) by a second adder (210) to produce a result. The final result is produced when bit 1 is appended to the result obtained from the addition.

In an example, the procedures for carrying out the division can be executed by the steps as shown in Table 2 below.

Table 2. Steps for carrying out the division according to the present invention

Steps | Procedures | Number of bits

Pre | Divisor = 1.Mantissa(X); | 24

 | Dividend = 1.Mantissa(Y); | 24

1 | If (Divisor < Dividend) then (Divisor << 1); | 25

 | X = Divisor - Dividend; | 25

 | [A,K,Z] = DECODE(Dividend[22:16]); | [A,K,Z]

Although the present invention has been described in a specific embodiment as in the above description, it is understood that the above description does not limit the invention to the above given details. It will be apparent to those skilled in the art that various changes and modifications may be made therein without departing from the principle of the invention or from the scope of the appended claims.

Claims

1. A system for implementing division from an input of a dividend and a divisor, comprising:

a preprocessing circuit for receiving the dividend and the divisor, and revising the dividend into operand Y (102), and the divisor into operand X (101);

a subtractor (110) for receiving values of the operand Y (102) and the operand X (101) from the preprocessing circuit, and subtracting the operand Y (102) from the operand X (101) to derive an operand difference (111) value;

a multiplicative circuit for performing multiplicative divisions based on the values derived from operand X (101) and operand Y (102);

characterized in that the system further comprises a decoder (120) for receiving the value of the operand Y (102) and decoding the operand Y (102) to obtain a control signal (121), an estimate value (122), and a balance value (123) for providing to the multiplicative circuit; and the multiplicative circuit performs multiplicative division comprising a series of multiplications and additions from the values obtained from the subtractor (110) and the decoder (120) to obtain a quotient of the division.

2. A system for implementing division according to claim 1, wherein the system processes the quotient of the division such that the process for obtaining the quotient thereof is by way of pipelining the system in support of multiplicative processes.

3. A system for implementing division according to claim 1, wherein the multiplicative circuit comprises:

a first multiplier (130) for multiplying the operand difference (111) with the estimate value (122) to produce a first multiply value (131);

a second multiplier (140) for multiplying the estimate value (122) with the balance value (123) produced from the decoder (120), to produce a second multiply value (141);

a logic circuit (150) for converting the second multiply value (141) to produce a first logic value (151);

a third multiplier (160) for multiplying the first multiply value (131) with the first logic value (151) to produce a third multiply value (161);

a first squaring circuit (170) for squaring the second multiply value (141) to produce a first square value (171);

a second squaring circuit (180) for squaring the first square value (171) to produce a second square value (181);

a first adder (190) for adding the first square value (171) and the second square value (181) to produce a first addition value;

a fourth multiplier (200) for multiplying the third multiply value (161) with the first addition value for producing a fourth multiply value;

a second adder (210) for adding the third multiply value (161) and the fourth multiply value for producing a final result (220).

4. A system for implementing division according to claim 1, wherein the decoder (120) further comprises:

a second logic circuit for generating the control signal (121);

a third logic circuit for generating the estimate value (122) and an intermediate variable (124);

a second subtractor (127) for subtracting 20 least significant bits of the dividend from the intermediate variable (124) for producing the balance value (123).

5. A method for implementing division from an input of a dividend and a divisor, in the format of single precision floating point, comprising:

revising the divisor into operand X (101), and the dividend into operand Y (102) via a preprocessing circuit;

subtracting the operand Y (102) from the operand X (101) by a subtractor (110) to produce an operand difference (111);

decoding the operand Y (102) to obtain a control signal (121), an estimate value (122), and a balance value (123) by a decoder (120);

performing a series of multiplicative operations by a multiplicative circuit.

6. A method for implementing division according to claim 5, wherein the dividend and the divisor, provided in the format of single precision floating point comprising a mantissa, are converted into operand X (101) and operand Y (102) by attaching bit 1 to the most significant bit of the mantissa respectively.

7. A method for implementing division according to claim 5, wherein operand X (101) is shifted 1 bit to the left before subtraction if operand X (101) is less than operand Y (102).

8. A method for implementing division according to claim 5, wherein the series of multiplicative operations further comprises:

multiplying the operand difference (111) with the estimate value (122) by a first multiplier (130) to produce a first multiply value (131);

multiplying the balance value (123) with the estimate value (122) by a second multiplier (140) to produce a second multiply value (141);

expanding the second multiply value (141) up to 27 bits and inverting the most significant bits value thereof by a first logic circuit (150) to produce a first logic value (151);

multiplying the first multiply value (131) with the first logic value (151) by a third multiplier (160) to produce a third multiply value (161), wherein the third multiply value (161) is taken from bits 50 to 28;

squaring the second multiply value (141) by a first squaring circuit (170) to produce a first square value (171), wherein the first square value (171) is taken from bits 32 to 16;

squaring bits 16 to 5 of the first square value (171) by a second squaring circuit (180) to produce a second square value (181), wherein the second square value (181) is taken from bits 23 to 16;

adding the first square value (171) and the second square value (181) by a first adder (190) to produce a first addition value;

multiplying the first addition value with the third multiply value (161) to produce a fourth multiply value; and

adding the fourth multiply value with the third multiply value (161) to produce a second adder value.

9. A method for implementing division according to claim 8, wherein the first multiply value (131) is taken from bits 47 to 22 if the control signal (121) is 1, or bits 46 to 21 if the control signal (121) is 0.

10. A method for implementing division according to claim 8, wherein the second multiply value (141) is taken from bits 43 to 21 if the control signal (121) is 1, or bits 42 to 20 if the control signal (121) is 0.