GB2530883A

GB2530883A - Implementing a square root operation in a computer system

Info

Publication number: GB2530883A
Application number: GB1513861.3A
Authority: GB
Inventors: Leonard Rarick
Original assignee: Imagination Technologies Ltd
Current assignee: Imagination Technologies Ltd
Priority date: 2014-08-05
Filing date: 2015-08-05
Publication date: 2016-04-06
Anticipated expiration: 2035-08-05
Also published as: GB201513861D0; US9612800B2; GB2530883B; US20160041947A1

Abstract

A method for implementing a square root operation on a computer system comprises using an iterative converging approximation technique (e.g. Newton-Raphson) to calculate converging approximations of the reciprocal of the square root, wherein a concluding iteration: calculates a first intermediate value, based on a multiplication of the input value with a previous approximation of the reciprocal; calculates a second intermediate value, based on a multiplication of the first intermediate value with the previous approximation of the reciprocal; and performs a multiplication of the two intermediate values to determine the value of the square root of the input value. The number of iterations may be based on the number of bits of accuracy in the initial approximation of the reciprocal, and a desired number of bits of accuracy for the determined square root. Preferably, the concluding iteration comprises rounding down the intermediate values and the value of the square root. The determined square root may be checked in accordance with a rounding mode to confirm that it is correct.

Description

IMPLEMENTING A SQUARE ROOT OPERATION IN A COMPUTER SYSTEM

Background

There are many situations in which a computer system needs to perform a square root operation. To give just some examples, numerical analysis, complex number computations, statistical analysis, computer graphics, and signal processing are among the fields where square root operations are often performed by computer systems. There are many different ways in which a computer system may implement a square root operation. For example, a square root may be computed in a digit-by-digit manner, such as in restoring, non-restoring, and SRT (named after its creators: Sweeney, Robertson and Tocher) techniques. However, iterative converging approximation methods are often faster in determining the result of a square root operation to a defined number of bits of accuracy.

Examples of iterative converging approximation techniques used are the Newton-Raphson and Goldschmidt techniques, which start with an initial estimation of the square root or its inverse and then iteratively converge on a better solution. Also, the detailed implementation of these iterative techniques can be done with different factorizations of the basic equations. Further, initial approximations may be obtained by various methods, such as bipartite lookup tables and ITY (Ito-Takagi-Yajima) initial approximation algorithms.

In general, the Newton-Raphson technique can be used to find the value of a function, say $g(z)$, for some particular input value of $z$, called $b$. It may be the case that the function $g(z)$ cannot easily be computed directly (e.g. if the function is a square root operation), and in that case a different function, let's call it $f(x)$, is used wherein $f(g(b)) = 0$. The Newton-Raphson technique is an example of an iterative converging approximation technique which is good for finding a zero of a function, and if it is applied to the function $f(x)$, then a value for $g(b)$ can be determined by finding the value of $x$ at which $f(x) = 0$. For example, if the function $g(z)$ is a square root operation, $g(z) = \sqrt{z}$, then the function $f(x)$ may be chosen to be $f(x) = b - x^2$, because this function equals zero when $x = \sqrt{b}$. There are other options for functions $f(x)$ that would equal zero when $x = \sqrt{b}$.

The general principles of the Newton-Raphson method are well known in the art, but a brief explanation is given here to aid the understanding of the following examples. The Newton-Raphson method starts with an initial guess (denoted $p_0$) for a zero of the function $f(x)$. The initial guess is typically close to, but not exactly equal to, the correct answer, such that $f(p_0) \neq 0$, so $p_0 \neq g(b)$. From the point $(p_0, f(p_0))$, the tangent to the curve $f(x)$ is determined and then the value of $x$ at which the tangent intersects the x-axis is found. The slope of the curve $f(x)$ is given by the derivative of $f(x)$, by the equation:

$$f'(x) = \frac{d}{dx}\bigl(f(x)\bigr) \qquad (1)$$

The point $(p, q) = (p_0, f(p_0))$ and the slope $m = f'(p_0)$ determine the line of the tangent according to the equation:

$$y = mx + (q - mp) = x f'(p_0) + f(p_0) - p_0 f'(p_0) \qquad (2)$$

The straight line defined by equation 2 is a local approximation to the curve $f(x)$.

Thus, the value of $x$ where this line crosses the x-axis is similar to the value of $x$ where $f(x)$ crosses the x-axis. Hence, the value of $x$ where this line crosses the x-axis is a better approximation than $p_0$ to the value of $x$ where $f(x) = 0$. So the value of $x$ where the line crosses the x-axis is used as the next approximation, $p_1$, of the zero of $f(x)$, i.e. the next approximation of $g(b)$. To find where the line of equation 2 intersects the x-axis, $y$ is set to zero and the equation is solved to find $x$ such that:

$$x = p_0 - \frac{f(p_0)}{f'(p_0)} \qquad (3)$$

This method is iterated to repeatedly find better approximations of the zero of the function until a desired accuracy of the result is achieved. For example, the desired result may be a single precision floating point number, in which case at least 24 bits of precision are desired; or the desired result may be a double precision floating point number, in which case at least 53 bits of precision are desired. Therefore, over a sequence of iterations, the method will determine the approximations as:

$$p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}, \qquad p_2 = p_1 - \frac{f(p_1)}{f'(p_1)}, \qquad p_3 = p_2 - \frac{f(p_2)}{f'(p_2)}$$

and in general for the $(i+1)$th iteration:

$$p_{i+1} = p_i - \frac{f(p_i)}{f'(p_i)} \qquad (4)$$

Each iteration provides a better approximation than the previous iteration for the zero of the function $f(x)$.
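As an illustration of equation 4, the following Python sketch iterates the Newton-Raphson update for a caller-supplied function and derivative. The helper name `newton_raphson`, the starting guess of 1.0 and the iteration count are illustrative assumptions rather than part of this disclosure, and floating point arithmetic is used purely for readability.

```python
def newton_raphson(f, f_prime, p0, iterations):
    """Return the approximations [p0, p1, ...] produced by equation 4."""
    approximations = [p0]
    p = p0
    for _ in range(iterations):
        p = p - f(p) / f_prime(p)        # equation 4
        approximations.append(p)
    return approximations


# Example: zero of f(x) = b - x^2, i.e. the square root of b.
b = 2.0
steps = newton_raphson(lambda x: b - x * x, lambda x: -2.0 * x, p0=1.0, iterations=5)
```

Note that this update divides by the derivative on every iteration, which is the cost that the choice of function discussed later is intended to avoid.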

As well as ensuring that an accurate solution is provided, other considerations when choosing how to implement an operation in a computer system are how long the operation will take (i.e. the latency of the operation) and the power consumption of performing the operation on the computer system. These considerations are particularly important in computer systems which have particularly limited processing resources, e.g. on mobile devices, for which the processing power is preferably kept low to avoid draining a battery and/or to avoid excess heat generation. Furthermore, the operations often need to be performed in real-time (e.g. when a user is waiting for a response which depends upon the result of the operation, e.g. when the user is playing a game which uses a graphics processor which needs to perform a particular operation (e.g. a square root operation)), and in these cases the latency of the operation is important.

Therefore, any improvement to the speed and/or power consumption of operations, such as square root operations, performed on computer systems may be of significant benefit.

Some mathematical operations are simple to perform in hardware, such as addition, subtraction, multiplication and shifting. However, other mathematical operations are not so simple to perform in hardware, such as division and performing a square root. If an iterative converging approximation technique such as the Newton-Raphson technique is used to find the result of a square root operation, some known functions to be used by the Newton-Raphson technique for performing a square root would involve performing division computations. For example, if the Newton-Raphson method is performed on the function $f(x) = b - x^2$, for which $f'(x) = -2x$, then equation 4 becomes:

$$p_{i+1} = p_i + \frac{b - p_i^2}{2 p_i} = \frac{1}{2}\left(p_i + \frac{b}{p_i}\right) \qquad (5)$$

Implementing equation 5 would involve a division by $p_i$, and as such is not simple to compute in a computer system.

Summary

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

There is provided a method of implementing a square root operation in a computer system to determine a value of $\sqrt{b}$, where $b$ is an input value, the method using an iterative converging approximation technique for determining converging approximations of $1/\sqrt{b}$, the method comprising: obtaining an initial approximation of $1/\sqrt{b}$; and implementing one or more iterations of the iterative converging approximation technique using multiplier logic of the computer system, wherein a concluding iteration of the one or more iterations of the iterative converging approximation technique comprises: (i) performing a first computation with the multiplier logic of the computer system to determine a first intermediate parameter $r_c$ for the concluding iteration based on a multiplication of the input value $b$ with a previous approximation of $1/\sqrt{b}$; (ii) performing a second computation with the multiplier logic to determine a second intermediate parameter $s_c$ for the concluding iteration based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the previous approximation of $1/\sqrt{b}$; and (iii) performing a concluding computation with the multiplier logic to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration.

There is also provided a computer system configured to implement a square root operation to determine a value of $\sqrt{b}$, where $b$ is an input value, the computer system comprising an iterative converging approximation module arranged to receive an initial approximation of $1/\sqrt{b}$ and configured to use an iterative converging approximation technique for determining converging approximations of $1/\sqrt{b}$, the iterative converging approximation module comprising multiplier logic; wherein the iterative converging approximation module is configured to implement one or more iterations of the iterative converging approximation technique using the multiplier logic, wherein to implement a concluding iteration of the one or more iterations of the iterative converging approximation technique the iterative converging approximation module is configured to: (i) perform a first computation with the multiplier logic to determine a first intermediate parameter $r_c$ for the concluding iteration based on a multiplication of the input value $b$ with a previous approximation of $1/\sqrt{b}$; (ii) perform a second computation with the multiplier logic to determine a second intermediate parameter $s_c$ for the concluding iteration based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the previous approximation of $1/\sqrt{b}$; and (iii) perform a concluding computation with the multiplier logic to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration.

There may further be provided computer readable code for generating a computer system according to any of the examples described herein. Furthermore, there may be provided computer readable code adapted to perform the steps of any of the methods described herein when the code is run on a computer. The computer readable code may be encoded in a computer readable storage medium.

The above features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the examples described herein.

Brief Description of the Drawings

Examples will now be described in detail with reference to the accompanying drawings in which: Figure 1 is a schematic diagram of a computer system for implementing a square root operation; Figure 2 is a flow chart for a method of implementing a square root operation in a computer system; and Figure 3 shows a high-level representation of a computer system.

The accompanying drawings illustrate various examples. The skilled person will appreciate that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the drawings represent one example of the boundaries. It may be that in some examples, one element may be designed as multiple elements or that multiple elements may be designed as one element. Common reference numerals are used throughout the figures, where appropriate, to indicate similar features.

Detailed Description

The examples described herein provide a method for finding the result of a square root operation to a desired level of accuracy (i.e. to a desired number of bits of accuracy) using an iterative converging approximation technique which includes fewer computations than conventional methods, and only includes computations which are simple to implement in hardware on a computer system, such as multiplication, addition, subtraction and shifting. Therefore, the methods described herein allow a computer system to perform a square root operation with lower latency and with lower power consumption than conventional methods. This can provide a significant benefit, particularly in computer systems which have limited processing resources, e.g. on mobile devices such as smart phones or tablets. It is noted that square root operations are very common in many different computer applications, so even a small improvement in the way in which a computer system can implement a square root operation may provide a significant benefit.

Figure 1 shows an example of a computer system 100 for implementing a square root operation. The computer system 100 may be implemented in hardware. Implementing the system 100 in hardware may allow for faster computation. The computer system 100 comprises initial approximation logic 102, an iterative converging approximation module 104, and check logic 106. The iterative converging approximation module 104 comprises a control module 108, multiplier logic 110 and a memory 112. In some examples, the logic and modules shown in Figure 1 are implemented in hardware, e.g. as fixed function circuitry in a computer processor. Each logic block and module shown in Figure 1 may be implemented as one or more units within the computer system 100. The system is arranged to receive an input value $b$. In particular, the initial approximation logic 102 is arranged to receive the input value $b$. An output of the initial approximation logic 102 is coupled to an input of the iterative converging approximation module 104, for providing an initial approximation of $1/\sqrt{b}$ to the iterative converging approximation module 104. An output of the iterative converging approximation module 104 is coupled to an input of the check logic 106 for providing a determined value of $\sqrt{b}$ to the check logic 106. An output of the check logic 106 is arranged to provide the output of the system 100 as the correctly rounded determined value of $\sqrt{b}$.

The operation of the computer system 100 is described with reference to Figure 2, which shows a method of implementing a square root operation in the computer system 100. In step S202 an input value $b$ is received at the initial approximation logic 102. In some examples, the system 100 may comprise scaling logic which scales an initial input value by an even power of two such that the input value $b$ received at the initial approximation logic 102 is in the range $1 \le b < 4$. In those examples, further scaling logic may be provided in the system 100 to scale the output from the check logic 106 to reverse the scaling performed before the input value $b$ is passed to the initial approximation logic 102. For example, if an initial input value is scaled by a factor of 1/4 to determine the input value $b$, then the determined value of $\sqrt{b}$ output from the check logic 106 is scaled by a factor of 2 to determine the square root of the initial input value. Such scaling and reverse scaling would be apparent to those skilled in the art, and can be implemented such that $1 \le b < 4$. Since the scaling is by an even power of two, both the scaling and the reverse scaling can be achieved by changing exponents.
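The scaling by an even power of two described above can be sketched in Python as follows. The function names `scale_input` and `unscale_root` are hypothetical, and the sketch assumes a finite, positive, normal floating point input; it only illustrates that both directions reduce to exponent adjustments.

```python
import math


def scale_input(x):
    """Return (b, k) such that x == b * 4**k and 1 <= b < 4 (x finite, x > 0)."""
    _, exponent = math.frexp(x)      # x == m * 2**exponent with 0.5 <= m < 1
    k = (exponent - 1) // 2          # floor division, so the scaling power 2*k is even
    b = math.ldexp(x, -2 * k)        # exponent-only change: b = x / 4**k
    return b, k


def unscale_root(root_b, k):
    """Reverse the scaling: sqrt(x) == sqrt(b) * 2**k."""
    return math.ldexp(root_b, k)


b, k = scale_input(0.3)              # here k == -1, so b == 4 * 0.3
```

With `k == 1` this matches the example in the text: the input is scaled by 1/4 and the root is scaled back by 2.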

The examples described below relate to the Newton-Raphson technique, but any other suitable iterative converging approximation technique could be used. As described above, if the Newton-Raphson technique is used to determine the zeroes of the function $f(x) = b - x^2$ then, according to equation 5, a division operation would be needed, making the computation not simple to implement in hardware in the computer system 100. So instead, it is noted that $\sqrt{b} = b \cdot \frac{1}{\sqrt{b}}$, and the Newton-Raphson method can be used to find a value of $1/\sqrt{b}$, which can then be multiplied by $b$ to thereby find a value of $\sqrt{b}$. Therefore, in order to implement the Newton-Raphson technique, a function $f(x)$ is used whereby $f(1/\sqrt{b}) = 0$. There are many functions which could be used, but in the examples described herein, a function $f(x) = b - \frac{1}{x^2}$ is used. Therefore, $f'(x) = \frac{2}{x^3}$. This means that, in accordance with equation 4 given above, the Newton-Raphson method involves computing, on the $(i+1)$th iteration, an approximation of $1/\sqrt{b}$, denoted $p_{i+1}$, according to the equation:

$$p_{i+1} = p_i \left(\frac{3 - b p_i^2}{2}\right) \qquad (6)$$

So, for example, $p_1 = p_0 \left(\frac{3 - b p_0^2}{2}\right)$ is a better approximation than $p_0$ to $1/\sqrt{b}$. It is noted that the computations involved in performing an iteration according to equation 6 are simple to implement in a computer system, e.g. they comprise multiplication, subtraction and shifting. The only division involved in an iteration is division by two, which can be accomplished (in binary) with a shift instead of a divide, and shift operations are trivial to implement.
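A minimal Python sketch of the update in equation 6 is given below. The function name `refine_reciprocal_sqrt`, the starting value 0.7 and the choice of four iterations are illustrative assumptions; floating point values stand in for the hardware number format.

```python
import math


def refine_reciprocal_sqrt(b, p):
    """One step of equation 6: multiplies, one subtraction and a halving only."""
    return p * (3.0 - b * p * p) * 0.5


b = 2.0
p = 0.7                              # assumed starting approximation of 1/sqrt(2)
for _ in range(4):
    p = refine_reciprocal_sqrt(b, p)
root = b * p                         # sqrt(b) = b * (1/sqrt(b))
```

No division by a variable quantity appears anywhere in the loop, which is the property the text relies on.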

Therefore, in step S204 the initial approximation logic 102 computes an initial approximation of $1/\sqrt{b}$, denoted $p_0$. There are various approaches which may be used to compute the initial approximation of $1/\sqrt{b}$, such as a simple table lookup, using parallel table lookups and combining the results, or using a table lookup followed by a multiply to implement an ITY algorithm. For example, the initial approximation $p_0$ may have at least three bits of accuracy. The initial approximation $p_0$ is provided to the iterative converging approximation module 104.
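The simple table lookup option might look like the following Python sketch. The table size, the indexing by the leading bits of $b$, and the name `initial_approximation` are all assumptions made for illustration; in hardware the table contents would be precomputed constants rather than being built with `math.sqrt` at start-up.

```python
import math

INDEX_SCALE = 16    # assumption: index on 2 integer bits and 4 fractional bits of b

# One entry per interval [j/16, (j+1)/16) for 1 <= b < 4, i.e. j = 16..63.
# Each entry holds 1/sqrt of the interval midpoint.
TABLE = [1.0 / math.sqrt((j + 0.5) / INDEX_SCALE) for j in range(16, 64)]


def initial_approximation(b):
    """Look up p0, an approximation of 1/sqrt(b), for 1 <= b < 4."""
    j = int(b * INDEX_SCALE)         # leading bits of b select the table entry
    return TABLE[j - 16]
```

With this table size the lookup is accurate to better than $2^{-5}$ over the whole range, which comfortably meets the three bits of accuracy mentioned above.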

In other examples, the system 100 may receive an initial approximation of $1/\sqrt{b}$ which has been determined outside of the system 100. The initial approximation may in that case be passed to the iterative converging approximation module 104 and there would be no need to implement the initial approximation logic 102 in the system 100. In general, the system 100 obtains an initial approximation of $1/\sqrt{b}$ (e.g. by either determining it or receiving it), and provides the initial approximation to the iterative converging approximation module 104.

Steps S206 to S218 of the method shown in Figure 2 are implemented in the iterative converging approximation module 104 in order to determine a value of $\sqrt{b}$. The multiplier logic 110 may be implemented as a binary multiplier which multiplies numbers together using binary adders, e.g. by computing a set of partial products and then summing the partial products together. It is therefore simple to implement multiply, add, subtract and shift operations in the multiplier logic 110.
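The idea of forming partial products and summing them can be illustrated behaviourally in Python. The function name `multiply_by_partial_products` is hypothetical, and the sequential loop is only a model: a hardware multiplier would typically sum the partial products in parallel with an adder tree.

```python
def multiply_by_partial_products(a, b):
    """Multiply two non-negative integers by summing shifted partial products."""
    total = 0
    shift = 0
    while b:
        if b & 1:                    # this bit of b selects a partial product
            total += a << shift      # partial product: a shifted into position
        b >>= 1
        shift += 1
    return total
```

Only shifts and additions are used, which is why the same logic can also serve add, subtract and shift operations.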

The control module 108 controls the operation of the iterative converging approximation module 104 so as to perform the iterative converging approximation technique (e.g. the Newton-Raphson technique) for determining converging approximations of $1/\sqrt{b}$. The memory 112 is used to store values for subsequent use, e.g. to store intermediate parameters and/or to store approximations of $1/\sqrt{b}$, as will become apparent from the description of the examples below.

In step S206 the iterative converging approximation module 104 (e.g. the control module 108) sets $i = 0$. Here $i$ is an index, wherein $(i+1)$ denotes the current iteration.

The control module 108 keeps track of which iteration is being implemented, and controls the number of iterations which are performed before a value is outputted from the iterative converging approximation module 104.

The iterative converging approximation module 104 receives the input value $b$ and the initial approximation $p_0$ and stores these values in the memory 112. Each iteration of the iterative converging approximation technique comprises computations including three (and only three in this example) multiplies performed by the multiplier logic 110 of the iterative converging approximation module 104.

On each computation, the multiplier logic 110 is capable of performing a multiply operation and/or one or more add/subtract operations and/or a shift operation. On each non-concluding iteration (i.e. where $i < c$), the iterative converging approximation module 104 determines a value of $p_{i+1}$ in accordance with equation 6 using the values of $p_i$ and $b$. In order to do this, in step S208 a first computation is performed for the current iteration with the multiplier logic 110 to determine a first intermediate parameter, $r_i$, based on a multiplication of the input value $b$ with the value $p_i$ which is the previous approximation of $1/\sqrt{b}$. On the first iteration (when $i = 0$) the previous approximation of $1/\sqrt{b}$ is the initial approximation $p_0$. As an example, the first intermediate parameter, $r_i$, may be determined according to the equation:

$$r_i = b p_i \qquad (7)$$

The determined value of the first intermediate parameter, $r_i$, may be stored in the memory 112 for use in subsequent computations.

In step S210 a second computation is performed for the current iteration with the multiplier logic 110 to determine a second intermediate parameter, $s_i$, based on a multiplication of the first intermediate parameter, $r_i$, with the value $p_i$ which is the previous approximation of $1/\sqrt{b}$. For example, the second intermediate parameter, $s_i$, may be determined according to the equation:

$$s_i = \frac{3 - r_i p_i}{2} \qquad (8)$$

The determined value of the second intermediate parameter, $s_i$, may be stored in the memory 112 for use in subsequent computations.

In step S212 it is determined whether the current iteration is a concluding iteration or not, by determining whether $i < c$. For non-concluding iterations, i.e. where $i < c$, the method passes to step S214.

In step S214 a third computation is performed for the current iteration with the multiplier logic 110 to determine a refined approximation of $1/\sqrt{b}$, denoted $p_{i+1}$, which is for use in a subsequent iteration, based on a multiplication of the second intermediate parameter, $s_i$, with the value $p_i$ which is the previous approximation of $1/\sqrt{b}$. For example, the refined approximation, $p_{i+1}$, may be determined according to the equation:

$$p_{i+1} = s_i p_i \qquad (9)$$

The refined approximation of $1/\sqrt{b}$, $p_{i+1}$, may be stored in the memory 112 for use in subsequent iterations.

It can be appreciated that the three computations shown by equations 7, 8 and 9 provide the result for $p_{i+1}$ in accordance with equation 6, but each of the three computations is suitable for being performed in a binary multiplier of the multiplier logic 110. In particular, binary multipliers are often capable of multiplying two (but not more than two) numbers together in a single computation. Each of the computations performed in steps S208, S210 and S214 comprises multiplying two numbers together. In this example, step S210 also comprises a subtraction and a shift (i.e. a divide by two), but these processes can be approximated after the multiplication performed in that step. Each of the three computations of an iteration may take some number of clock cycles (e.g. 3, 4 or 5 clock cycles) to complete.
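The equivalence between the three two-operand computations and equation 6 can be checked with exact rational arithmetic. The Python sketch below uses `fractions.Fraction` so that no rounding interferes; the function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction


def iteration_three_multiplies(b, p):
    """One non-concluding iteration, written as three two-operand multiplies."""
    r = b * p                        # equation 7
    s = (3 - r * p) / 2              # equation 8: multiply, then subtract and halve
    return s * p                     # equation 9


def iteration_equation_6(b, p):
    """The same update written directly as equation 6."""
    return p * (3 - b * p * p) / 2


b = Fraction(2)
p0 = Fraction(7, 10)                 # assumed approximation of 1/sqrt(2)
p1 = iteration_three_multiplies(b, p0)
```

Each line of `iteration_three_multiplies` contains exactly one product of two values, matching the constraint on the multiplier described above.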

When a refined approximation, $p_{i+1}$, has been determined then, in step S216, a new iteration can be started and the index $i$ is incremented (i.e. $i = i + 1$). The method then passes back to step S208 and the method proceeds from that point as described above.

The control module 108 determines the number of iterations that are to be performed and sets the value of $c$ to reflect this. As an example, the control module 108 may control the number of iterations of the iterative converging approximation technique which are to be performed based on: (i) the number of bits of accuracy of the initial approximation of $1/\sqrt{b}$, $p_0$, and (ii) a desired number of bits of accuracy of the determined value of $\sqrt{b}$.

The Newton-Raphson method of finding the reciprocal square root, using the above equations, provides a convergence that is quadratic. That is, each iteration approximately doubles the number of bits of accuracy of the approximation. This can be seen as follows. An approximation $p_i$ is not an exact result, so $p_i \neq 1/\sqrt{b}$, and instead there is some error, $\varepsilon_i$, in the approximation such that $p_i + \varepsilon_i = 1/\sqrt{b}$. The error, $\varepsilon_i$, may be positive or negative. Therefore $p_i$ can be written as:

$$p_i = \frac{1}{\sqrt{b}} - \varepsilon_i \qquad (10)$$

From equation 10 and equations 7, 8 and 9, it follows that:

$$r_i = \sqrt{b} - \varepsilon_i b \qquad (11)$$

$$s_i = 1 + \varepsilon_i \sqrt{b} - \frac{\varepsilon_i^2 b}{2} \qquad (12)$$

$$p_{i+1} = \frac{1}{\sqrt{b}} - \frac{\varepsilon_i^2 \sqrt{b}}{2}\left(3 - \varepsilon_i \sqrt{b}\right) \qquad (13)$$

As described above, a scaling operation may be performed such that $1 \le b < 4$, and a good initial approximation is assumed such that the approximation $p_0$ has at least three bits of accuracy, such that $|\varepsilon_0| < 2^{-3}$. Therefore, $(3 - \varepsilon_i \sqrt{b})$ is positive, and so is $\frac{\varepsilon_i^2 \sqrt{b}}{2}$. Therefore, $p_{i+1} < \frac{1}{\sqrt{b}}$. It is useful to ensure that the approximations of $1/\sqrt{b}$ are not larger than the true value of $1/\sqrt{b}$, because a check procedure performed by the check logic 106 (described below with reference to step S220) may rely on an assumption that the determined value for $\sqrt{b}$ is less than the true value of $\sqrt{b}$ in order to check that the value for $\sqrt{b}$ is correctly rounded. The error in the approximation $p_{i+1}$ is given by:

$$\varepsilon_{i+1} = \frac{\varepsilon_i^2 \sqrt{b}}{2}\left(3 - \varepsilon_i \sqrt{b}\right)$$

Since $p_{i+1}$ differs from $1/\sqrt{b}$ by about $\varepsilon_i^2$ (whereas $p_i$ differs from $1/\sqrt{b}$ by $\varepsilon_i$), it can be appreciated that $p_{i+1}$ has about twice as many bits of accuracy as $p_i$. For example, if the initial approximation, $p_0$, has 7 bits of accuracy, then $p_1$ would have about 14 bits of accuracy, $p_2$ would have about 28 bits of accuracy (enough for single precision floating point), and $p_3$ would have about 56 bits of accuracy (enough for double precision floating point).
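The error relation that follows from equations 10 and 13 can be checked numerically. The Python sketch below compares the measured error after one iteration with the predicted value $\frac{\varepsilon_0^2\sqrt{b}}{2}(3 - \varepsilon_0\sqrt{b})$; the sample values $b = 2$ and $p_0 = 0.7$ are assumptions chosen for illustration, and the comparison is only as exact as double precision floating point allows.

```python
import math

b = 2.0
target = 1.0 / math.sqrt(b)

p0 = 0.7                                         # assumed initial approximation
eps0 = target - p0                               # error as defined by equation 10

p1 = p0 * (3.0 - b * p0 * p0) / 2.0              # one iteration of equation 6
eps1 = target - p1                               # measured error after the iteration

# Error predicted by equation 13.
predicted = (eps0 ** 2) * math.sqrt(b) / 2.0 * (3.0 - eps0 * math.sqrt(b))
```

The measured error is positive, so the refined approximation lies below the true value of $1/\sqrt{b}$, and its magnitude is of the order of the square of the previous error.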

As an example, the control module 108 may determine that the desired result is a single precision floating point number, for which at least 24 bits of accuracy are desired, and that the initial approximation, $p_0$, has 7 bits of accuracy. In that case, the control module 108 sets $c = 1$, such that two iterations are performed, and therefore in a simple example six computations may be performed to determine $p_2$. In this simple example, at the end of the second iteration the value of $p_2$ could be multiplied by $b$ in order to determine a value of $\sqrt{b}$, i.e. $\text{result} = b p_2$, such that seven computations are performed to determine a value of $\sqrt{b}$. As another example, the control module 108 may determine that the desired result is a double precision floating point number, for which at least 53 bits of accuracy are desired, and that the initial approximation, $p_0$, has 7 bits of accuracy. In that case, the control module 108 sets $c = 2$, such that three iterations are performed, and therefore in a simple example nine computations may be performed to determine $p_3$. In this simple example, at the end of the third iteration the value of $p_3$ could be multiplied by $b$ in order to determine a value of $\sqrt{b}$, i.e. $\text{result} = b p_3$, such that ten computations are performed to determine a value of $\sqrt{b}$.
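One way the choice of $c$ could be modelled is shown in the Python sketch below. It assumes the idealised rule that each iteration exactly doubles the number of accurate bits and that at least one iteration is always run; both are simplifying assumptions, and the name `concluding_index` is hypothetical.

```python
def concluding_index(initial_bits, desired_bits):
    """Return c, so that iterations i = 0..c (c + 1 in total) are performed.

    Assumes initial_bits >= 1 and that every iteration doubles the accuracy.
    """
    iterations = 1                   # assumption: at least one iteration is run
    bits = 2 * initial_bits
    while bits < desired_bits:
        bits *= 2
        iterations += 1
    return iterations - 1


c_single = concluding_index(7, 24)   # 7 -> 14 -> 28 bits
c_double = concluding_index(7, 53)   # 7 -> 14 -> 28 -> 56 bits
```

For a 7-bit initial approximation this reproduces the two cases in the text, with two iterations for single precision and three for double precision.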

However, the number of computations performed to determine a value of $\sqrt{b}$ can be reduced compared to the simple examples described in the preceding paragraph. This is achieved by implementing the concluding iteration without determining a refined approximation of $1/\sqrt{b}$ on the concluding iteration.

On the final iteration, i.e. the concluding iteration, the control module 108 has set $c$ such that $i = c$. Steps S208 and S210 are performed as above in accordance with equations 7 and 8. That is, in step S208 on the final iteration, the first computation is performed with the multiplier logic 110 to determine a first intermediate parameter $r_c$ for the concluding iteration based on a multiplication of the input value $b$ with a previous approximation of $1/\sqrt{b}$, i.e. $p_c$. For example, the first intermediate parameter $r_c$ for the concluding iteration may be given by the equation:

$$r_c = b p_c \qquad (14)$$

Furthermore, in step S210 on the final iteration, the second computation is performed with the multiplier logic 110 to determine a second intermediate parameter $s_c$ for the concluding iteration based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the previous approximation of $1/\sqrt{b}$, i.e. $p_c$. For example, the second intermediate parameter $s_c$ for the concluding iteration may be given by the equation:

$$s_c = \frac{3 - r_c p_c}{2} \qquad (15)$$

Then in step S212 it is determined that $i$ is not less than $c$, because this is the final iteration so $i = c$. Therefore the method passes from step S212 to step S218. In step S218, instead of determining a refined approximation of $1/\sqrt{b}$, i.e. $p_{c+1}$, a concluding computation is performed with the multiplier logic 110 to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration. For example, the value of $\sqrt{b}$ (denoted "result") may be determined according to the equation:

$$\text{result} = r_c s_c \qquad (16)$$

In this way, the two steps from the simple example described above of: (i) determining $p_{c+1}$ as $p_{c+1} = s_c p_c$, and then (ii) determining the result as $\text{result} = b p_{c+1}$, are reduced into one step as given by equation 16. The following equation shows that this reduction is valid:

$$\text{result} = b p_{c+1} = b s_c p_c = b p_c s_c = r_c s_c \qquad (17)$$

Reducing the number of computations which are performed in the final iteration can provide a significant benefit. Each computation takes a number of clock cycles (e.g. 3, 4 or 5 clock cycles) to be performed. So reducing the number of computations that are performed on the final iteration reduces the time taken to determine the value of $\sqrt{b}$. This means that the latency is reduced, i.e. the result can be provided sooner. Furthermore, reducing the number of computations which are performed reduces the power that is consumed to determine the value of $\sqrt{b}$.

The value of $\sqrt{b}$ determined in step S218 is outputted from the iterative converging approximation module 104 to the check logic 106. In step S220, the check logic 106 may perform a check procedure on the determined value of $\sqrt{b}$ in accordance with a rounding mode to check that the determined value of $\sqrt{b}$ is correct in accordance with the rounding mode. The rounding mode may for example be a round up mode, a round down mode or a round to nearest mode. Details of the check procedure performed by the check logic 106 are beyond the scope of this disclosure, but it is noted that the check procedure may rely on an assumption that the determined value of $\sqrt{b}$ is not greater than the exactly correct value of $\sqrt{b}$. The output from the check logic 106 is either the same as the value of $\sqrt{b}$ received from the iterative converging approximation module 104 or is that value incremented by one unit of least precision (ULP) (i.e. the value of $\sqrt{b}$ received from the iterative converging approximation module 104 with the least significant digit incremented by one).
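The saving from the concluding computation of equation 16 can be sketched by counting multiplies in both variants. The Python code below uses exact `Fraction` arithmetic so that the identity of equation 17 holds without rounding; the function names, the multiply counters and the sample inputs are illustrative assumptions and do not model the hardware timing.

```python
from fractions import Fraction


def sqrt_simple(b, p0, c):
    """Run c + 1 full iterations, then multiply by b: 3 * (c + 1) + 1 multiplies."""
    p, multiplies = p0, 0
    for _ in range(c + 1):
        r = b * p                    # equation 7
        s = (3 - r * p) / 2          # equation 8
        p = s * p                    # equation 9
        multiplies += 3
    return b * p, multiplies + 1


def sqrt_fused(b, p0, c):
    """Replace the last two multiplies with result = r_c * s_c (equation 16)."""
    p, multiplies = p0, 0
    for i in range(c + 1):
        r = b * p                    # equation 7 (equation 14 when i == c)
        s = (3 - r * p) / 2          # equation 8 (equation 15 when i == c)
        multiplies += 2
        if i < c:
            p = s * p                # equation 9, non-concluding iterations only
            multiplies += 1
    return r * s, multiplies + 1     # equation 16


simple_result, simple_count = sqrt_simple(Fraction(2), Fraction(7, 10), c=1)
fused_result, fused_count = sqrt_fused(Fraction(2), Fraction(7, 10), c=1)
```

With $c = 1$ the two variants return the same value, using seven multiplies in the simple form and six in the fused form.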

In step S222 the resulting value for $\sqrt{b}$ is outputted from the check logic 106 as the output of the computer system 100. The outputted value of $\sqrt{b}$ may be put to any suitable use after it has been outputted, e.g. stored in a memory or used in subsequent computations, etc. In some examples, the check logic 106 might not be implemented in the system 100. That is, the check procedure might not be performed in some examples. In those examples, the value of $\sqrt{b}$ determined by the iterative converging approximation module 104 is outputted from the system 100 and used to represent the value of $\sqrt{b}$.

The accuracy of the iterations is now considered. The multiplier logic 110 may be configured to take two k-bit values as input and provide a 2k-bit result. For example, if $b$ and $p_i$ both have k bits then in the first computation, when $b p_i$ is computed, the result $r_i$ has twice as many bits (2k) as each of the inputs. However, when $r_i$ is used as an input to the next computation in computing $r_i p_i$, it may only contain k bits. So, the k least significant bits of $r_i$ are removed and the k most significant bits are rounded up or down before being used in the next computation. This rounding may introduce additional error terms. In particular, the first intermediate parameter $r_i$ has an error, $\alpha$, introduced by rounding, such that equation 11 becomes:

$$r_i = \sqrt{b} - \varepsilon_i b + \alpha \qquad (18)$$

In order to round $r_i$ up, $\alpha$ would be positive; and in order to round $r_i$ down, $\alpha$ would be negative. The second intermediate parameter $s_i$ has an error, $\beta$, introduced by rounding, and using equations 8, 10 and 18, $s_i$ is given by:

$$s_i = \frac{3 - r_i p_i}{2} + \beta = 1 + \varepsilon_i \sqrt{b} - \frac{\varepsilon_i^2 b}{2} - \frac{\alpha}{2}\left(\frac{1}{\sqrt{b}} - \varepsilon_i\right) + \beta \qquad (19)$$

In order to round $s_i$ up, $\beta$ would be positive; and in order to round $s_i$ down, $\beta$ would be negative.

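On the concluding iteration, the determined value for $\sqrt{b}$ is computed with an error, $\delta$, introduced by rounding. Using equations 16, 18 and 19, the result is given by:

$$\begin{aligned}
\text{result} &= r_c s_c + \delta \\
&= \left(\sqrt{b} - \varepsilon_c b + \alpha\right)\left(1 + \varepsilon_c\sqrt{b} - \frac{\varepsilon_c^2 b}{2} - \frac{\alpha}{2}\left(\frac{1}{\sqrt{b}} - \varepsilon_c\right) + \beta\right) + \delta \\
&= \sqrt{b} + \varepsilon_c b - \frac{\varepsilon_c^2 b\sqrt{b}}{2} - \frac{\alpha}{2} + \frac{\alpha\varepsilon_c\sqrt{b}}{2} + \beta\sqrt{b} - \varepsilon_c b - \varepsilon_c^2 b\sqrt{b} + \frac{\varepsilon_c^3 b^2}{2} + \frac{\alpha\varepsilon_c\sqrt{b}}{2} - \frac{\alpha\varepsilon_c^2 b}{2} - \beta\varepsilon_c b \\
&\quad + \alpha + \alpha\varepsilon_c\sqrt{b} - \frac{\alpha\varepsilon_c^2 b}{2} - \frac{\alpha^2}{2\sqrt{b}} + \frac{\alpha^2\varepsilon_c}{2} + \alpha\beta + \delta \\
&= \sqrt{b} - \frac{\varepsilon_c^2 b\sqrt{b}}{2}\left(3 - \varepsilon_c\sqrt{b}\right) + \alpha\left(\frac{1}{2} + 2\varepsilon_c\sqrt{b} - \varepsilon_c^2 b - \frac{\alpha}{2\sqrt{b}} + \frac{\alpha\varepsilon_c}{2}\right) + \beta\left(\sqrt{b} + \alpha - \varepsilon_c b\right) + \delta
\end{aligned} \tag{20}$$

As described above, for the check procedure to work correctly, the result should not be greater than $\sqrt{b}$. To ensure this, on the final iteration, $r_c$, $s_c$ and result are rounded down, i.e. $\alpha$, $\beta$ and $\delta$ are set to be negative. On non-concluding iterations the rounding of $r_i$, $s_i$ and $p_{i+1}$ does not need to be constrained to any particular rounding mode, but it may be simpler to use a round down mode since this is used on the concluding iteration.

On the concluding iteration, the error $\varepsilon_c$ is small because this is the error in the previous approximation $p_c$. For example, for a single precision result, on the final iteration, $|\varepsilon_c| < 2^{-24}$ and for a double precision result, on the final iteration, $|\varepsilon_c| < 2^{-53}$. Also, since $\alpha$ and $\beta$ are rounding errors, they have a similar magnitude to $\varepsilon_c$, so for a single precision result, on the final iteration, $0 \ge \alpha > -2^{-24}$ and $0 \ge \beta > -2^{-24}$, and for a double precision result, on the final iteration, $0 \ge \alpha > -2^{-53}$ and $0 \ge \beta > -2^{-53}$. Furthermore, as described above, $1 \le b < 4$, such that $1 \le \sqrt{b} < 2$. Therefore each term in equation 20 after $\sqrt{b}$ is negative, since $\alpha$, $\beta$ and $\varepsilon_c$ are all tiny compared to 3, $\tfrac{1}{2}$ and $\sqrt{b}$, so each bracketed factor is positive. Therefore, the result determined in equation 20 is smaller than $\sqrt{b}$, such that it is suitable for the check procedure. This is true irrespective of whether $\varepsilon_c$ is positive or negative, which is why it is not important to constrain the rounding performed in the non-concluding iterations.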
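The round-down argument can be checked numerically with integer fixed-point arithmetic. The sketch below is illustrative only and rests on several assumptions that are not part of this description: a fixed-point format with F = 24 fractional bits, the second intermediate parameter formed from the exact difference before it is rounded down, and a previous approximation obtained from a floating-point estimate (`rough_rsqrt_fx` is a hypothetical helper). All function names are hypothetical.

```python
import math

F = 24          # fractional bits (assumed fixed-point format for this sketch)
ONE = 1 << F    # fixed-point representation of 1.0


def concluding_iteration_round_down(b_fx: int, p_fx: int) -> int:
    """One concluding iteration with every intermediate value rounded down.

    b_fx and p_fx are non-negative integers scaled by 2**F, with 1 <= b < 4
    and p_fx close to 1/sqrt(b).
    """
    r = (b_fx * p_fx) >> F                      # r_c = b*p_c, rounded down
    s = ((3 << (2 * F)) - r * p_fx) >> (F + 1)  # s_c = (3 - r_c*p_c)/2, rounded down
    return (r * s) >> F                         # result = r_c*s_c, rounded down


def not_above_sqrt(b_fx: int, result_fx: int) -> bool:
    """Exact integer test of result <= sqrt(b), comparing squares at scale 2**(2F)."""
    return result_fx * result_fx <= (b_fx << F)


def rough_rsqrt_fx(b_fx: int, ulps_off: int = 0) -> int:
    """Hypothetical source of p_c: a float estimate of 1/sqrt(b), optionally
    perturbed by a few units in the last place to vary the sign of the error."""
    return int(ONE / math.sqrt(b_fx / ONE)) + ulps_off
```

Perturbing the previous approximation in either direction (for example `rough_rsqrt_fx(b_fx, 3)` and `rough_rsqrt_fx(b_fx, -3)`) exercises both signs of the approximation error; in each case `not_above_sqrt` is expected to hold, in line with the observation that the sign of that error does not matter when the three roundings of the concluding iteration are downward.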

In the examples described above, on non-concluding iterations, a refined approximation $p_{i+1}$ is determined according to equation 6 using the three computations as set out in equations 7 to 9. In an alternative method, in step S208, the first computation is performed in the same way as described above for a current iteration with the multiplier logic 110. That is, a first intermediate parameter, $r_i$, may be determined according to the equation:

$$r_i = b p_i \tag{21}$$

The determined value of the first intermediate parameter, $r_i$, may be stored in the memory 112 for use in subsequent computations.

In the alternative method, in step S210, as described above, a second computation is performed for the current iteration with the multiplier logic 110 to determine a second intermediate parameter, $s_i$, based on a multiplication of the first intermediate parameter, $r_i$, with the value $p_i$ which is the previous approximation of $1/\sqrt{b}$. However, in contrast to equation 8 given above, in the alternative method the second intermediate parameter, $s_i$, may be determined according to the equation:

$$s_i = \tfrac{1}{2}(1 - r_i p_i) \tag{22}$$

The determined value of the second intermediate parameter, $s_i$, may be stored in the memory 112 for use in subsequent computations.

Then for non-concluding iterations, in step S214, a third computation is performed for the current iteration with the multiplier logic 110 to determine a refined approximation of $1/\sqrt{b}$, denoted $p_{i+1}$, which is for use in a subsequent iteration, based on a multiplication of the second intermediate parameter, $s_i$, with the value $p_i$ which is the previous approximation of $1/\sqrt{b}$. In the alternative method, the refined approximation, $p_{i+1}$, may be determined according to the equation:

$$p_{i+1} = s_i p_i + p_i \tag{23}$$

The refined approximation of $1/\sqrt{b}$, $p_{i+1}$, may be stored in the memory 112 for use in subsequent iterations.

For concluding iterations, in step S218, a concluding computation is performed with the multiplier logic 110 to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration. In the alternative method, the result may be determined according to the equation:

$$\text{result} = r_c s_c + r_c \tag{24}$$

This alternative method has the same advantages as the method described in detail above. In particular, the final iteration avoids a computation to determine a refined approximation of $1/\sqrt{b}$. As described above, reducing the number of computations that are performed on the final iteration has benefits in terms of the power consumption and latency of the system 100 in determining the value of $\sqrt{b}$.

By way of explanation, equation 24 has the same result as performing two computations: (i) determining $p_{c+1}$ as $p_{c+1} = s_c p_c + p_c$, and then (ii) determining the result as $\text{result} = b p_{c+1}$. This is shown in the following equation:

$$\text{result} = b p_{c+1} = b s_c p_c + b p_c = r_c s_c + r_c \tag{25}$$

In the alternative method, all operations typically use fused multiply-add operations with the requested rounding mode for the final result, in which case no check procedure is needed.
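The identity in equation 25 can be confirmed with exact rational arithmetic. The following is a minimal sketch of the alternative method's two kinds of iteration using Python's `fractions.Fraction`, so that no rounding takes place; the function names `alt_refine` and `alt_conclude` are hypothetical, and the sketch does not model the fused multiply-add hardware mentioned above.

```python
from fractions import Fraction


def alt_refine(b: Fraction, p: Fraction) -> Fraction:
    """Non-concluding iteration of the alternative method (equations 21 to 23)."""
    r = b * p               # r_i = b * p_i
    s = (1 - r * p) / 2     # s_i = (1 - r_i * p_i) / 2
    return s * p + p        # p_{i+1} = s_i * p_i + p_i


def alt_conclude(b: Fraction, p: Fraction) -> Fraction:
    """Concluding iteration of the alternative method (equations 21, 22 and 24)."""
    r = b * p               # r_c = b * p_c
    s = (1 - r * p) / 2     # s_c = (1 - r_c * p_c) / 2
    return r * s + r        # result = r_c * s_c + r_c
```

With b = 2 and a previous approximation of 7/10, `alt_conclude` returns 707/500, which is exactly b times the value returned by `alt_refine`. The concluding iteration therefore reaches the same value without first forming the refined approximation.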

In the examples described above, a plurality of iterations of the iterative converging approximation technique are performed (i.e. $c \ge 1$), such that a refined approximation of $1/\sqrt{b}$ determined in the iteration preceding the final iteration is used as a previous approximation of $1/\sqrt{b}$ in the concluding iteration. However, in other examples, only one iteration of the iterative converging approximation technique may be performed, such that the first iteration is the concluding iteration (i.e. $c = 0$) and the initial approximation of $1/\sqrt{b}$ is used as the previous approximation of $1/\sqrt{b}$ in the concluding iteration.
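The overall flow, with c non-concluding iterations followed by one concluding iteration, can be sketched in ordinary floating point. This is a simplified illustration and not the hardware described here: the function name `sqrt_by_rsqrt_iterations` is hypothetical, the initial approximation is simply passed in, and the rounding is whatever the host's floating-point unit provides rather than the round-down behaviour discussed for the concluding iteration.

```python
import math


def sqrt_by_rsqrt_iterations(b: float, p0: float, c: int) -> float:
    """Approximate sqrt(b) from an initial approximation p0 of 1/sqrt(b).

    c is the number of non-concluding iterations; c = 0 means the first
    iteration is also the concluding iteration.
    """
    p = p0
    for _ in range(c):
        r = b * p                  # first computation:  r_i = b * p_i
        s = 0.5 * (3.0 - r * p)    # second computation: s_i = (3 - r_i * p_i) / 2
        p = s * p                  # third computation:  p_{i+1} = s_i * p_i
    r = b * p                      # concluding iteration, first computation
    s = 0.5 * (3.0 - r * p)        # concluding iteration, second computation
    return r * s                   # concluding computation: result = r_c * s_c
```

Every iteration, concluding or not, uses three computations, so the total is 3(c + 1). Calling the function with b = 2.0 and p0 = 0.7 for c = 0 and then for larger c shows the result approaching `math.sqrt(2.0)` as more non-concluding iterations are added.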

The computing system 100 described above with reference to Figure 1 can be implemented as a unit at a processor in a larger computer system. For example, Figure 3 shows a larger computer system 300 which comprises a processor 302 on which the system 100 is implemented. The processor 302 is a central processing unit (CPU). In the example shown in Figure 3, the computer system 300 also comprises a memory 304, a graphics processing unit (GPU) 306 and other devices 308, such as a display 310, speakers 312, a microphone 314 and a keypad 316. The components of the computer system 300 can communicate with each other via a communications bus 318. In other examples, the system 100 may be implemented as a unit on the GPU 306 as well as or instead of being implemented as a unit on the CPU 302. When a square root operation is to be performed, an input value b can be provided to the unit 100 and the unit 100 operates as described above to output a value of $\sqrt{b}$ which can then be used in the system 300 as appropriate.

Examples are described above, by way of example only, of a computer system which is configured to implement a square root operation using an iterative converging approximation technique in a manner which has low latency and low power consumption. For example, the number of computations is lower than might be expected. This results in a faster method which uses less power than conventional methods, and the method may be implemented in hardware which is smaller and simpler to implement than conventional hardware for implementing square root operations.

The terms 'module', 'block' and 'logic' are used herein to generally represent hardware, including fixed function hardware, configurable hardware, programmable hardware, and combinations thereof. Firmware, software, or some combination thereof can be used to configure and/or program such hardware.

In one example, the methods described may be performed by a computer configured with software in machine readable form stored on a computer-readable medium. The computer-readable medium may be configured as a non-transitory computer-readable storage medium and thus is not a signal bearing medium.

Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The software may be in the form of a computer program comprising computer program code. The program code can be stored in one or more computer readable media. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.

Those skilled in the art will realize that all, or a portion of the functionality, techniques, logic or methods may be carried out by a dedicated circuit, an application-specific integrated circuit, a programmable logic array, a field-programmable gate array, or the like. For example, the module, block, unit or logic may comprise hardware in the form of circuitry. Such circuitry may include transistors and/or other hardware elements available in a manufacturing process.

Such transistors and/or other elements may be used to form circuitry or structures that implement and/or contain memory, such as registers, flip flops, or latches, logical operators, such as Boolean operations, mathematical operators, such as adders, multipliers, or shifters, and interconnects, by way of example. Such elements may be provided as custom circuits or standard cell libraries, macros, or at other levels of abstraction. Such elements may be interconnected in a specific arrangement. The module, block, unit or logic (e.g. the components shown in Figure 1) may include circuitry that is fixed function and circuitry that can be programmed to perform a function or functions; such programming may be provided from a firmware or software update or control mechanism. In an example, hardware logic has circuitry that implements a fixed function operation, state machine or process.

It is also intended to encompass software which "describes" or defines the configuration of hardware that implements a module, block, unit or logic described above, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code for generating a computer system (e.g. computer hardware) configured to perform any of the methods described herein, or for generating a computer system (e.g. computer hardware) comprising any apparatus described herein. One such configuration of a computer-readable medium is a signal bearing medium and thus is configured to transmit the instructions (e.g. as a carrier wave) to the computing device, such as via a network. The computer-readable medium may also be configured as a non-transitory computer-readable storage medium and thus is not a signal bearing medium. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The terms 'processor' and 'computer' are used herein to refer to any device, or portion thereof, with processing capability such that it can execute instructions, or a dedicated circuit capable of carrying out all or a portion of the functionality or methods, or any combination thereof.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. It will be understood that the benefits and advantages described above may relate to one example or may relate to several examples.

Any range or value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person. The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.

Claims

1. A computer system configured to implement a square root operation to determine a value of $\sqrt{b}$ where b is an input value, the computer system comprising an iterative converging approximation module arranged to receive an initial approximation of $1/\sqrt{b}$ and configured to use an iterative converging approximation technique for determining converging approximations of $1/\sqrt{b}$, the iterative converging approximation module comprising multiplier logic; wherein the iterative converging approximation module is configured to implement one or more iterations of the iterative converging approximation technique using the multiplier logic, wherein to implement a concluding iteration of the one or more iterations of the iterative converging approximation technique the iterative converging approximation module is configured to: (i) perform a first computation with the multiplier logic to determine a first intermediate parameter $r_c$ for the concluding iteration based on a multiplication of the input value b with a previous approximation of $1/\sqrt{b}$; (ii) perform a second computation with the multiplier logic to determine a second intermediate parameter $s_c$ for the concluding iteration based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the previous approximation of $1/\sqrt{b}$; and (iii) perform a concluding computation with the multiplier logic to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration.
2. The computer system of claim 1 wherein the iterative converging approximation module further comprises: a memory for storing at least the first intermediate parameter $r_c$ and the previous approximation of $1/\sqrt{b}$; and a control module configured to control which values are provided to the multiplier logic for the respective computations of the iterative converging approximation technique.
3. The computer system of claim 2 wherein the control module is further configured to control the number of iterations of the iterative converging approximation technique which are implemented based on: (i) the number of bits of accuracy of the initial approximation of $1/\sqrt{b}$, and (ii) a desired number of bits of accuracy of the determined value of $\sqrt{b}$.
4. The computer system of any preceding claim wherein said one or more iterations of the iterative converging approximation technique comprises only one iteration, and wherein said previous approximation of $1/\sqrt{b}$ is the initial approximation of $1/\sqrt{b}$.
5. The computer system of any of claims 1 to 3 wherein said one or more iterations of the iterative converging approximation technique comprises a plurality of iterations, and wherein said previous approximation of $1/\sqrt{b}$ used in the concluding iteration is determined in the iteration preceding the concluding iteration.
6. The computer system of claim 5 wherein to implement each of at least one non-concluding iteration of the plurality of iterations of the iterative converging approximation technique the iterative converging approximation module is configured to: (i) perform a first computation with the multiplier logic to determine a first intermediate parameter $r_i$ for the current iteration based on a multiplication of the input value b with a previous approximation of $1/\sqrt{b}$; (ii) perform a second computation with the multiplier logic to determine a second intermediate parameter $s_i$ for the current iteration based on a multiplication of the first intermediate parameter $r_i$ for the current iteration with the previous approximation of $1/\sqrt{b}$; and (iii) perform a third computation with the multiplier logic to determine a refined approximation of $1/\sqrt{b}$ for use in a subsequent iteration based on a multiplication of the second intermediate parameter $s_i$ for the current iteration with the previous approximation of $1/\sqrt{b}$.
7. The computer system of any preceding claim wherein the iterative converging approximation module is configured to perform three computations with the multiplier logic for each of the one or more iterations of the iterative converging approximation technique.
8. The computer system of any preceding claim wherein the iterative converging approximation module is configured to implement the concluding iteration without determining a refined approximation of $1/\sqrt{b}$ on the concluding iteration.
9. The computer system of any preceding claim wherein the iterative converging approximation module is configured such that: the first computation of the concluding iteration comprises determining the first intermediate parameter $r_c$ in accordance with the equation $r_c = b p_c$, where $p_c$ is the previous approximation of $1/\sqrt{b}$; the second computation of the concluding iteration comprises determining the second intermediate parameter $s_c$ in accordance with the equation $s_c = \tfrac{1}{2}(3 - r_c p_c)$; and the concluding computation of the concluding iteration comprises determining the value of $\sqrt{b}$, denoted result, in accordance with the equation $\text{result} = r_c s_c$.
10. The computer system of any preceding claim wherein the iterative converging approximation module is configured such that for the concluding iteration: the first computation performed by the multiplier logic comprises rounding down to determine the first intermediate parameter $r_c$; the second computation performed by the multiplier logic comprises rounding down to determine the second intermediate parameter $s_c$; and the concluding computation performed by the multiplier logic comprises rounding down to determine the value of $\sqrt{b}$.
11. The computer system of any preceding claim further comprising check logic which is configured to: receive the determined value of $\sqrt{b}$ from the iterative converging approximation module; and perform a check procedure on the determined value of $\sqrt{b}$ in accordance with a rounding mode to check that the determined value of $\sqrt{b}$ is correct in accordance with the rounding mode.
12. The computer system of any of claims 1 to 8 wherein the iterative converging approximation module is configured such that: the first computation of the concluding iteration comprises determining the first intermediate parameter $r_c$ in accordance with the equation $r_c = b p_c$, where $p_c$ is the previous approximation of $1/\sqrt{b}$; the second computation of the concluding iteration comprises determining the second intermediate parameter $s_c$ in accordance with the equation $s_c = \tfrac{1}{2}(1 - r_c p_c)$; and the concluding computation of the concluding iteration comprises determining the value of $\sqrt{b}$, denoted result, in accordance with the equation $\text{result} = r_c s_c + r_c$.
13. The computer system of any preceding claim further comprising initial approximation logic configured to: obtain the initial approximation of $1/\sqrt{b}$ by: (i) computing the initial approximation of $1/\sqrt{b}$, or (ii) receiving the initial approximation of $1/\sqrt{b}$; and provide the obtained initial approximation of $1/\sqrt{b}$ to the iterative converging approximation module.
14. The computer system of any preceding claim wherein the iterative converging approximation technique is a Newton-Raphson technique.
15. A method of implementing a square root operation in a computer system to determine a value of $\sqrt{b}$ where b is an input value, the method using an iterative converging approximation technique for determining converging approximations of $1/\sqrt{b}$, the method comprising: obtaining an initial approximation of $1/\sqrt{b}$; and implementing one or more iterations of the iterative converging approximation technique using multiplier logic of the computer system, wherein a concluding iteration of the one or more iterations of the iterative converging approximation technique comprises: (i) performing a first computation with the multiplier logic of the computer system to determine a first intermediate parameter $r_c$ for the concluding iteration based on a multiplication of the input value b with a previous approximation of $1/\sqrt{b}$; (ii) performing a second computation with the multiplier logic to determine a second intermediate parameter $s_c$ for the concluding iteration based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the previous approximation of $1/\sqrt{b}$; and (iii) performing a concluding computation with the multiplier logic to determine the value of $\sqrt{b}$ based on a multiplication of the first intermediate parameter $r_c$ for the concluding iteration with the second intermediate parameter $s_c$ for the concluding iteration.
16. The method of claim 15 wherein said implementing one or more iterations of the iterative converging approximation technique using multiplier logic of the computer system comprises implementing only one iteration of the iterative converging approximation technique, and wherein said previous approximation of $1/\sqrt{b}$ is the initial approximation of $1/\sqrt{b}$.
17. The method of claim 15 wherein said implementing one or more iterations of the iterative converging approximation technique using multiplier logic of the computer system comprises implementing a plurality of iterations of the iterative converging approximation technique, and wherein said previous approximation of $1/\sqrt{b}$ used in the concluding iteration is determined in the iteration preceding the concluding iteration.
18. The method of claim 17 wherein each of at least one non-concluding iteration of the plurality of iterations of the iterative converging approximation technique comprises: (i) performing a first computation with the multiplier logic of the computer system to determine a first intermediate parameter $r_i$ for the current iteration based on a multiplication of the input value b with a previous approximation of $1/\sqrt{b}$; (ii) performing a second computation with the multiplier logic to determine a second intermediate parameter $s_i$ for the current iteration based on a multiplication of the first intermediate parameter $r_i$ for the current iteration with the previous approximation of $1/\sqrt{b}$; and (iii) performing a third computation with the multiplier logic to determine a refined approximation of $1/\sqrt{b}$ for use in a subsequent iteration based on a multiplication of the second intermediate parameter $s_i$ for the current iteration with the previous approximation of $1/\sqrt{b}$.
19. The method of any of claims 15 to 18 wherein each of the one or more iterations of the iterative converging approximation technique comprises three computations performed by the multiplier logic.
20. The method of any of claims 15 to 19 wherein the concluding iteration is implemented without determining a refined approximation of $1/\sqrt{b}$ on the concluding iteration.
21. The method of any of claims 15 to 20 wherein: the first computation of the concluding iteration comprises determining the first intermediate parameter $r_c$ in accordance with the equation $r_c = b p_c$, where $p_c$ is the previous approximation of $1/\sqrt{b}$; the second computation of the concluding iteration comprises determining the second intermediate parameter $s_c$ in accordance with the equation $s_c = \tfrac{1}{2}(3 - r_c p_c)$; and the concluding computation of the concluding iteration comprises determining the value of $\sqrt{b}$, denoted result, in accordance with the equation $\text{result} = r_c s_c$.
22. The method of any of claims 15 to 21 wherein for the concluding iteration: the first computation performed by the multiplier logic comprises rounding down to determine the first intermediate parameter $r_c$; the second computation performed by the multiplier logic comprises rounding down to determine the second intermediate parameter $s_c$; and the concluding computation performed by the multiplier logic comprises rounding down to determine the value of $\sqrt{b}$.
23. The method of any of claims 15 to 22 further comprising performing a check procedure on the determined value of $\sqrt{b}$ in accordance with a rounding mode to check that the determined value of $\sqrt{b}$ is correct in accordance with the rounding mode.
24. The method of any of claims 15 to 20 wherein: the first computation of the concluding iteration comprises determining the first intermediate parameter $r_c$ in accordance with the equation $r_c = b p_c$, where $p_c$ is the previous approximation of $1/\sqrt{b}$; the second computation of the concluding iteration comprises determining the second intermediate parameter $s_c$ in accordance with the equation $s_c = \tfrac{1}{2}(1 - r_c p_c)$; and the concluding computation of the concluding iteration comprises determining the value of $\sqrt{b}$, denoted result, in accordance with the equation $\text{result} = r_c s_c + r_c$.
25. The method of any of claims 15 to 24 further comprising scaling an initial input value by an even power of two to determine the input value b such that $1 \le b < 4$.
26. The method of any of claims 15 to 25 wherein said obtaining an initial approximation of $1/\sqrt{b}$ comprises: (i) computing the initial approximation of $1/\sqrt{b}$, or (ii) receiving the initial approximation of $1/\sqrt{b}$.
27. The method of any of claims 15 to 26 wherein the initial approximation of $1/\sqrt{b}$ has at least three bits of accuracy.
28. The method of any of claims 15 to 27 wherein the computer system further comprises a control module which controls the number of iterations of the iterative converging approximation technique which are implemented based on: (i) the number of bits of accuracy of the initial approximation of $1/\sqrt{b}$, and (ii) a desired number of bits of accuracy of the determined value of $\sqrt{b}$.
29. The method of any of claims 15 to 28 wherein the iterative converging approximation technique is a Newton-Raphson technique.
30. Computer readable code defining a computer system according to any of claims 1 to 14, whereby the computer system is manufacturable.
31. A computer readable storage medium having encoded thereon computer readable code for defining a computer system according to any of claims 1 to 14, whereby the computer system is manufacturable.
32. Computer readable code adapted to perform the steps of the method of any of claims 15 to 29 when the code is run on a computer.
33. A computer readable storage medium having encoded thereon the computer readable code of claim 32.