WO2024078033A1 - Floating-point number square root calculation method and floating-point number calculation module - Google Patents

Floating-point number square root calculation method and floating-point number calculation module

Info

Publication number
WO2024078033A1
WO2024078033A1 PCT/CN2023/104073 CN2023104073W WO2024078033A1 WO 2024078033 A1 WO2024078033 A1 WO 2024078033A1 CN 2023104073 W CN2023104073 W CN 2023104073W WO 2024078033 A1 WO2024078033 A1 WO 2024078033A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
mantissa
bit width
point number
parameter
Prior art date
Application number
PCT/CN2023/104073
Other languages
English (en)
French (fr)
Inventor
罗元勇
谷志岩
王建峰
龙子超
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024078033A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • the present application relates to the field of electronic technology, and in particular to a floating point number square root calculation method and a floating point number calculation module.
  • Floating point square root calculation has developed into a basic operation supported by processors. At present, it is widely used in processors that support floating point calculation. For example, central processing unit (CPU), graphics processing unit (GPU), artificial intelligence (AI) processors, etc. Floating point square root calculation is widely used in digital signal processing, graphics computing, high performance computing and other fields.
  • in existing floating-point square root solutions, a square root solving equation is iterated with an initial approximation as the input. Each iteration yields an inexact full-bit-width square root value, and a full-bit-width result that meets high-precision requirements is obtained only after multiple rounds of iteration. These solutions therefore suffer from a large number of iterations and slow convergence.
  • the present application provides a floating point number square root calculation method and a floating point number calculation module, which have short calculation delay and high throughput.
  • the present application provides a floating-point number square root calculation method, which can be executed or implemented by a processor, a calculator, a processing device or a computing device.
  • the processor can receive a floating-point number calculation instruction, which can carry a floating-point number to be calculated (Z).
  • the processor can obtain a target mantissa (X), and the target mantissa (X) includes the mantissa of a first floating-point number (W), and the first floating-point number (W) is a normalized floating-point number, and the value of the floating-point number to be calculated is the same as the value of the first floating-point number (W).
  • the mantissa and exponent of the floating-point number to be calculated (Z) can be different or the same as the mantissa and exponent of the first floating-point number (W), that is, the expression format of the floating-point number to be calculated (Z) can be different or the same as the expression format of the first floating-point number (W).
  • the format of the floating-point number (Z) to be calculated received by the processor is different from the format of the first floating-point number (W).
  • the processor can process the floating-point number (Z) to be calculated as the first floating-point number (W). In this process, the value of the received floating-point number is not changed, and only the format of the floating-point number (Z) to be calculated is changed.
  • the relationship between the target mantissa (X) and the mantissa of the first floating-point number (W) can be: if the exponent of the first floating-point number (W) is an even number, the target mantissa (X) is the same as the mantissa of the first floating-point number (W); if the exponent of the first floating-point number (W) is an odd number, the target mantissa (X) is Q times the mantissa of the first floating-point number (W), where Q is the base of the floating-point number, Q is a positive number, and Q is an even number.
  • the target mantissa (X) can be obtained by shifting the mantissa of the first floating-point number (W) left by one bit.
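  • the derivation of the target mantissa (X) from the exponent parity can be sketched as follows. This is a minimal illustration assuming base Q = 2 and Python floats; the function name `target_mantissa` and the returned half exponent `k` are hypothetical and are not taken from the application.

```python
import math

def target_mantissa(z: float) -> tuple[float, int]:
    """Return (X, k) with z == X * 4**k and 1 <= X < 4, for finite z > 0."""
    if not (z > 0.0 and math.isfinite(z)):
        raise ValueError("sketch handles positive finite inputs only")
    m, e = math.frexp(z)        # z == m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1       # normalized form W: mantissa in [1, 2)
    if e % 2:                   # odd exponent: X = Q * mantissa with Q = 2,
        m, e = m * 2.0, e - 1   # i.e. the mantissa shifted left by one bit
    return m, e // 2            # e is even here, so the halving is exact

x, k = target_mantissa(8.0)     # 8 = 1.0 * 2**3, odd exponent -> (2.0, 1)
```

  • with this split, the square root of the input equals the square root of X scaled by 2**k, so only the square root of X in [1, 4) remains to be computed.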
  • the processor may determine a first bit width part (f_u) of the square root of the target mantissa (X) according to all or part of the bit width of the target mantissa (X), wherein the first bit width part (f_u) includes the highest bit of the square root of the target mantissa (X).
  • the processor may calculate a second bit width part (f_l) of the square root of the target mantissa (X) based on a first relationship, the first bit width part (f_u) and all or part of the bit width of the target mantissa (X), wherein the first relationship characterizes a relationship between the first bit width part (f_u) of the square root of the target mantissa (X), the target mantissa (X), and the second bit width part (f_l) of the square root of the target mantissa (X).
  • the processor may determine the square root of the target mantissa (X) based on the first bit width part (f_u) and the second bit width part (f_l), and determine the fractional part of the square root of the target mantissa (X) as the mantissa of the square root of the floating-point number (Z) to be calculated.
  • the highest bit of the decimal part of the square root of the target mantissa (X) can be determined as the integer part of the mantissa of the square root of the floating-point number (Z), and the bit width of the decimal part of the square root of the target mantissa (X) excluding the highest bit is determined as the decimal part of the mantissa of the square root of the floating-point number (Z).
  • the processor calculates the mantissa part of the square root of the floating-point number (Z) to be calculated, which is also the mantissa of the square root of the first floating-point number (W), and can be achieved by determining the square root of the target mantissa (X).
  • the processor can determine the high-order part and the low-order part of the square root of the target mantissa (X), that is, the first bit width part (f_u) and the second bit width part (f_l) respectively.
  • the processor can use the determined first bit width part (f_u) and the second bit width part (f_l) to determine the square root of the target mantissa (X).
  • the processor does not need to iterate in the process of determining the square root of the target mantissa (X), so the calculation delay is short and the throughput is high.
  • the processor can determine the first bit width part (f_u) and the second bit width part (f_l) in parallel.
  • alternatively, the processor may determine the first bit width part (f_u) and the second bit width part (f_l) in series. For example, after determining the first bit width part (f_u), the processor determines the second bit width part (f_l).
  • when the partial bit width of the mantissa of the floating point number includes multiple bit widths, the multiple bit widths are continuous. That is, the partial bit width refers to a continuous partial bit width.
  • the part of the floating point number can be a part of the mantissa of the floating point number, and when the part of the mantissa includes multiple bit widths, the multiple bit widths are continuous.
  • the first relationship conforms to a formula that is not reproduced in this text, in which X is the target mantissa, f_u is the first bit width part, and f_l is the second bit width part.
  • the processor can implement the operation of determining the second bit width part (f_l) by using the first relationship through software or hardware. The embodiment of the present application does not limit this.
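  • as a hedged reading of the first relationship, whose formula is not reproduced in this text: if the square root of the target mantissa (X) is written as f_u + f_l, expanding the square gives an exact expression for f_l and a first-order approximation that needs only X, f_u and the reciprocal of f_u. This form is an assumption, chosen because the text later uses the reciprocal of f_u and the expression f_u^2 + f_l^2 + 2·f_u·f_l; it is not the original formula.

```latex
\begin{aligned}
X   &= (f_u + f_l)^2 = f_u^2 + 2 f_u f_l + f_l^2, \\
f_l &= \frac{X - f_u^2}{2 f_u + f_l}
     \;\approx\; \frac{X - f_u^2}{2 f_u}
     \;=\; \tfrac{1}{2}\,\bigl(X - f_u^2\bigr)\cdot\frac{1}{f_u}.
\end{aligned}
```

  • under this reading, the approximation overestimates f_l by f_l^2 / (2 f_u + f_l) times f_l / (2 f_u), which is roughly f_l^2 / (2 f_u); if f_u already carries n correct fractional bits, the dropped term is on the order of 2^(-2n).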
  • the second bit width portion (f l ) includes a partial bit width of the square root of the target mantissa (X) and includes the least significant bit of the square root of the target mantissa (X), wherein the sum of the bit width length of the first bit width portion (f u ) and the bit width length of the second bit width portion (f l ) is greater than or equal to the full bit width length of the square root of the target mantissa (X).
  • the first bit width part (f u ) may refer to the continuous bit width of the part including the highest bit of the square root of the target mantissa (X).
  • the second bit width part (f l ) may refer to the continuous bit width of the part including the lowest bit of the square root of the target mantissa (X).
  • the sum of the bit width of the first bit width part (f u ) and the bit width of the second bit width part (f l ) is greater than or equal to the full bit width of the square root of the target mantissa (X).
  • the floating-point number calculation method further includes: when the processor determines the first bit width part (f_u) of the square root of the target mantissa (X) according to all or part of the bit width of the target mantissa (X), the coefficients of a preset first polynomial fitting equation can be determined based on the target first query parameter (r1) and the target second query parameter (r2), wherein the target first query parameter (r1) is the first part of the mantissa of the first floating-point number (W), and the target second query parameter (r2) is a partial bit width of the exponent of the first floating-point number (W) that includes the lowest bit of the exponent of the first floating-point number (W).
  • the processor can calculate the first bit width part (f_u) according to the coefficients of the first polynomial fitting equation and the second part of the mantissa of the first floating-point number (W), wherein the bit width corresponding to the second part of the mantissa of the first floating-point number (W) does not overlap with the bit width corresponding to the first part of the mantissa of the first floating-point number (W).
  • the part of the first floating point number (W) may refer to the partial bit width, partial bit width bits or partial bit width data of the first floating point number (W).
  • the target first query parameter (r1) may be the first part of the mantissa of the first floating point number (W).
  • the second part of the mantissa of the first floating point number (W) may be used to calculate the first bit width part ( fu ).
  • the second part of the mantissa of the first floating point number (W) is the partial bit width bits or partial bit width data of the first floating point number (W) except the first part.
  • the target second query parameter (r2) is a partial bit width of the exponent of the first floating point number (W) that includes the lowest bit of the exponent of the first floating point number (W). The target second query parameter (r2) therefore reflects the parity of the exponent of the first floating point number (W).
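  • the split of the first floating-point number (W) into the query parameters and the non-overlapping second part of the mantissa can be sketched as follows. The sketch assumes an IEEE 754 binary32 encoding, a 6-bit first query parameter, and that r2 is a single bit holding the parity of the unbiased exponent; the name `query_parameters` and the constant `R1_BITS` are hypothetical.

```python
import struct

R1_BITS = 6  # hypothetical width of the first query parameter r1

def query_parameters(w: float) -> tuple[int, int, int]:
    """Split a normalized binary32 value into (r1, r2, second_part)."""
    bits = struct.unpack(">I", struct.pack(">f", w))[0]
    biased_exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF                     # 23 stored mantissa bits
    if biased_exp in (0, 0xFF):
        raise ValueError("sketch handles normalized finite inputs only")
    r2 = (biased_exp - 127) & 1                # parity of the unbiased exponent
    r1 = frac >> (23 - R1_BITS)                # leading mantissa bits
    second_part = frac & ((1 << (23 - R1_BITS)) - 1)  # remaining bits, no overlap with r1
    return r1, r2, second_part
```

  • for example, `query_parameters(8.0)` gives r2 = 1 because 8 = 1.0 × 2^3 has an odd exponent, while `query_parameters(1.5)` gives r2 = 0 and r1 = 32.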
  • when the processor determines the coefficients of the preset first polynomial fitting equation based on the target first query parameter (r1) and the target second query parameter (r2), if the target second query parameter (r2) is an odd number, the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) can be queried from the first odd number search subtable, wherein the first odd number search subtable includes the correspondence between the first query parameters and the coefficients of the first polynomial fitting equation when the exponent of the first floating point number (W) is an odd number.
  • if the target second query parameter (r2) is an even number, the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) can be queried from the first even number search subtable, wherein the first even number search subtable includes the correspondence between the first query parameters and the coefficients of the first polynomial fitting equation when the exponent of the first floating point number (W) is an even number.
  • the processor can obtain or configure the first odd number search subtable and the first even number search subtable. Such a design can reduce the processing overhead of the processor.
  • the processor may obtain or configure a first polynomial lookup table, and the first polynomial lookup table may include a correspondence between multiple first query parameter combinations and multiple first fitting parameter combinations.
  • a first query parameter combination can be used as an index.
  • an index corresponds to a first fitting parameter combination.
  • a first fitting parameter combination includes a set of coefficients of a first polynomial fitting equation.
  • the processor can use the target first query parameter (r1) and the target second query parameter (r2) as an index and find, in the first polynomial lookup table, the first fitting parameter combination corresponding to that index, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) and the target second query parameter (r2).
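  • the table lookup followed by polynomial evaluation can be sketched as follows. This is an illustration under stated assumptions, not the application's table: the polynomial is taken to be quadratic, the coefficients are Taylor coefficients at the start of each segment rather than fitted values, the index width `R1_BITS` is arbitrary, and the names `build_table`, `TABLE` and `first_part` are hypothetical. The result is not truncated to a particular bit width.

```python
import math

R1_BITS = 6  # hypothetical index width: 2**6 segments per exponent parity

def build_table() -> dict[tuple[int, int], tuple[float, float, float]]:
    """Map (r1, r2) to quadratic coefficients approximating sqrt(X)."""
    table = {}
    for r2 in (0, 1):                    # r2: parity of the exponent of W
        q = 2.0 if r2 else 1.0           # X = Q * mantissa for an odd exponent
        for r1 in range(1 << R1_BITS):
            m0 = 1.0 + r1 / (1 << R1_BITS)       # start of the segment
            s = math.sqrt(q * m0)
            # Taylor coefficients of sqrt(q * (m0 + t)) in the offset t
            table[(r1, r2)] = (s, q / (2 * s), -q * q / (8 * s ** 3))
    return table

TABLE = build_table()

def first_part(mantissa: float, exponent_is_odd: bool) -> float:
    """Approximate sqrt(X) from the mantissa of W, with 1 <= mantissa < 2."""
    r1 = int((mantissa - 1.0) * (1 << R1_BITS))  # leading mantissa bits
    t = mantissa - (1.0 + r1 / (1 << R1_BITS))   # remaining bits as an offset
    c0, c1, c2 = TABLE[(r1, int(exponent_is_odd))]
    return c0 + t * (c1 + t * c2)                # quadratic in Horner form
```

  • keeping separate entries for odd and even exponents corresponds to the odd and even search subtables: the parity selects which half of the table is read, and r1 selects the row.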
  • the processor may use the first floating point number (W) to calculate the reciprocal of the first bit width portion ( fu ).
  • for example, the Newton-Raphson method or the Sweeney-Robertson-Tocher (SRT) algorithm may be used to calculate the reciprocal of the first bit width portion (f_u).
  • the present application also provides several design schemes for calculating the reciprocal of the first bit width portion ( fu ).
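  • as one illustration of the Newton-Raphson option mentioned above, the reciprocal of a value in [1, 2), which is the range of the square root of a target mantissa in [1, 4), can be refined from a linear seed. The function name `reciprocal_newton`, the seed and the iteration count are assumptions for this sketch and are not taken from the application.

```python
def reciprocal_newton(a: float, iterations: int = 4) -> float:
    """Approximate 1/a for 1 <= a < 2 by Newton-Raphson iteration."""
    if not 1.0 <= a < 2.0:
        raise ValueError("sketch expects 1 <= a < 2")
    y = 24.0 / 17.0 - (8.0 / 17.0) * a   # linear seed, relative error <= 1/17
    for _ in range(iterations):
        y = y * (2.0 - a * y)            # y_{n+1} = y_n * (2 - a * y_n)
    return y
```

  • each step squares the relative error, so four steps from this seed already exceed double precision; this iteration concerns only the reciprocal and is separate from the non-iterative determination of the square root itself.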
  • the processor may determine the coefficients of a preset second polynomial fitting equation based on a target third query parameter (h1) and a target fourth query parameter (h2), wherein the target third query parameter (h1) is the third part of the mantissa of the first floating-point number (W), and the target fourth query parameter (h2) is a partial bit width of the exponent of the first floating-point number (W) that includes the lowest bit of the exponent of the first floating-point number (W).
  • the reciprocal of the first bit width part (f_u) is determined according to the coefficients of the second polynomial fitting equation and the fourth part of the mantissa of the first floating-point number (W), wherein the bit width corresponding to the third part of the mantissa of the first floating-point number (W) does not overlap with the bit width corresponding to the fourth part of the mantissa of the first floating-point number (W).
  • the processor can calculate the first bit width part ( fu ) and the second bit width part (f l ) in parallel.
  • the processor can use the reciprocal of the square root of the target mantissa (X) to approximate the reciprocal of the first bit width part ( fu ).
  • if the target fourth query parameter (h2) is an odd number, the processor queries the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) from the second odd number search subtable, wherein the second odd number search subtable includes the correspondence between the coefficients of the second polynomial fitting equation and the third query parameter when the exponent of the first floating point number (W) is an odd number.
  • if the target fourth query parameter (h2) is an even number, the processor queries the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) from the second even number search subtable, wherein the second even number search subtable includes the correspondence between the coefficients of the second polynomial fitting equation and the third query parameter when the exponent of the first floating point number (W) is an even number.
  • the processor may obtain or configure a second polynomial lookup table, and the second polynomial lookup table may include a correspondence between multiple second query parameter combinations and multiple second fitting parameter combinations.
  • a second query parameter combination can be used as an index.
  • an index corresponds to a second fitting parameter combination.
  • a second fitting parameter combination includes a set of coefficients of a second polynomial fitting equation.
  • the processor can use the target third query parameter (h1) and the target fourth query parameter (h2) as an index and find, in the second polynomial lookup table, the second fitting parameter combination corresponding to that index, thereby determining the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) and the target fourth query parameter (h2).
  • the processor may determine the coefficients of a preset third polynomial fitting equation based on a target fifth query parameter (g1), wherein the target fifth query parameter (g1) is the fifth part of the first bit width portion ( fu ).
  • the reciprocal of the first bit width portion ( fu ) is determined according to the coefficients of the third polynomial fitting equation and the sixth part of the first bit width portion ( fu ), wherein the bit width corresponding to the fifth part of the first bit width portion ( fu ) does not overlap with the bit width corresponding to the sixth part of the first bit width portion ( fu ).
  • the processor may acquire or configure a third polynomial lookup table, and the third polynomial lookup table may include a correspondence between multiple fifth query parameters and multiple third fitting parameter combinations.
  • the processor may use the target fifth query parameter (g1) as an index to search the third fitting parameter combination corresponding to the target fifth query parameter (g1) in the third polynomial lookup table. This achieves the determination of the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter (g1).
  • the processor may add the first bit width part ( fu ) and the second bit width part ( fl ), and determine the result of the addition as the square root of the target mantissa (X).
  • the processor may determine the decimal part of the square root of the target mantissa (X) as the mantissa of the square root of the first floating-point number (W).
  • the processor when the processor determines the square root of the target mantissa (X) based on the first bit width part ( fu ) and the second bit width part ( fl ), the processor may determine the square root of the target mantissa (X) based on a configured rounding mode.
  • the processor can determine two candidate results according to the first bit width part (f_u) and the second bit width part (f_l); calculate a first rounding discrimination parameter (ie) based on the first bit width part (f_u), the second bit width part (f_l), and a partial bit width of the target mantissa (X), wherein the first rounding discrimination parameter (ie) represents a deviation between a first value and the target mantissa (X), and the first value is the square of the square root of the target mantissa (X); and select one candidate result from the two candidate results according to a comparison result between the first rounding discrimination parameter (ie) and a preset value and determine it as the square root of the target mantissa (X).
  • the first rounding discrimination parameter (ie) may be a very small positive number or a very small negative number.
  • the processor may use the effective sign bit of the first rounding discrimination parameter (ie) and all bits after the effective sign bit to select the candidate result.
  • the processor may use the low-order part of (f_u^2 + f_l^2 + 2·f_u·f_l) and the low-order part of the target mantissa (X) to calculate the first rounding discrimination parameter (ie), which may reduce circuit overhead and chip area.
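  • the reason the low-order parts suffice can be sketched with integers: when the deviation is known to be small, subtracting only the k low-order bits and reading the result as a signed k-bit value recovers the full-width difference. The sketch assumes the sign convention ie = f^2 - X, a root f with 23 fractional bits scaled to an integer, and k = 27 kept bits; `FRAC_BITS`, `K` and `ie_low_bits` are hypothetical names. The full product is formed here only to model the truncation; hardware would not compute the discarded bits.

```python
import math

FRAC_BITS = 23           # hypothetical fraction width of the root
K = FRAC_BITS + 4        # hypothetical number of low-order bits kept

def ie_low_bits(f: int, x: int, k: int = K) -> int:
    """Deviation f*f - x recovered from the k low-order bits only.

    Valid when the true deviation satisfies |f*f - x| < 2**(k - 1).
    """
    mask = (1 << k) - 1
    diff = ((f * f) & mask) - (x & mask)    # compare the low-order parts only
    diff &= mask                            # wrap the difference to k bits
    return diff - (1 << k) if diff >> (k - 1) else diff  # read as signed

x = 2 << (2 * FRAC_BITS)      # X = 2.0 scaled by 2**(2 * FRAC_BITS)
f = math.isqrt(x)             # candidate root, scaled by 2**FRAC_BITS
low = ie_low_bits(f, x)       # matches the full-width value f * f - x
```

  • the width k only has to cover the largest possible deviation between the candidates and the true root, which is far narrower than the full product.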
  • the processor may perform a round toward positive (RP) mode.
  • the processor may determine a first rounding discrimination parameter (ie) based on the first bit width portion (f_u) and the second bit width portion (f_l).
  • the processor may determine a plurality of candidate results based on the first bit width part (f_u) and the second bit width part (f_l), and the plurality of candidate results may include a first candidate result f1 and a second candidate result f2.
  • f1 = f_u + f_l
  • f2 = f1 + ulp.
  • ulp represents the smallest valid number that can be expressed in the full bit width of the square root of the target mantissa (X).
  • the processor may select a candidate result from the plurality of candidate results according to the comparison result between the first rounding discrimination parameter (ie) and the preset value, and determine the selected candidate result as the square root of the target mantissa (X).
  • the preset value may be configured as 0.
  • the processor may determine that the first candidate result f1 is the square root of the target mantissa (X) according to the first rounding discrimination parameter (ie) being greater than or equal to 0.
  • the processor may determine that the second candidate result f2 is the square root of the target mantissa (X) according to the first rounding discrimination parameter (ie) being less than 0.
  • the processor may perform a round toward zero (RZ) mode.
  • the processor may determine a first rounding discrimination parameter (ie) based on the first bit width portion ( fu ) and the second bit width portion ( fl ).
  • the processor may determine a plurality of candidate results based on the first bit width portion ( fu ) and the second bit width portion ( fl ), and the plurality of candidate results may include a first candidate result f1 and a third candidate result f3.
  • f1 = f_u + f_l
  • f3 = f1 - ulp.
  • ulp represents the smallest valid number that can be expressed in the full bit width of the square root of the target mantissa (X).
  • the processor may select a candidate result from the plurality of candidate results based on the comparison result of the first rounding discrimination parameter (ie) and the preset value, and determine the selected candidate result as the square root of the target mantissa (X).
  • the preset value may be configured as 0.
  • the processor may determine that the first candidate result f1 is the square root of the target mantissa (X) based on the first rounding discrimination parameter (ie) being less than or equal to 0.
  • the processor may determine that the third candidate result f3 is the square root of the target mantissa (X) according to the first rounding discrimination parameter (ie) being greater than 0.
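  • the candidate selection in the round toward positive (RP) and round toward zero (RZ) modes can be sketched with exact integer arithmetic. The sketch assumes that f1 and X are scaled so that one ulp equals 1, that f1 is already within one ulp of the true root, and that the deviation is taken as ie = f1^2 - X, a sign convention that is consistent with the selection rules above but is not stated explicitly in this text. The function name `select_root` is hypothetical.

```python
import math

def select_root(f1: int, x: int, mode: str) -> int:
    """Choose among f1, f1 + ulp and f1 - ulp, with ulp scaled to 1."""
    ie = f1 * f1 - x                       # assumed convention: square minus X
    if mode == "RP":                       # round toward positive
        return f1 if ie >= 0 else f1 + 1   # f2 = f1 + ulp when f1 is too small
    if mode == "RZ":                       # round toward zero
        return f1 if ie <= 0 else f1 - 1   # f3 = f1 - ulp when f1 is too large
    raise ValueError("sketch supports only 'RP' and 'RZ'")

x = 2 << 46                    # X = 2.0 with 46 fractional bits
r = math.isqrt(x)              # floor of the exact root, 23 fractional bits
rp = select_root(r, x, "RP")   # r + 1, because r * r < x
rz = select_root(r + 1, x, "RZ")  # r, because (r + 1) ** 2 > x
```

  • only the sign of ie and whether it is zero matter, which is why a comparison with the preset value 0 is enough to pick the candidate.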
  • a plurality of candidate results may be determined according to the first bit width part ( fu ) and the second bit width part ( fl ), the plurality of candidate results including a first candidate result, a second candidate result and a third candidate result, wherein the second candidate result is greater than the first candidate result, and the first candidate result is greater than the third candidate result; based on the first bit width part ( fu ), the second bit width part ( fl ), and a partial bit width of the target mantissa (X), a second rounding discrimination parameter (ien) is calculated, wherein the second rounding discrimination parameter (ien) represents a deviation between the square of a first distance and the square of a second distance, the first distance being the distance between the first candidate result and the real number of the square root of the target mantissa (X), and the second distance
  • the difference between the second candidate result and the first candidate result is less than or equal to a minimum precision unit.
  • the difference between the first candidate result and the third candidate result is less than or equal to a minimum precision unit.
  • the processor may perform a round-half (RH) mode, in which the square root of the target mantissa (X) is determined based on a formula that is not reproduced in this text. The "else" branch of that formula can refer to a case defined by comparing iep with 0, or to a case defined by comparing ien with 0; the comparison operators are garbled in this text.
  • the processor may calculate the aforementioned first rounding discrimination parameter (ie), and use the first rounding discrimination parameter (ie) to calculate the second rounding discrimination parameter (ien) and the third rounding discrimination parameter (iep), which may reduce circuit overhead and optimize circuit chip area.
  • the embodiment of the present application also provides a floating-point number calculation module, which can be used in floating-point number calculation scenarios, such as calculating the square root of a floating-point number.
  • the floating-point number calculation module provided in the embodiment of the present application can be applied in a processor or a calculator to implement the function of the processor or the calculator to perform floating-point number calculation.
  • the floating-point number calculation module is used to receive a floating-point number calculation instruction, and the instruction carries a floating-point number (Z) to be calculated; obtain a target mantissa (X), and the target mantissa (X) includes the mantissa of a first floating-point number (W), and the first floating-point number (W) is a normalized floating-point number, and the value of the first floating-point number (W) is the same as the value of the floating-point number (Z) to be calculated;
  • the floating-point number calculation module includes: a high-bit calculation unit, which is used to determine the first bit width part (f_u) of the square root of the target mantissa (X) according to all or part of the bit width of the target mantissa (X), and the first bit width part (f_u) contains the highest bit of the square root of the target mantissa (X).
  • a low bit calculation unit used to calculate the second bit width part (f l ) of the square root of the target mantissa (X) based on a first relationship, the first bit width part (f u ) and all or part of the bit width of the target mantissa (X), wherein the first relationship represents the relationship between the first bit width part (f u ) of the square root of the target mantissa (X), the target mantissa (X) and the second bit width part (f l ) of the square root of the target mantissa (X); a precise rounding unit, used to determine the square root of the target mantissa (X) based on the first bit width part (f u ) and the second bit width part (f l ), and determine the decimal part of the square root of the target mantissa (X) as the mantissa of the square root of the floating-point number (Z) to be calculated.
  • the floating-point number calculation module calculates the mantissa of the square root of the first floating-point number (W), which can be achieved by determining the square root of the target mantissa (X).
  • the floating-point number calculation module can respectively determine the first bit width part ( fu ) and the second bit width part ( fl ) of the square root of the target mantissa (X), and determine the square root of the target mantissa (X) using the determined first bit width part ( fu ) and the second bit width part ( fl ).
  • the floating-point number calculation module does not need to iterate during the process of determining the square root of the target mantissa (X), so the calculation delay is short and the throughput is high.
  • the high-order calculation unit and the low-order calculation unit can work in parallel, or the high-order calculation unit and the low-order calculation unit can work in series.
  • if the exponent of the first floating-point number (W) is an even number, the target mantissa (X) is the same as the mantissa of the first floating-point number (W); if the exponent of the first floating-point number (W) is an odd number, the target mantissa (X) is Q times the mantissa of the first floating-point number (W), where Q is the base of the floating-point number, Q is a positive number, and Q is an even number.
  • the first relationship conforms to a formula that is not reproduced in this text, in which X is the target mantissa, f_u is the first bit width part, and f_l is the second bit width part.
  • the second bit width portion (f l ) includes a partial bit width of the square root of the target mantissa (X) and includes the least significant bit of the square root of the target mantissa (X), wherein the sum of the bit width length of the first bit width portion (f u ) and the bit width length of the second bit width portion (f l ) is greater than or equal to the full bit width length of the square root of the target mantissa (X).
  • the high-order calculation unit is specifically used to: determine the coefficients of a preset first polynomial fitting equation based on a target first query parameter (r1) and a target second query parameter (r2), wherein the target first query parameter (r1) is the first part of the mantissa of the first floating-point number (W), and the target second query parameter (r2) is a partial bit width of the exponent of the first floating-point number (W), and includes the lowest bit width of the exponent of the first floating-point number (W); calculate the first bit width part (f u ) according to the coefficients of the first polynomial fitting equation and the second part of the mantissa of the first floating-point number (W), and the bit width corresponding to the second part of the mantissa of the first floating-point number (W) does not overlap with the bit width corresponding to the first part of the mantissa of the first floating-point number (W).
  • the high-order calculation unit may include a first table lookup circuit, a first square operation circuit, and a first polynomial summation circuit.
  • the first table lookup circuit may be coupled to a storage module.
  • the storage module or storage circuit is used to store a first odd number lookup sub-table and a first even number lookup sub-table, wherein the first odd number lookup sub-table includes a correspondence between a plurality of first query parameters and coefficients of a first polynomial fitting equation when the exponent of the first floating point number (W) is an odd number, and the first even number lookup sub-table includes a correspondence between a plurality of first query parameters and coefficients of a first polynomial fitting equation when the exponent of the first floating point number (W) is an even number.
  • the first table lookup circuit may query the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) from the first odd-numbered lookup subtable when the target second query parameter (r2) is an odd number.
  • the first table lookup circuit may query the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) from the first even-numbered lookup subtable when the target second query parameter (r2) is an even number.
  • the first square operation circuit may determine the square of the second part of the mantissa of the first floating point number (W) based on the second part of the mantissa of the first floating point number (W).
  • the first polynomial summation circuit may calculate the first bit width part (f u ) of the target mantissa (X) by using the coefficients of the first polynomial fitting equation queried by the first table lookup circuit, the second part of the mantissa of the first floating point number (W), and the square of the second part of the mantissa of the first floating point number ( W ).
  • the low-order calculation unit may use the first floating-point number (W) to calculate the reciprocal of the first bit width portion ( fu ).
  • for example, the Newton-Raphson method, the Sweeney-Robertson-Tocher algorithm (SRT algorithm), etc. may be used to calculate the reciprocal of the first bit width portion ( fu ).
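As a point of reference, the generic Newton-Raphson reciprocal iteration can be sketched as follows. This is the textbook iteration, not a circuit from the present application; the linear initial guess and the function name `reciprocal_newton` are assumptions chosen for the input range [1, 2).

```python
def reciprocal_newton(a, iterations=5):
    """Approximate 1/a for a in [1, 2) with the iteration y <- y * (2 - a * y)."""
    y = 1.5 - 0.5 * a  # assumed initial guess, exact at a = 1 and a = 2
    for _ in range(iterations):
        y = y * (2.0 - a * y)  # the relative error is squared at every step
    return y


f_u = 1.65625                    # example value of the first bit width portion
recip = reciprocal_newton(f_u)   # approximately 1 / 1.65625
```

Because the relative error is squared on each pass, the number of correct bits roughly doubles per iteration, which is why only a few iterations are needed.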
  • the present application also provides several design schemes for calculating the reciprocal of the first bit width portion ( fu ).
  • the low-bit calculation unit may include a first high-bit reciprocal calculation circuit and a low-bit operation circuit.
  • the first high-bit reciprocal calculation circuit may determine the coefficients of a preset second polynomial fitting equation based on a target third query parameter (h1) and a target fourth query parameter (h2), wherein the target third query parameter (h1) is the third part of the mantissa of the first floating-point number (W), and the target fourth query parameter (h2) is a partial bit width of the exponent of the first floating-point number (W), and includes the lowest bit width of the exponent of the first floating-point number (W).
  • the reciprocal of the first bit width part (f u ) is determined, and the bit width corresponding to the third part of the mantissa of the first floating-point number (W) does not overlap with the bit width corresponding to the fourth part of the mantissa of the first floating-point number (W).
  • the first high-order reciprocal calculation circuit can calculate the first bit width part ( fu ) and the second bit width part (f l ) in parallel.
  • the first high-order reciprocal calculation circuit can use the reciprocal of the square root of the target mantissa (X) to approximate the reciprocal of the first bit width part ( fu ).
  • the low-order operation circuit is used to determine the second bit width part (f l ) by using the relationship between the first bit width part ( fu ), the reciprocal of the first bit width part ( fu ), and the target mantissa (X).
  • the process of calculating the inverse of the first bit width part ( fu ) by the low-order calculation unit can be parallel to the process of calculating the first bit width part ( fu ) by the high-order calculation unit, so that the high-order calculation unit and the low-order calculation unit can work in parallel.
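The source does not spell out the relationship used by the low-order operation circuit at this point. One plausible reading, assumed here purely for illustration, starts from X = (fu + fl)^2 and drops the small fl^2 term, which gives fl ≈ (X − fu^2) · (1/fu) / 2. The function name `low_part` and the 8-bit width of fu are hypothetical.

```python
import math


def low_part(x, f_u, recip_f_u):
    """First-order estimate of f_l from X = (f_u + f_l)^2, ignoring f_l^2."""
    return (x - f_u * f_u) * recip_f_u / 2.0


x = 2.75                                         # example target mantissa in [1, 4)
f_u = math.floor(math.sqrt(x) * 2**8) / 2**8     # root truncated to 8 fraction bits
f_l = low_part(x, f_u, 1.0 / f_u)                # correction from the residual
root = f_u + f_l                                 # close to sqrt(x)
```

Under this assumption the error of fu + fl is about fl^2 / (2 · fu), so an 8-bit fu yields a sum accurate to roughly 16 fraction bits. This is also why the reciprocal of fu only needs limited precision.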
  • the first high-order reciprocal calculation circuit queries the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) from the second odd number search subtable, wherein the second odd number search subtable includes the corresponding relationship between the coefficients of the second polynomial fitting equation and the third query parameters when the exponent of the first floating point number (W) is an odd number.
  • the first high-order reciprocal calculation circuit queries the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) from the second even number search subtable, wherein the second even number search subtable includes the corresponding relationship between the coefficients of the second polynomial fitting equation and the third query parameters when the exponent of the first floating point number (W) is an even number.
  • the first high-order reciprocal calculation circuit can obtain or configure a second polynomial lookup table, and the second polynomial lookup table can include a correspondence between multiple second query parameter combinations and multiple second fitting parameter combinations.
  • a second query parameter combination can be used as an index.
  • An index corresponds to a second fitting parameter combination
  • a second fitting parameter combination includes a set of coefficients of a second polynomial fitting equation.
  • the first high-order reciprocal calculation circuit can use the target third query parameter (h1) and the target fourth query parameter (h2) as an index. From the second polynomial coefficient lookup table, find the second fitting parameter combination corresponding to the index. Thereby determining the coefficients of the second polynomial fitting equation corresponding to the target third query parameter (h1) and the target fourth query parameter (h2).
  • the execution process of the first high-order inverse calculation circuit can be parallel to the high-order calculation unit, so that the low-order calculation unit can be parallel to the high-order calculation unit.
  • the low-bit calculation unit may include a second high-bit reciprocal calculation circuit and a low-bit operation circuit.
  • the second high-bit reciprocal calculation circuit may determine the coefficient of a preset third polynomial fitting equation based on the target fifth query parameter (g1), wherein the target fifth query parameter (g1) is the fifth part of the first bit width part ( fu ).
  • the reciprocal of the first bit width part ( fu ) is determined, and the bit width corresponding to the fifth part of the first bit width part ( fu ) does not overlap with the bit width corresponding to the sixth part of the first bit width part ( fu ).
  • the low-bit operation circuit is used to determine the second bit width part (f l ) by using the relationship between the first bit width part ( fu ), the reciprocal of the first bit width part (fu), and the target mantissa (X).
  • the second high-order reciprocal calculation circuit can obtain or configure a third polynomial lookup table, and the third polynomial lookup table can include a correspondence between multiple fifth query parameters and multiple third fitting parameter combinations.
  • the processor can use the target fifth query parameter (g1) as an index to find the third fitting parameter combination corresponding to the target fifth query parameter (g1) in the third polynomial lookup table.
  • the second high-order reciprocal calculation circuit determines the coefficient of the third polynomial fitting equation corresponding to the target fifth query parameter (g1).
  • the second high-order inverse calculation circuit is in a serial relationship with the high-order calculation unit, so the low-order calculation unit can be connected in series with the high-order calculation unit.
  • the precise rounding unit may perform a summation process on the first bit width portion and the second bit width portion, and a result of the summation process is a square root of the target mantissa (X).
  • the precise rounding unit may determine two candidate results according to the first bit width part ( fu ) and the second bit width part ( fl ); calculate a first rounding determination parameter (ie) based on the first bit width part ( fu ), the second bit width part ( fl ), and a partial bit width of the target mantissa (X), wherein the first rounding determination parameter (ie) represents a deviation between a first value and the target mantissa (X), and the first value is the square of the square root of the target mantissa (X); and select one candidate result from the two candidate results according to a comparison result between the first rounding determination parameter (ie) and a preset value and determine it as the square root of the target mantissa (X).
  • the first rounding determination parameter (ie) can be a very small positive number or a very small negative number.
  • the precise rounding unit can use the effective sign bit of the first rounding determination parameter (ie) and all bits after the effective sign bit to select the candidate result.
  • the precise rounding unit can use the low-order part of $(f_u^2 + f_l^2 + 2 \cdot f_u \cdot f_l)$ and the low-order part of the target mantissa (X) to calculate the first rounding discrimination parameter (ie). Calculating the first rounding discrimination parameter (ie) from these low-order parts reduces the circuit overhead and the circuit area occupied.
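The reason low-order parts suffice can be shown with modular arithmetic: when the deviation ie is known to be small, it is fully determined by the low k bits of the two operands. The sketch below assumes a fixed-point scaling in which f is an integer with n fraction bits and X is an integer with 2n fraction bits; the name `ie_from_low_bits` and the choice k = 16 are illustrative assumptions.

```python
import math


def ie_from_low_bits(f, x_scaled, k):
    """Recover ie = f*f - x_scaled from the k low-order bits of each operand.

    Valid when -2**(k-1) <= ie < 2**(k-1). Hardware would only need to
    compute the low k bits of the square; Python computes the full product.
    """
    mask = (1 << k) - 1
    diff = ((f * f) & mask) - (x_scaled & mask)
    diff &= mask                  # wrap into [0, 2**k)
    if diff >= 1 << (k - 1):      # reinterpret as a signed k-bit value
        diff -= 1 << k
    return diff


# Demo with n = 12 fraction bits: X = 2.5 scaled by 2**24.
x_scaled = int(2.5 * (1 << 24))
f = math.isqrt(x_scaled)                      # candidate root, scaled by 2**12
full_ie = f * f - x_scaled                    # full-width reference value
low_ie = ie_from_low_bits(f, x_scaled, 16)    # same value from 16 low bits
```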
  • the precise rounding unit can perform a round-toward-positive-infinity (RP) mode.
  • the precise rounding unit can determine a first rounding discrimination parameter (ie) based on the first bit width part ( fu ) and the second bit width part ( fl ).
  • the precise rounding unit can determine multiple candidate results based on the first bit width part ( fu ) and the second bit width part ( fl ), and the multiple candidate results can include a first candidate result f1 and a second candidate result f2.
  • f1 = fu + fl
  • f2 = f1 + ulp.
  • Ulp represents the smallest valid number that can be expressed in the full bit width of the square root of the target mantissa (X).
  • the precise rounding unit can select a candidate result from the multiple candidate results according to the comparison result of the first rounding discrimination parameter (ie) and the preset value, and determine the selected candidate result as the square root of the target mantissa (X).
  • the preset value can be configured as 0.
  • the precise rounding unit may determine that the first candidate result f1 is the square root of the target mantissa (X) according to the first rounding determination parameter (ie) being greater than or equal to 0.
  • the precise rounding unit may determine that the second candidate result f2 is the square root of the target mantissa (X) according to the first rounding determination parameter (ie) being less than 0.
  • the precise rounding unit may perform a rounding to zero (RZ) mode.
  • the precise rounding unit may determine a first rounding discrimination parameter (ie) based on the first bit width portion ( fu ) and the second bit width portion ( fl ).
  • the precise rounding unit may determine a plurality of candidate results based on the first bit width portion ( fu ) and the second bit width portion ( fl ), and the plurality of candidate results may include a first candidate result f1 and a third candidate result f3.
  • f1 = fu + fl
  • f3 = f1 - ulp.
  • Ulp represents the smallest valid number that can be expressed in the full bit width of the square root of the target mantissa (X).
  • the precise rounding unit may select a candidate result from the plurality of candidate results based on the comparison result of the first rounding discrimination parameter (ie) and the preset value, and determine the selected candidate result as the square root of the target mantissa (X).
  • the preset value may be configured as 0.
  • the precise rounding unit can determine that the first candidate result f1 is the square root of the target mantissa (X) according to the first rounding determination parameter (ie) being less than or equal to 0.
  • the precise rounding unit can determine that the third candidate result f3 is the square root of the target mantissa (X) according to the first rounding determination parameter (ie) being greater than 0.
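The RP and RZ selection rules described above can be sketched together. The sketch assumes ie = f1^2 − X, a fixed-point scaling in which one ulp equals the integer 1 (f1 has n fraction bits, X has 2n fraction bits), and that f1 is already within one ulp of the true square root. The function name `round_sqrt` is hypothetical.

```python
import math


def round_sqrt(f1, x_scaled, mode):
    """Select the rounded root from the candidates using the sign of ie."""
    ie = f1 * f1 - x_scaled                  # deviation of f1 squared from X
    if mode == "RP":                         # round up: result must not be below the root
        return f1 if ie >= 0 else f1 + 1     # f1 or f2 = f1 + ulp
    if mode == "RZ":                         # round toward zero: result must not exceed the root
        return f1 if ie <= 0 else f1 - 1     # f1 or f3 = f1 - ulp
    raise ValueError("unsupported rounding mode: " + mode)


# Demo with n = 12 fraction bits: X = 3.0 scaled by 2**24.
x_scaled = 3 << 24
floor_root = math.isqrt(x_scaled)
rp = round_sqrt(floor_root, x_scaled, "RP")      # floor_root + 1
rz = round_sqrt(floor_root + 1, x_scaled, "RZ")  # floor_root
```

Only the sign of ie (and whether it is zero) is needed, which matches the observation above that a preset value of 0 is used for the comparison.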
  • a plurality of candidate results may be determined according to the first bit width part ( fu ) and the second bit width part ( fl ), the plurality of candidate results including a first candidate result, a second candidate result and a third candidate result, wherein the second candidate result is greater than the first candidate result, and the first candidate result is greater than the third candidate result; based on the first bit width part ( fu ), the second bit width part ( fl ), and a partial bit width of the target mantissa (X), a second rounding discrimination parameter (ien) is calculated, wherein the second rounding discrimination parameter (ien) represents the deviation between the square of a first distance and the square of a second distance, the first distance being the distance between the first candidate result and the real number of the square root of the target mantissa (X), and the
  • the difference between the second candidate result and the first candidate result is less than or equal to a minimum precision unit.
  • the difference between the first candidate result and the third candidate result is less than or equal to a minimum precision unit.
  • the precise rounding unit may perform a rounding to nearest value (RH) mode based on the formula Determine the square root of the target mantissa (X), where else can refer to the case where iep ⁇ 0, or the case where ien ⁇ 0.
  • the precise rounding unit can calculate the aforementioned first rounding discrimination parameter (ie), and use the first rounding discrimination parameter (ie) to calculate the second rounding discrimination parameter (ien) and the third rounding discrimination parameter (iep), which can reduce circuit overhead and optimize circuit chip area.
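How two further discrimination values can be derived cheaply from ie may be illustrated with a round-to-nearest sketch. The definitions below are assumptions made for illustration and may differ in sign and scaling from the second and third rounding discrimination parameters (ien, iep) of the present application: `up_margin` and `down_margin` compare X against the squared midpoints (f1 ± ulp/2)^2, using the same integer scaling as before (one ulp equals 1).

```python
import math


def round_sqrt_nearest(f1, x_scaled):
    """Pick the candidate among f1 - 1, f1, f1 + 1 nearest to sqrt(x_scaled)."""
    ie = f1 * f1 - x_scaled
    up_margin = ie + f1      # equals (f1 + 1/2)**2 - x_scaled - 1/4
    down_margin = ie - f1    # equals (f1 - 1/2)**2 - x_scaled - 1/4
    if up_margin < 0:        # root lies above the upper midpoint
        return f1 + 1
    if down_margin >= 0:     # root lies below the lower midpoint
        return f1 - 1
    return f1


# Demo with n = 12 fraction bits: X = 3.0 scaled by 2**24.
x_scaled = 3 << 24
nearest = (math.isqrt(4 * x_scaled) + 1) // 2   # reference nearest integer root
picked = round_sqrt_nearest(math.isqrt(x_scaled), x_scaled)
```

Both margins reuse ie and add or subtract f1, so no second squaring is needed. Exact ties cannot occur because a squared midpoint is never an integer in this scaling.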
  • the precise rounding unit pre-configures multiple rounding modes.
  • the precise rounding unit can obtain rounding mode configuration parameters; determine multiple candidate results according to the rounding mode corresponding to the rounding mode configuration parameters, the first bit width part (f u ), and the second bit width part (f l ); calculate a rounding discrimination parameter based on the first bit width part (f u ), the second bit width part (f l ), and the target mantissa (X) according to the rounding mode corresponding to the rounding mode configuration parameters; and, according to the comparison result between the rounding discrimination parameter and a preset value, select a candidate result from the multiple candidate results as the square root of the target mantissa (X).
  • the precise rounding unit can execute any rounding method provided in the above embodiments, which will not be described in detail here.
  • an embodiment of the present application further provides a processing device, which may include a first register, a second register, and a floating-point number calculation module as in the second aspect and any one of its designs.
  • the first register stores a floating-point number to be calculated.
  • the floating-point number calculation module is used to obtain the floating-point number to be calculated from the first register; and calculate the mantissa of the square root of the floating-point number to be calculated.
  • the second register is used to store the mantissa of the square root of the floating-point number to be calculated.
  • the processing device also includes a third register; the third register stores rounding mode configuration parameters; the floating-point number calculation module is also used to obtain the rounding mode configuration parameters and execute the rounding mode corresponding to the rounding mode configuration parameters.
  • the processor or calculator includes hardware structures and/or software modules corresponding to the execution of each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • FIG. 1 is a schematic diagram of a floating point number format.
  • FIG. 2A is a schematic diagram of a five-stage pipeline.
  • FIG. 2B is a schematic diagram of the structure of a processing device provided in an embodiment of the present application.
  • FIG. 3 shows a floating point number square root calculation method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of floating point number processing.
  • FIG. 5 is a schematic diagram of the structure of a floating point calculation module.
  • FIG. 6 is a schematic diagram of a high-order part and a low-order part of a mantissa.
  • FIG. 7 is a schematic diagram showing the relationship between a storage module and a fitting parameter.
  • FIG. 8 is a schematic diagram of the summing process.
  • FIG. 9 is a schematic diagram of the relationship between multiple candidate results.
  • FIG. 10 is a schematic diagram of a specific structure of a floating point calculation module.
  • FIG. 11 is a schematic diagram of a specific structure of a precise rounding unit.
  • FIG. 12 is a schematic diagram of the specific structure of another precise rounding unit.
  • FIG. 13 is a schematic diagram of the structure of another floating point calculation module.
  • FIG. 14 is a schematic diagram of the specific structure of another floating point calculation module.
  • FIG. 15 is a schematic diagram of the structure of another floating point calculation module.
  • FIG. 16 is a schematic diagram of the structure of yet another floating-point number calculation module.
  • a floating-point number is an approximate representation of a real number.
  • Floating-point numbers are usually represented by a combination of a mantissa and an exponent offset value (also called an exponent).
  • a floating-point number can be expressed as the product of the mantissa and an integer power of a base.
  • the Institute of Electrical and Electronics Engineers (IEEE) 754 standard defines floating-point operation standards and expressions, and is also the most widely supported and used binary floating-point arithmetic standard. Among them, the standard IEEE 754 stipulates that a floating-point number is composed of a sign bit, an exponent offset value (exponent bits) and a mantissa bit. In the standard IEEE 754, floating-point numbers have multiple types, such as single precision (SP) floating-point numbers, double precision (DP) floating-point numbers, extended single precision floating-point numbers, extended double precision floating-point numbers, etc.
  • the full bit width of a floating-point number may include the full bit width of the sign bit, the exponent offset value, and the full bit width of the mantissa value.
  • the bit width of a single precision floating-point number is 32 bits
  • the bit width of a double precision floating-point number is 64 bits
  • the bit width of an extended single precision floating-point number is at least 43 bits
  • the bit width of an extended double precision floating-point number is at least 79 bits.
  • a single-precision floating point number is shown in FIG1 , and the full bit width of the single-precision floating point number is 32 bits, wherein bits 0 to 22 represent the mantissa value, bits 23 to 30 represent the exponent offset value, and bit 31 represents the sign bit.
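The single-precision bit layout just described can be checked with a short sketch that reinterprets a value as a 32-bit word and extracts the three fields. The function name `decode_single` is hypothetical; `struct.pack` and `struct.unpack` are standard library calls.

```python
import struct


def decode_single(value):
    """Split an IEEE 754 single-precision value into its three bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))  # raw 32-bit pattern
    sign = bits >> 31                     # bit 31
    exponent_field = (bits >> 23) & 0xFF  # bits 23 to 30 (exponent offset value)
    mantissa_field = bits & 0x7FFFFF      # bits 0 to 22
    return sign, exponent_field, mantissa_field


fields = decode_single(-2.5)  # -2.5 = -1.25 * 2**1
```

For a normalized value the stored exponent field is the true exponent plus 127, and the integer bit of the mantissa is implicit rather than stored.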
  • Another method is to solve the square root of the floating point number by bit-by-bit iteration.
  • the result of each iterative calculation is a non-full-bit-width result with a fixed bit accuracy.
  • the bit-by-bit iterative method of solving the square root of floating point numbers has a low throughput and is difficult to implement pipeline processing.
  • an embodiment of the present application provides a floating point square root calculation method and a floating point calculation module, which have short calculation delay and high throughput.
  • the present application provides a floating point number square root calculation method which can be implemented by a floating point number calculation module.
  • the floating point number calculation module can be applied to a processor (or a calculator). For example, it can be applied to a CPU, a GPU or a digital signal processor (DSP).
  • the CPU usually uses a five-stage pipeline 100 to perform computing tasks.
  • the five-stage pipeline may include instruction fetch 101, decoding 102, execution 103, memory access 104 and write back 105, a total of five stages. Instructions can be extracted in the instruction fetch 101 stage.
  • the decoding 102 stage can translate the extracted instructions into instructions and parameters of identifiable operations.
  • the execution 103 stage may refer to the logical operation and mathematical operation stage. In some scenarios, when the floating-point square root calculation method provided by the embodiment of the present application is implemented by the CPU, the CPU may implement the floating-point square root calculation method provided by the embodiment of the present application in the execution 103 stage.
  • FIG2B shows a structural schematic diagram of a processing device in an embodiment of the present application, and the processing device may be implemented as a CPU, a GPU, an AI processor, etc.
  • the processing device 200 may include a register group 201 and a floating point number calculation module 202 provided in the present application.
  • the register group 201 may include multiple registers. A first register among the multiple registers may store a floating point number to be calculated.
  • the floating point number calculation module 202 may obtain a floating point number to be calculated from the first register, and calculate the square root of the floating point number to be calculated.
  • a second register among the multiple registers may store the square root of the floating point number to be calculated.
  • the plurality of registers may include a third register.
  • the third register may store a rounding mode configuration parameter.
  • the floating point number calculation module 202 may obtain the rounding mode configuration parameter from the third register, execute the rounding mode corresponding to the rounding mode configuration parameter, and obtain the square root of the floating point number to be calculated.
  • the processing device 200 may include a control module 203.
  • the control module 203 may perform the aforementioned processes such as instruction fetch 101 and decoding 102.
  • the processing device 200 may include a storage module 204, and the storage module 204 may include a cache for storing data.
  • the processing device 200 may include an integer calculation module 205 for processing integer operations.
  • the processing device 200 may also include other operation modules 206, which may perform logical operations, such as logical shift operations. In some scenarios, other operation modules 206 may be implemented as image-specific calculation modules. Or other operation modules 206 may perform multiplication and addition operations of large arrays. This application does not make specific restrictions on this.
  • the processing device 200 may also include an I/O interface 207.
  • FIG3 shows a floating point square root calculation method provided in an embodiment of the present application, which can be executed by a processor (or a calculator).
  • the floating point square root calculation method provided in an embodiment of the present application can include the following steps:
  • Step S100 receiving a floating point number calculation instruction, wherein the instruction carries a floating point number to be calculated.
  • Step S101 obtaining a target mantissa, the target mantissa comprising a mantissa of a first floating-point number, the first floating-point number being a normalized floating-point number, the value of the first floating-point number being the same as the value of the floating-point number to be calculated.
  • the floating-point number to be calculated is different from the first floating-point number only in the form of expression, that is, the mantissa and the exponent of the floating-point number are different. It can be seen that calculating the square root of the floating-point number Z is also calculating the square root of the floating-point number W.
  • the processor can obtain or receive the floating-point number Z (that is, the aforementioned floating-point number to be calculated) of the square root result to be calculated, and the floating-point number Z can be a normalized floating-point number or a non-normalized floating-point number.
  • the target mantissa is recorded as the target mantissa X
  • the first floating-point number is recorded as the floating-point number W.
  • the full bit width of a "mantissa" can include an integer part and a decimal part.
  • the integer part and the decimal part of the mantissa are arranged in sequence. Within the integer part, the bits are arranged from the highest bit to the lowest bit; within the decimal part, the bits are likewise arranged from the highest bit to the lowest bit.
  • the processor can normalize the floating point number Z to obtain a first floating point number, that is, a floating point number W.
  • the floating point number W and the floating point number Z have the same numerical value, but are expressed in different forms.
  • a floating point number is normalized so that the integer part of the mantissa of the floating point number is not 0.
  • the mantissa of the first floating point number W is recorded as mantissa M1, and the exponent is recorded as EW.
  • the sign of the square root of a floating-point number W is the same as the sign of the floating-point number W.
  • the calculation of the square root of a floating-point number W includes two parts, namely, the calculation of the mantissa of the square root of the floating-point number W and the calculation of the exponent of the square root of the floating-point number W.
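  • the relationship between the exponent of the square root of the floating-point number W, denoted here as $E_f$, and the exponent EW of the floating-point number W is: $$E_f = \begin{cases} EW/2, & EW \text{ is even} \\ (EW-1)/2, & EW \text{ is odd} \end{cases}$$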
  • the exponent offset value of a floating-point number is usually used to indicate the exponent of the floating-point number.
  • the exponent offset is related to the precision type of the floating point number W.
  • the critical path for solving the square root of the floating point number W is to solve the square root of the mantissa of the floating point number W.
  • the relationship between the target mantissa X and the floating-point number W obtained by the processor is that the target mantissa X includes the mantissa of the floating-point number W, and the target mantissa X may include all data of the mantissa of the floating-point number W.
  • if the exponent of the first floating-point number W is an even number, the target mantissa X is the same as the mantissa of the first floating-point number W; if the exponent of the first floating-point number W is an odd number, the target mantissa X is Q times the mantissa of the first floating-point number W, where Q is the base of the floating-point number, Q is a positive number, and Q is an even number.
  • the target mantissa X can be obtained by shifting the mantissa of the first floating-point number W left by one bit.
  • the following description uses a floating point number base of 2 as an example. If the exponent EW of the floating point number W is an even number and a positive number, then the exponent of the square root of the floating point number W is EW/2.
  • the target mantissa X is the same as the mantissa of the floating point number W. That is, the integer part of the target mantissa X is the same as the integer part of the mantissa of the floating point number W, and the decimal part of the target mantissa X is the same as the decimal part of the floating point number W.
  • the decimal part of the square root of the target mantissa X is the mantissa of the square root of the floating point number W.
  • if the exponent EW of the floating point number W is an odd number and a positive number, then the exponent of the square root of the floating point number W is (EW-1)/2.
  • the target mantissa X is twice the mantissa of the floating point number W, where 2 is the base of the floating point number.
  • the target mantissa X can be obtained by shifting the mantissa of the floating point number W to the left by one position.
  • the decimal part of the square root of the target mantissa X is the mantissa of the square root of the floating point number W.
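The even and odd cases can be checked numerically with a short sketch for base 2. The function name `split_for_sqrt` is hypothetical, and `math.frexp` is used only as a convenient way to obtain a mantissa in [1, 2) and its exponent. The floor division also covers negative exponents, which is an extension of the positive-exponent cases in the text.

```python
import math


def split_for_sqrt(w):
    """Return (target mantissa X, exponent of sqrt(w)) for a positive float w."""
    frac, exp = math.frexp(w)      # w = frac * 2**exp with frac in [0.5, 1)
    m1, ew = frac * 2.0, exp - 1   # w = m1 * 2**ew with m1 in [1, 2)
    if ew % 2 == 0:                # even exponent: X = M1
        return m1, ew // 2
    return 2.0 * m1, (ew - 1) // 2  # odd exponent: X = 2 * M1


x, e = split_for_sqrt(10.0)             # 10 = 1.25 * 2**3, so X = 2.5 and e = 1
root = math.ldexp(math.sqrt(x), e)      # sqrt(X) * 2**e, close to sqrt(10)
```

In both cases X falls in [1, 4), so its square root falls in [1, 2) and is itself a normalized mantissa.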
  • the processor may perform one or more of exponent parity determination processing, exponent conversion processing, and mantissa conversion processing on the floating-point number W to obtain the target mantissa X.
  • the processor may perform normalization processing on the received floating-point number to be calculated, that is, the floating-point number Z, and convert it into the floating-point number W.
  • the exponent of the floating-point number W is recorded as the exponent EW, and the mantissa of the floating-point number W is recorded as the mantissa M1.
  • the base of the floating-point number is 2 when the processor performs floating-point number calculation processing.
  • the processor can perform an exponent parity determination process on the exponent EW of the floating-point number W.
  • if the exponent EW is an odd number, the processor can perform a first mantissa transformation process on the mantissa M1, such as multiplying the mantissa M1 by Q to obtain the mantissa M2.
  • Q is 2.
  • the mantissa M1 includes an integer part and a decimal part. The black box shows the bits of the integer part, and the white box shows the bits of the decimal part.
  • the integer part of the mantissa M1 of the floating-point number W is 1 bit
  • the integer part of the mantissa M1 is the bit shown in the s1th position
  • the decimal part of the mantissa M1 is the bits shown in the 0th to vth positions.
  • the numerical range of the mantissa M1 is [1, 2).
  • the processor multiplies the mantissa M1 by 2, that is, shifts each bit of the mantissa M1 to the left by one bit, to obtain the mantissa M2.
  • the integer part of the mantissa M2 is 2 bits
  • the integer part of the mantissa M2 is the bits shown in the s1th and s2th bits
  • the decimal part of the mantissa M2 is the bits shown in the 0th to vth bits.
  • the value range of the mantissa M2 is [2, 4). It can be understood that the integer part of the mantissa M2 is one bit more than the integer part of the mantissa M1 to make up for the default integer bits in the IEEE754 format.
  • the target mantissa X is twice the mantissa M1 of the floating-point number W, that is, the target mantissa X is the same as the mantissa M2.
  • the integer part of the target mantissa X can include 2 bits.
  • the numerical range of the target mantissa X is [2, 4).
  • if the exponent EW is an even number, the target mantissa X is the same as the mantissa M1 of the floating-point number W.
  • the integer part of the mantissa M1 of the floating-point number W is 1 bit
  • the integer part of the mantissa M1 is the bit shown in the s1th position
  • the decimal part of the mantissa M1 is the bit shown in the 0th to vth positions.
  • the numerical range of the mantissa M1 is [1, 2).
  • since the mantissa of the floating-point number W is the mantissa M1 obtained after the normalization of the floating-point number Z, the numerical range of the mantissa of the floating-point number W is [1, 2).
  • the numerical range of the target mantissa X is [1, 2).
  • the integer part of the target mantissa X can be added with an extra bit compared to the integer part of the mantissa M1, which is configured as 0 to fill the default integer bit of the IEEE754 format. For example, the s2th bit is added, and the value is configured as 0. Such an operation does not change the numerical range of the target mantissa X.
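At the bit level, the two cases above amount to restoring the implicit integer bit and, for an odd exponent, shifting left by one position. The sketch below assumes a single-precision mantissa field of 23 bits and represents the target mantissa X as an integer with 23 fraction bits and 2 integer bits; the names `FRACTION_BITS` and `target_mantissa_bits` are hypothetical.

```python
FRACTION_BITS = 23  # assumed: single-precision mantissa field width


def target_mantissa_bits(fraction_field, exponent_is_odd):
    """Build X as a fixed-point integer with 2 integer bits and 23 fraction bits."""
    m1 = (1 << FRACTION_BITS) | fraction_field  # restore the implicit integer bit: 1.fraction
    if exponent_is_odd:
        return m1 << 1                          # M2 = 2 * M1, value in [2, 4)
    return m1                                   # upper integer bit stays 0, value in [1, 2)


x_bits = target_mantissa_bits(0x400000, True)   # M1 = 1.5, so X = 3.0
x_value = x_bits / (1 << FRACTION_BITS)
```

Either way the result fits in 25 bits, so a single datapath width can serve both the even and the odd case.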
  • the target mantissa X includes the mantissa M1 of the floating point number W.
  • the target mantissa X can be any value in the preset set.
  • the preset set can be [1, 4).
  • the minimum value of the preset set can be 1, and the maximum value of the preset set can be close to 4, but the preset set does not include 4.
  • the decimal part of the square root of the target mantissa X is the mantissa in the square root of the floating point number W.
  • the square root of the target mantissa X, that is, $\sqrt{X}$, is denoted as f.
  • f can be a fixed-point number, including an integer part and a decimal part.
  • the processor can determine the mantissa of the square root of the floating-point number W by determining the decimal part of the square root of the target mantissa X, that is, the decimal part of f.
  • although FIG. 3 shows step S102 and step S103 in a parallel relationship, this does not mean that the processor can only perform the operations in step S102 and step S103 in parallel.
  • the processor can perform the operations in step S102 and step S103 serially.
  • the processor can perform the operations in step S102 and step S103 in parallel.
  • parallel execution may include but is not limited to starting execution at the same time and starting execution synchronously. Within a preset time length, starting to execute the operations in step S102 and the operations in step S103 synchronously or asynchronously can also be regarded as executing the operations in step S102 and step S103 in parallel.
  • Step S102 determining a first bit width portion of the square root of the target mantissa according to all or part of the bit width of the target mantissa, wherein the first bit width portion includes the highest bit of the square root of the target mantissa.
  • the square root of the target mantissa X is recorded as f, which can represent the square root of the target mantissa X determined by the processor.
  • the first bit width part of the square root f of the target mantissa X includes the highest bit of the square root f of the target mantissa X. Since the square root f of the target mantissa X is a fixed-point number, it includes an integer part and a decimal part. The integer part and the decimal part can be arranged in sequence, and the highest bit of the square root f of the target mantissa X is also the highest bit of the integer part of the square root f of the target mantissa X.
  • the numerical range of the target mantissa X is [1, 4)
  • the numerical range of the square root f of the target mantissa X is [1, 2)
  • the integer part of the square root f of the target mantissa X is less than 2
  • the full bit width of the integer part of the square root f of the target mantissa X can be 1.
  • the full bit width of the square root f of the target mantissa X can be understood as the full bit width of the valid data part of the square root f.
  • the highest bit of the integer part of the square root f of the target mantissa X can also be understood as the highest bit of the valid part of the integer part of the square root f of the target mantissa X.
  • the first bit width part can be called the high-order part of the square root f of the target mantissa X, and can also be referred to as the high-order part f u of f.
  • the high-order part f u of f can refer to the high m bits of f
  • the high m bits can refer to the first m bits from the highest bit to the lowest bit in the full bit width of f, and can also be understood as the highest m bits in the full bit width of f.
  • bit width of the first bit width part of the square root of the target mantissa X is m.
  • the high-order part f u of f can be understood as the approximate calculation result of the square root of the target mantissa X.
  • m is a positive integer, and m is a value less than or equal to the full bit width of f.
  • the processor may be configured with a preset correspondence between all or part of the mantissa bit width and the first bit width portion of the square root of the mantissa.
  • the processor may use the first bit width portion corresponding to all or part of the bit width of the target mantissa X as the first bit width portion of the square root f of the target mantissa X, i.e., the high-order portion f u of f, based on all or part of the bit width of the target mantissa X and the preset correspondence between all or part of the bit width of the mantissa and the first bit width portion.
  • the preset correspondence between all or part of the bit width of the mantissa and the first bit width portion of the square root of the mantissa usually requires a large amount of storage resources, and the processor query speed is slow.
  • the processor may determine the high-order part fu of the square root f of the target mantissa X according to the whole or part of the bit width of the target mantissa X based on a polynomial approximation. Since the target mantissa X includes the whole bit width of the mantissa M1 of the floating-point number W, the processor may determine the high-order part fu of the square root f of the target mantissa X according to the whole bit width or part of the bit width of the mantissa M1 of the floating-point number W.
  • the processor may determine the target first query parameter r1 based on the mantissa M1 of the floating point number W, and the processor may determine the target second query parameter r2 based on the exponent EW of the floating point number W.
  • the target first query parameter may be the first part (the first part bit width) of the decimal part of the mantissa M1 of the floating point number W.
  • the target first query parameter may be the high nr1 bits (or low nr1 bits) of the decimal part of the mantissa M1 of the floating point number W, where nr1 is a positive integer less than or equal to the full bit width of the decimal part of the mantissa M1 of the floating point number W.
  • the target second query parameter is a partial bit width of the exponent EW of the floating point number W, and includes the lowest bit of the exponent EW of the floating point number W.
  • the target second query parameter r2 may be the low nr2 bits of the exponent EW of the floating point number W, where nr2 is a positive integer less than or equal to the full bit width of the exponent EW; it can be seen that the low nr2 bits of the exponent EW include the lowest bit of the exponent EW.
  • the target second query parameter r2 may be data of the lower 1 bit of the exponent EW of the floating point number W, and the data may reflect whether the exponent EW is an odd number or an even number.
  • the processor may determine coefficients of a first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2 based on the target first query parameter r1 and the target second query parameter r2.
  • the coefficients of the first polynomial fitting equation may include a first fitting parameter a1, a second fitting parameter b1, and a third fitting parameter c1.
  • the processor may calculate the high-order part fu of f according to the coefficients of the first polynomial fitting equation and all or part of the bit width of the decimal part of the mantissa M1 of the floating-point number W.
  • X1 is the second part (second part bit width) of the decimal part of the mantissa M1 of the floating-point number W, and the bit width corresponding to the second part in the decimal part of the mantissa M1 of the floating-point number W does not overlap with the bit width corresponding to the first part in the decimal part of the mantissa M1 of the floating-point number W.
  • X1 is the high t1 bits of the other bit widths of the mantissa M1 of the floating-point number W except the bit width of the aforementioned first part, and t1 is a positive integer.
  • the processor uses the first part of the decimal part of the mantissa M1 of the floating-point number W to determine the coefficients of the polynomial fitting equation, and uses the second part of the decimal part of the mantissa M1 of the floating-point number W to participate in the calculation of the polynomial fitting equation, and determines the approximate solution of the square root of the target mantissa X, that is, the high-order part fu of f.
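  • The table-driven approximation described above can be sketched as follows. The fitting equation itself is not reproduced in this text, so the quadratic form a1 × X1² + b1 × X1 + c1, the second-order Taylor coefficients, the constants `NR1` and `T1`, and the rule that an odd exponent corresponds to X = 2 × M1 are all assumptions made only for illustration.

```python
import math

NR1 = 6    # assumed number of high fraction bits of M1 used as r1
T1 = 10    # assumed number of following fraction bits used as X1


def build_table():
    """table[r2][r1] -> (a1, b1, c1), second-order Taylor coefficients of sqrt."""
    table = [[], []]
    for r2 in (0, 1):
        scale = math.sqrt(2.0) if r2 else 1.0    # assumed: odd exponent means X = 2 * M1
        for r1 in range(1 << NR1):
            m0 = 1.0 + r1 / (1 << NR1)           # start of the segment selected by r1
            c1 = scale * math.sqrt(m0)
            b1 = scale / (2.0 * math.sqrt(m0))
            a1 = -scale / (8.0 * m0 ** 1.5)
            table[r2].append((a1, b1, c1))
    return table


TABLE = build_table()


def approx_sqrt_high(frac_bits: int, frac_width: int, exp_lsb: int) -> float:
    """Approximate sqrt(X) from the fraction bits of M1 and the exponent's lowest bit."""
    r1 = frac_bits >> (frac_width - NR1)                   # first part: table index
    rest = frac_bits & ((1 << (frac_width - NR1)) - 1)
    x1_bits = rest >> (frac_width - NR1 - T1)              # second part: high T1 bits of the rest
    x1 = x1_bits / (1 << (NR1 + T1))                       # offset inside the segment
    a1, b1, c1 = TABLE[exp_lsb][r1]
    return (a1 * x1 + b1) * x1 + c1                        # Horner form of a1*x1^2 + b1*x1 + c1
```

  • With these assumed widths and a 23-bit fraction, the result agrees with the true square root to roughly 16 bits, which is the role of the high-order part fu as an approximate solution.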
  • the processor may acquire or be configured with a first polynomial coefficient lookup table.
  • the first polynomial coefficient lookup table may include a correspondence between a plurality of first fitting parameter combinations and a plurality of first query parameter combinations.
  • Each first fitting parameter combination may include a first fitting parameter a1, a second fitting parameter b1, and a third fitting parameter c1.
  • the processor may determine the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2 in a manner including but not limited to any of the following examples A1 and A2.
  • the first polynomial coefficient lookup table may include a first odd number lookup sub-table and a first even number lookup sub-table.
  • the first odd number lookup sub-table represents a first fitting parameter combination corresponding to the first query parameter when the second query parameter is an odd number.
  • the first even number lookup sub-table includes a first fitting parameter combination corresponding to the first query parameter when the second query parameter is an even number.
  • the processor may search for the first fitting parameter combination corresponding to the target first query parameter r1 from the first even number search subtable based on the target second query parameter r2 being an even number.
  • the processor may search for the first fitting parameter combination corresponding to the target first query parameter r1 from the first odd number search subtable based on the target second query parameter r2 being an odd number.
  • in this way, the processor searches for the first fitting parameter combination corresponding to the target first query parameter combination from the first polynomial coefficient lookup table, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.
  • the processor can use the target second query parameter r2 as the first index to select the first even number lookup sub-table or the first odd number lookup sub-table, and can use the target first query parameter r1 as the second index to search for the corresponding first fitting parameter combination in the selected sub-table.
  • the first polynomial coefficient lookup table may include a correspondence between a plurality of first fitting parameter combinations and a plurality of first query parameter combinations, wherein a first query parameter combination may be used as an index.
  • An index corresponds to a first fitting parameter combination.
  • the processor may use the target first query parameter r1 and the target second query parameter r2 as an index, and search the first fitting parameter combination corresponding to the index from the first polynomial coefficient lookup table. This achieves the determination of the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.
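  • The two lookup organizations in examples A1 and A2 can be compared with a short sketch. The function names, the list-based tables, and the way r2 and r1 are concatenated into one index are assumptions for illustration only.

```python
def lookup_a1(odd_table, even_table, r1: int, r2: int):
    """Example A1: r2 selects the odd or even sub-table, then r1 indexes into it."""
    sub_table = odd_table if r2 & 1 else even_table
    return sub_table[r1]


def lookup_a2(flat_table, r1: int, r2: int, nr1: int):
    """Example A2: (r2, r1) together form a single index into one table."""
    index = ((r2 & 1) << nr1) | r1     # assumed layout: parity bit above the nr1 bits of r1
    return flat_table[index]
```

  • If the single table is laid out as the even sub-table followed by the odd sub-table, both functions return the same fitting parameter combination for every (r1, r2) pair.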
  • Step S103 based on a first relationship, the first bit width part and all or part of the bit width of the target mantissa, calculate the second bit width part of the square root of the target mantissa, wherein the first relationship characterizes the relationship between the first bit width part of the square root of the target mantissa, the target mantissa, and the second bit width part of the square root of the target mantissa.
  • the second bit width part may include the lowest bit of the square root of the target mantissa X, and the sum of the bit width of the first bit width part and the bit width of the second bit width part is greater than or equal to the full bit width of the square root of the target mantissa X.
  • the second bit width part of the square root of the target mantissa X may also be referred to as the low bit part fl of the square root of the target mantissa X.
  • the low bit part fl of f may refer to the low n bits of f
  • the low n bits may refer to the last n bits from the highest bit to the lowest bit in the full bit width of f.
  • the bit width of the second bit width part of the square root of the target mantissa X is n.
  • n is a positive integer
  • n is a value less than the full bit width of f.
  • the sum of m and n is greater than or equal to the full bit width of f.
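  • As a small illustration of the high m bits and the low n bits of f, the sketch below splits a fixed-point word into the two parts for the simplest case in which m + n equals the full bit width. The function name and the integer representation are assumptions for illustration.

```python
def split_high_low(f_bits: int, width: int, m: int) -> tuple[int, int]:
    """Split a width-bit word into its high m bits and the remaining low bits.

    Both parts are returned at the original scale, so hi + lo == f_bits.
    """
    if not 0 < m <= width or f_bits < 0 or f_bits >> width:
        raise ValueError("f_bits must fit in width bits and 0 < m <= width")
    low_width = width - m
    hi = (f_bits >> low_width) << low_width    # high m bits, low bits cleared
    lo = f_bits - hi                           # low n = width - m bits
    return hi, lo
```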
  • the relationship between the low-order part fl, the high-order part fu and the target mantissa X is: (fu + fl)² = X, that is, fu² + 2 × fu × fl + fl² = X.
  • the calculation process of the low-order part fl of f is simplified and solved using known quantities and a limited number of variables.
  • the first relationship in the embodiment of the present application, that is, the first relationship between the low-order part fl of f, the high-order part fu of f and the target mantissa X, can be configured as fl ≈ (X − fu²) × (1/fu) / 2, in which the term fl² is neglected because fl is far smaller than fu.
  • the processor may calculate the low part fl of f based on the first relationship among the low part fl of f, the high part fu of f and the target mantissa X, the target mantissa X in step S101 and the high part fu of f determined in step S102 .
  • the processor may retain the high n bits of the value calculated using the first relationship as the low-order part fl of f. That is, the processor rounds the high n+1 bits of the calculated value. For example, the processor may add 1 to the high n+1 bits of the calculated value and retain the high n bits of the sum as the low-order part fl of f.
  • the processor determines the reciprocal of the high-order part fu of f, that is, 1/fu.
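  • The correction step can be sketched as follows. Because f = fu + fl and f² = X, expanding the square and dropping the very small term fl² gives fl ≈ (X − fu²) × (1/fu) / 2; this form is a reconstruction from the surrounding description and is an assumption, as are the function names. The second function sketches the "add 1 to the high n+1 bits, keep the high n bits" rounding on an integer word.

```python
import math


def low_part(x: float, fu: float) -> float:
    """First-order correction fl ~= (X - fu^2) * (1/fu) / 2 (assumed form)."""
    recip_fu = 1.0 / fu                  # in hardware this reciprocal would be approximated
    return (x - fu * fu) * recip_fu * 0.5


def round_keep_high_bits(value: int, width: int, n: int) -> int:
    """Round a width-bit word to its high n bits: add 1 at bit n+1, then drop that bit.

    The result may need n+1 bits if the rounding carries out of the top bit.
    """
    if not 0 < n < width:
        raise ValueError("need 0 < n < width")
    top = value >> (width - n - 1)       # high n+1 bits
    return (top + 1) >> 1
```

  • If fu is accurate to about 12 fraction bits, the remaining error after adding fl is (fu − √X)² / (2 × fu), which is below 2⁻²⁴, so one correction roughly doubles the number of correct bits.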
  • the processor may use any operation including but not limited to the following method 1 and method 2 to determine the reciprocal of the high-order part f u of f
  • the processor can determine the high-order part fu of f and the reciprocal of the high-order part fu of f in parallel.
  • the processor can determine the high-order part f u of f.
  • the processor can determine the reciprocal of the high-order part f u of the square root f of the target mantissa X based on the polynomial approximation method according to all or part of the bit width of the target mantissa X.
  • the processor may determine the target third query parameter h1 based on the mantissa M1 of the floating point number W, and the processor may determine the target fourth query parameter h2 based on the exponent EW of the floating point number W.
  • the target third query parameter is the third part (the third part bit width) of the mantissa M1 of the floating point number W.
  • the target third query parameter h1 may refer to the high nh1 bits (or low nh1 bits) of the decimal part of the mantissa M1 of the floating point number W, nh1 is a positive integer, and nh1 is less than or equal to the full bit width of the decimal part of the mantissa M1 of the floating point number W.
  • the target fourth query parameter h2 is a partial bit width of the exponent EW of the floating point number W, and includes the lowest bit of the exponent EW of the floating point number W.
  • the target fourth query parameter may refer to the high nh2 bits (or low nh2 bits) of the exponent EW of the floating point number W, where nh2 is a positive integer less than or equal to the full bit width of the exponent EW of the floating point number W.
  • the target fourth query parameter h2 may be data of the lower 1 bit of the exponent EW of the floating point number W, and the data may reflect whether the exponent EW is an odd number or an even number.
  • the processor can determine the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2 based on the target third query parameter h1 and the target fourth query parameter h2.
  • the coefficients of the second polynomial fitting equation may include the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2.
  • the processor can calculate the reciprocal of the high-order part f u according to the coefficients of the second polynomial fitting equation and all or part of the bit width of the decimal part of the floating-point number W.
  • the processor can calculate the reciprocal of the high-order part fu of f based on the coefficients of the second polynomial fitting equation and the fourth part (the fourth part bit width) of the decimal part of the floating-point number W.
  • the processor can output the reciprocal of the high-order part fu of f. X2 is the fourth part (the fourth part bit width) of the decimal part of the mantissa of the floating-point number W, and the bit width corresponding to the fourth part of the decimal part of the floating-point number W does not overlap with the bit width corresponding to the third part of the decimal part of the floating-point number W.
  • X2 is the high t2 bits of the bit width of the decimal part of the floating-point number W except the aforementioned third part bit width, and t2 is a positive integer.
  • the processor uses the third part of the decimal part of the floating-point number W to determine the coefficients of the polynomial fitting equation, uses the fourth part of the decimal part of the floating-point number W to participate in the calculation of the polynomial fitting equation, and determines the reciprocal of the high-order part f u of f.
  • the processor may acquire or be configured with a second polynomial coefficient lookup table.
  • the second polynomial coefficient lookup table may characterize the correspondence between multiple second fitting parameter combinations and multiple second query parameter combinations.
  • Each second fitting parameter combination may include a fourth fitting parameter a2, a fifth fitting parameter b2, and a sixth fitting parameter c2.
  • the processor may determine the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2 in a manner including but not limited to any of the following examples B1 and B2.
  • the second polynomial coefficient lookup table may include a second odd number lookup sub-table and a second even number lookup sub-table.
  • the second odd number lookup sub-table represents the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an odd number.
  • the second even number lookup sub-table includes the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an even number.
  • the processor may search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second even number search subtable based on the target fourth query parameter h2 being an even number.
  • the processor may search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second odd number search subtable based on the target fourth query parameter h2 being an odd number.
  • in this way, the processor searches for the second fitting parameter combination corresponding to the target second query parameter combination from the second polynomial coefficient lookup table, thereby determining the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2.
  • the processor can use the target fourth query parameter h2 as the third index to determine the second even search subtable or the second odd search subtable, and can use the target third query parameter h1 as the fourth index to search for the corresponding second fitting parameter combination from the determined subtable.
  • the second polynomial coefficient lookup table may include a correspondence between a plurality of second fitting parameter combinations and a plurality of second query parameter combinations, wherein a second query parameter combination can be used as an index, and an index corresponds to a second fitting parameter combination.
  • the processor can use the target third query parameter h1 and the target fourth query parameter h2 as an index, and search the second fitting parameter combination corresponding to the index from the second polynomial coefficient lookup table. Thereby, the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2 are determined.
  • the processor can serially determine the high-order part fu of f and the reciprocal of the high-order part fu of f.
  • the processor can determine the high-order part f u of f. For details, refer to the relevant introduction in step S102, which will not be repeated here.
  • the processor can determine the reciprocal of the high-order part f u of the square root of the target mantissa X based on the whole or part of the bit width of the high-order part f u of f in a polynomial approximation manner.
  • the processor may determine the target fifth query parameter g1 based on the high-order part fu of the square root f of the target mantissa X.
  • the processor may determine the coefficient of the preset third polynomial fitting equation based on the target fifth query parameter g1, wherein the target fifth query parameter g1 is the fifth part (the fifth part bit width) of the high-order part fu of f.
  • the target fifth query parameter g1 may be the high ng1 bits (or low ng1 bits) of the decimal part of the high-order part fu of f, where ng1 is a positive integer less than or equal to the full bit width of the decimal part of the high-order part fu of f.
  • the processor may determine the reciprocal of the high-order part f u of f according to the coefficients of the third polynomial fitting equation and the sixth part of the first bit-width part (the sixth bit-width part)
  • the bit width corresponding to the fifth part of the high-order part f u of f does not overlap with the bit width corresponding to the sixth part of the high-order part f u of f.
  • the processor may calculate the reciprocal of the high-order part f u of f based on the coefficients of the third polynomial fitting equation and all or part of the bit width of the high-order part f u of f.
  • g2 is the high g2 bits of the bit width of the decimal part of the high-order part f u excluding the bit width of the aforementioned fifth part, and g2 is a positive integer.
  • the processor may determine the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1 based on the target fifth query parameter g1, and the coefficients of the third polynomial fitting equation may include the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3.
  • the following is an example in which the processor determines the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1 based on the target fifth query parameter g1, and the coefficients of the third polynomial fitting equation may include the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3.
  • the processor may determine the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1 by using a method including but not limited to the following example C1.
  • the processor may acquire or be configured with a third polynomial coefficient lookup table.
  • the third polynomial coefficient lookup table may characterize the correspondence between multiple third fitting parameter combinations and multiple fifth query parameters. Among them, each fifth query parameter has a corresponding third fitting parameter combination.
  • Each third fitting parameter combination may include a seventh fitting parameter a3, an eighth fitting parameter b3, and a ninth fitting parameter c3.
  • the processor may find the third fitting parameter combination corresponding to the target fifth query parameter from the third polynomial coefficient lookup table, thereby determining the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1.
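  • The serial approach, in which the reciprocal is derived from fu itself, can be sketched as follows. The third fitting equation is not reproduced in this text, so the quadratic form, the coefficients taken from the series 1/(u0 + d) ≈ 1/u0 − d/u0² + d²/u0³, and the constants `NG1` and `FU_FRAC` are assumptions for illustration; fu is assumed to lie in [1, 2).

```python
NG1 = 5        # assumed number of high fraction bits of fu used as g1
FU_FRAC = 15   # assumed number of fraction bits in fu


def build_recip_table():
    """table[g1] -> (a3, b3, c3) for 1/(u0 + d) ~= a3*d^2 + b3*d + c3."""
    table = []
    for g1 in range(1 << NG1):
        u0 = 1.0 + g1 / (1 << NG1)                 # start of the segment selected by g1
        table.append((1.0 / u0 ** 3, -1.0 / u0 ** 2, 1.0 / u0))
    return table


RECIP_TABLE = build_recip_table()


def approx_recip(fu_bits: int) -> float:
    """Approximate 1/fu, where fu = fu_bits / 2**FU_FRAC."""
    frac = fu_bits & ((1 << FU_FRAC) - 1)
    g1 = frac >> (FU_FRAC - NG1)                          # fifth part: table index
    offset_bits = frac & ((1 << (FU_FRAC - NG1)) - 1)     # sixth part: remaining bits
    d = offset_bits / (1 << FU_FRAC)
    a3, b3, c3 = RECIP_TABLE[g1]
    return (a3 * d + b3) * d + c3
```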
  • the relationship between the low-order part f l , the high-order part f u , and the target mantissa X can be configured as
  • the processor may calculate the low part fl of f based on the relationship among the low part fl of f, the high part fu of f and the target mantissa X, the target mantissa X in step S101 and the high part fu of f determined in step S102 .
  • the processor may calculate the reciprocal of the high-order part f u of f by using methods including but not limited to those provided in the aforementioned method 1 and method 2, or may calculate the reciprocal of the high-order part f u of f by using the prior art.
  • the processor may calculate the reciprocal of the high-order part f u of f by using the SRT method. This application does not impose too many restrictions on this.
  • Step S104 determining the square root of the target mantissa based on the first bit width part and the second bit width part, and determining the decimal part of the square root of the target mantissa as the mantissa of the square root of the floating-point number to be calculated.
  • the processor may perform an addition operation on the first bit width part and the second bit width part to obtain the square root of the target mantissa X, and determine the decimal part of the square root of the target mantissa X as the mantissa of the square root of the floating point number W.
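  • The addition in step S104 can be sketched in fixed-point form. The sketch assumes that both parts are integers at the same scale of 2^(−frac_width); the function name and the representation are illustrative assumptions.

```python
def combine_and_take_fraction(fu_bits: int, fl_bits: int, frac_width: int) -> tuple[int, int]:
    """Add the two parts and split the sum into integer part and decimal part."""
    f_bits = fu_bits + fl_bits                      # f = fu + fl at scale 2**-frac_width
    integer_part = f_bits >> frac_width             # 1 when f is in [1, 2)
    fraction = f_bits & ((1 << frac_width) - 1)     # decimal part: mantissa of the result
    return integer_part, fraction
```

  • For example, with 8 fraction bits, fu = 352/256 and fl = 10/256 give f = 362/256 ≈ 1.4141, whose decimal part 106/256 is the mantissa field of the square root.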
  • the processor may calculate a rounding determination parameter corresponding to the rounding method based on the configured rounding method, the target mantissa X, the first bit width part, and the second bit width part; and calculate a plurality of candidate results corresponding to the rounding method based on the first bit width part and the second bit width part; and select a candidate result from the plurality of candidate results as the square root of the target mantissa X based on a comparison result of the rounding determination parameter and a preset value.
  • the processor may configure a rounding mode, wherein the configured rounding mode may be any one of the following: round to nearest (RH), round toward positive infinity (RP), or round toward zero (RZ).
  • the RH mode, RP mode, and RZ mode may be the rounding modes specified in IEEE 754.
  • the processor may configure multiple rounding modes, wherein the multiple rounding modes may be at least two of the RP mode, the RH mode, and the RZ mode.
  • the multiple rounding modes correspond one-to-one to the multiple rounding mode configuration parameters.
  • the rounding mode represented (or corresponding) by the first rounding configuration parameter is the RP mode.
  • the rounding mode corresponding to the second rounding configuration parameter is the RZ mode.
  • the rounding mode corresponding to the third rounding configuration parameter is the RH mode.
  • the processor may execute the rounding mode corresponding to the received rounding mode configuration parameter according to the received rounding mode configuration parameter.
  • the processor may execute the RP mode according to the received rounding configuration parameter being the first rounding configuration parameter.
  • the processor may execute the RZ mode according to the received rounding configuration parameter being the second rounding configuration parameter.
  • the processor may execute the RH mode according to the received rounding configuration parameter being the third rounding configuration parameter.
  • the processor can determine the first rounding discrimination parameter ie based on the first bit width part (that is, f u ), the second bit width part (that is, f l ) and the partial bit width of the target mantissa X.
  • the first rounding discrimination parameter ie can represent the deviation between the first value and the target mantissa X, and the first value is the square of the square root of the target mantissa X, that is, f.
  • the processor may determine a plurality of candidate results based on the first bit width part (ie, f u ) and the second bit width part (ie, f l ), and the plurality of candidate results may include a first candidate result f1 and a second candidate result f2.
  • f1 = fu + fl
  • f2 = f1 + ulp.
  • ulp represents the smallest valid number that can be expressed in the full bit width of the calculation result.
  • the processor may select a candidate result from multiple candidate results according to the comparison result between the first rounding determination parameter ie and the preset value, and determine the selected candidate result as the square root of the target mantissa X.
  • the preset value may be configured as 0.
  • the processor may determine that the first candidate result f1 is the square root of the target mantissa X according to the first rounding determination parameter ie being greater than or equal to 0.
  • the processor may determine that the second candidate result f2 is the square root of the target mantissa X according to the first rounding determination parameter ie being less than 0.
  • the processor may determine a plurality of candidate results based on the first bit width portion and the second bit width portion, and the plurality of candidate results may include a first candidate result f1 and a third candidate result f3.
  • f1 = fu + fl
  • f3 = f1 - ulp.
  • ulp represents the smallest valid number that can be expressed in the full bit width of the calculation result.
  • the processor may select a candidate result from multiple candidate results according to the comparison result between the first rounding determination parameter ie and the preset value, and determine the selected candidate result as the square root of the target mantissa X.
  • the preset value may be configured as 0.
  • the processor may determine that the first candidate result f1 is the square root of the target mantissa X according to the first rounding determination parameter ie being less than or equal to 0.
  • the processor may determine that the third candidate result f3 is the square root of the target mantissa X according to the first rounding determination parameter ie being greater than 0.
  • a candidate result is usually selected from multiple candidate results based on the comparison result of f and fr.
  • the same comparison result can be obtained by comparing f² and fr²; because fr² equals the target mantissa X, it is sufficient to calculate the difference between f² and the target mantissa X, so only f² needs to be calculated.
  • the first rounding discrimination parameter ie can characterize the difference between f² and the target mantissa X.
  • the processor can be based on the formula Determine the square root f of the target mantissa X.
  • the processor can use the formula Determine the square root f of the target mantissa X.
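  • The selection rules for the RP and RZ modes can be sketched with exact integer arithmetic. The sketch assumes that ie = f1² − X, that f1 and X are integers carrying n fraction bits (so ulp equals 1), and that f1 lies strictly within one ulp of the exact square root for RP; the function names are illustrative.

```python
import math


def ie_param(f1: int, x: int, n: int) -> int:
    """ie = f1^2 - X scaled by 2**(2n); f1 and x both carry n fraction bits."""
    return f1 * f1 - (x << n)


def round_rp(f1: int, x: int, n: int) -> int:
    """RP: keep f1 when f1^2 >= X, otherwise take f2 = f1 + ulp."""
    return f1 if ie_param(f1, x, n) >= 0 else f1 + 1


def round_rz(f1: int, x: int, n: int) -> int:
    """RZ: keep f1 when f1^2 <= X, otherwise take f3 = f1 - ulp."""
    return f1 if ie_param(f1, x, n) <= 0 else f1 - 1
```

  • Only a square and a subtraction are needed to make the decision; the exact square root fr is never computed.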
  • the distances between two possible candidate results and fr can be compared, and the candidate result with the smallest distance from fr can be determined as the square root of the target mantissa X.
  • when the first rounding discrimination parameter ie is greater than 0, f1 > fr > f3.
  • the deviation between f1 and fr is recorded as the first distance (f1-fr), and the deviation between fr and f3 is recorded as the second distance (fr-f3).
  • the deviation between the first distance and the second distance is (f1-fr)-(fr-f3), recorded as the first deviation.
  • the result of (f1-fr)+(fr-f3) is a positive number.
  • the positive and negative properties of the first deviation (i.e., the first deviation is positive, 0, or negative) are the same as those of the rounding discriminant parameter ie1 (i.e., the rounding discriminant parameter ie1 is positive, 0, or negative).
  • the smallest significant digit is 2N digits after the decimal point, where N is the bit width of the decimal part of the target mantissa X.
  • the effective digits of ulp²/4 are 2N+2 digits after the decimal point, which shows that ulp²/4 is outside the effective data range of the first rounding discriminant parameter ie and of the calculation of ulp×f1.
  • the result of removing ulp²/4 from ie1 has the same sign as ie1 and is not zero.
  • the second rounding discriminant parameter ien can characterize the deviation between the square of the first distance and the square of the second distance.
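The sign argument for the f1/f3 case can be written out as a short derivation. It is a sketch under assumptions that the surviving text does not state explicitly: ie = f1² − X, f3 = f1 − ulp, fr = √X > 0, and the identification of the third line with ie1.

```latex
\begin{aligned}
(f_1 - fr) - (fr - f_3) &= 2\left(f_1 - \tfrac{ulp}{2} - fr\right), \\
\operatorname{sign}\left(f_1 - \tfrac{ulp}{2} - fr\right)
  &= \operatorname{sign}\left(\left(f_1 - \tfrac{ulp}{2}\right)^2 - X\right), \\
\left(f_1 - \tfrac{ulp}{2}\right)^2 - X
  &= (f_1^2 - X) - ulp \cdot f_1 + \tfrac{ulp^2}{4}
   = ie - ulp \cdot f_1 + \tfrac{ulp^2}{4}, \\
ien &= ie - ulp \cdot f_1 .
\end{aligned}
```

Under these assumptions ie − ulp×f1 is a multiple of 2^(−2N), while ulp²/4 = 2^(−(2N+2)), so dropping ulp²/4 cannot flip the comparison: ien ≥ 0 exactly when the full expression is positive.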
  • the result of (f2-fr)+(fr-f1) is a positive number, and the sign of the second deviation (i.e., whether the second deviation is positive, 0, or negative) is the same as the sign of the rounding discrimination parameter ie2 (i.e., whether the rounding discrimination parameter ie2 is positive, 0, or negative).
  • the smallest significant digit is 2N digits after the decimal point, where N is the bit width of the decimal part of the target mantissa X.
  • the effective digits of ulp²/4 are 2N+2 digits after the decimal point, which shows that ulp²/4 is outside the effective data range of the two operations of determining ie and calculating ulp×f1.
  • the result of removing ulp²/4 from ie2 has the same sign as ie2 and is not zero.
  • the third rounding discrimination parameter iep can represent the difference between the square of the third distance and the square of the fourth distance.
  • the processor may select a candidate result from a plurality of candidate results according to the comparison result between the second rounding determination parameter ien and the preset value and the comparison result between the third rounding determination parameter iep and the preset value, and determine the selected candidate result as the square root of the target mantissa X.
  • the preset value may be configured as 0.
  • the processor may determine that the second candidate result f2 is the square root of the target mantissa X according to the third rounding determination parameter iep being less than 0.
  • the processor may determine that the third candidate result f3 is the square root of the target mantissa X according to the second rounding determination parameter ien being greater than or equal to 0.
  • the processor may determine that the first candidate result f1 is the square root of the target mantissa X according to the third rounding determination parameter iep being greater than or equal to 0, or the second rounding determination parameter ien being less than 0.
  • the first rounding determination parameter ie, the second rounding determination parameter ien, and the third rounding determination parameter iep determined by the processor can ensure that the error between the calculated square root f of the target mantissa X (the calculation result of the RP method) and the real value √X is less than 1 ulp (2^(-N)), where N is the bit width of the fractional part of the target mantissa X. Here ulp is the unit of least precision, which represents the smallest valid number that can be expressed in the full bit width of the square root of the target mantissa X.
  • the square root f of the target mantissa X is selected from the first candidate result f1 and the second candidate result f2.
  • the square root f of the target mantissa X is selected from the first candidate result f1 and the third candidate result f3.
  • the square root f of the target mantissa X is selected from the first candidate result f1, the second candidate result f2 and the third candidate result f3.
  • f1 can be directly selected as the square root f of the target mantissa X.
  • the processor may select one of the first candidate result f1 and the third candidate result f3 as the square root f of the target mantissa X based on the positive or negative of the second rounding determination parameter ien.
  • the processor may determine the square root f of the target mantissa X from the first candidate result f1 and the third candidate result f3 based on the formula f = f3 if ien ≥ 0, and f = f1 if ien < 0.
  • the processor may select one of the first candidate result f1 and the second candidate result f2 as the square root f of the target mantissa X based on the positive or negative property of the third rounding determination parameter iep.
  • the processor may determine the square root f of the target mantissa X from the first candidate result f1 and the second candidate result f2 based on the formula f = f2 if iep < 0, and f = f1 if iep ≥ 0.
  • the processor can use the formula f = f2 if iep < 0; f = f3 if ien ≥ 0; f = f1 otherwise, to determine the square root f of the target mantissa X, where "otherwise" (else) can refer to the case where iep ≥ 0, or the case where ien < 0.
  • the second rounding determination parameter ien and the third rounding determination parameter iep determined by the processor can ensure the calculation accuracy and have a smaller calculation amount.
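The three-way selection among f1, f2 and f3 can be sketched with exact integer arithmetic. The definitions ien = ie − ulp×f1 and iep = ie + ulp×f1 (with ie = f1² − X) are assumptions inferred from the surrounding discussion, and `select_nearest` is a hypothetical name; values are integers scaled by 2^N, so ulp×f1 scaled by 2^(2N) is simply the integer f1.

```python
def select_nearest(f1_scaled: int, x_scaled: int, n: int) -> int:
    """Choose among f3 = f1 - ulp, f1 and f2 = f1 + ulp using ien and iep.

    All quantities are scaled: f1_scaled = f1 * 2**n, x_scaled = X * 2**n,
    and ie, ien, iep are held as integers scaled by 2**(2n).
    """
    ie = f1_scaled * f1_scaled - (x_scaled << n)   # assumed: f1**2 - X
    ien = ie - f1_scaled                            # assumed: ie - ulp*f1
    iep = ie + f1_scaled                            # assumed: ie + ulp*f1
    if iep < 0:
        return f1_scaled + 1     # f2 is closer to the real root
    if ien >= 0:
        return f1_scaled - 1     # f3 is closer to the real root
    return f1_scaled             # f1 is already the closest candidate
```

With these definitions the two tests cost one squaring, one shift and two additions, which matches the text's point that ien and iep keep the calculation amount small.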
  • the processor may determine the exponent offset value of the square root of the floating point number W based on the exponent EW. For example, if the exponent EW of the floating point number W is an even number and a positive number, the processor may determine EW/2 + exponent offset as the exponent offset value of the square root of the floating point number W. If the exponent EW of the floating point number W is an odd number and a positive number, the processor may determine (EW-1)/2 + exponent offset as the exponent offset value of the square root of the floating point number W.
  • the processor can output the square root √W of the floating point number W, in which:
  • the sign bit of √W is the same as the sign bit of the floating point number W,
  • the mantissa of √W is the same as the decimal part of the square root of the target mantissa X.
  • the bit width of the fractional part of the target mantissa X can be configured as N bits
  • the full bit width of the first bit width part can be configured as d+2 bits
  • the bit width of the fractional part of the first bit width part is d+1 bits, wherein the relationship between d and N can meet the preset condition, which can be
  • the processor uses part of the bit width of the target mantissa X, for example, the upper t4 bits of the decimal part of the target mantissa X (denoted as Xt), to determine the first bit width part (i.e., fu), where t4 is a positive integer, and t4 is less than the full bit width of the target mantissa X.
  • the bit width of the decimal part of fu is d+1 bits.
  • the processor adopts the operation in the aforementioned method one to determine the reciprocal of the first bit width part (i.e., 1/fu); at this time, the bit width of the fractional part of 1/fu is d+1 bits.
  • the error generated by the processor in determining the first bit width is Where
  • the processor determines the inverse of the first bit width.
  • the error generated in the process is Where,
  • the processor determines the second bit width part based on the relationship between the low-order part fl of f, the high-order part fu of f, and the target mantissa X. In actual scenarios, the bit width of the value calculated from this relationship may exceed n bits. Because the bit width of the second bit width part is n, the processor retains only the high n bits of that value as the second bit width part (i.e., fl), which produces an error eRH; that is, eRH is 2^(-(N+1)).
  • the error generated by the processor in determining the second bit width portion includes the error ec produced in the calculation and multiplication process and the error eRH produced in the retention operation.
  • the error err can be expressed as
  • the target first query parameter r1 can be the first part (the first part bit width) of the decimal part of the target mantissa X.
  • the target first query parameter can be the high nr1 bits (or low nr1 bits) of the decimal part of the target mantissa X, nr1 is a positive integer, and nr1 is less than or equal to the full bit width of the decimal part of the target mantissa X.
  • the preset condition can be configured as
  • the value of the bit width t4 of the fractional part of Xt is d+2 bits, and the decimal part of the first bit width part (i.e., fu) has a bit width of d+1 bits, which can ensure that the error between the calculated square root of the target mantissa X and the real value is less than 1 ulp. It can be seen that in the embodiment of the present application, the processor can use the high t4 bits of the fractional part of the target mantissa X to determine the first bit width part (i.e., fu).
  • the high nr1 bits among these high-order bits can be used as index bits to determine the target first query parameter r1, so as to determine the coefficients of the first polynomial fitting equation.
  • the other bits of the high-order part, except the high nr1 bits, can be used as calculation bits, serving as the variable value in the first polynomial fitting equation and participating in the calculation of the first bit width part (i.e., fu).
  • the processor may include hardware structures and/or software modules corresponding to the execution of each step (or function).
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain step (or function) is executed in the form of hardware or computer software driving hardware may depend on the specific application scenario and design constraints of the technical solution.
  • FIG. 5 shows a floating-point number calculation module according to an exemplary embodiment.
  • the floating-point number calculation module may include a high-order calculation unit, a low-order calculation unit, and a precise rounding unit.
  • the high-order calculation unit, the low-order calculation unit, and the precise rounding unit can obtain all or part of the bit width of the target mantissa X.
  • the floating-point number calculation module may also include a pre-processing unit.
  • the pre-processing unit may receive a floating-point number Z to be calculated, and convert the floating-point number Z into a floating-point number W by a pre-processing method.
  • the following is an introduction in conjunction with the floating-point number representation.
  • the exponent offset is a preset number and is related to the type of the floating-point number Z. Exemplarily, the floating-point number Z is a single-precision floating-point number, and the exponent offset is 127; the floating-point number is a double-precision floating-point number, and the exponent offset is 1023.
  • the pre-processing unit can receive the floating-point number Z through multiple signal lines.
  • the signal of the first level can represent "0", and the signal of the second level can represent "1".
  • the first level can be a high level, and the second level can be a low level.
  • the first level can be a low level, and the second level can be a high level.
  • a signal line can correspond to one bit in the full bit width of the floating-point number Z.
  • the connections between the units in the floating-point calculation module provided in the embodiment of the present application represent the interaction between the units, and do not represent the actual connection method between the units.
  • the pre-processing unit may have the ability to pre-process the floating point number Z.
  • the pre-processing unit may include, but is not limited to, the following functions: normalization processing function, exponent parity determination function, exponent conversion processing function, and mantissa conversion processing function, so as to support the pre-processing unit to pre-process the floating point number Z.
  • the floating point number Z received by the pre-processing unit may be a normalized floating point number, that is, a regularized floating point number.
  • the floating point number Z received by the pre-processing unit may also be a non-normalized floating point number, that is, a denormalized floating point number.
  • FIG. 4 shows the pre-processing process of the floating-point number Z according to an exemplary embodiment.
  • the pre-processing unit has the ability to perform normalization processing on non-normalized floating-point numbers.
  • the floating-point number Z after normalization processing can be recorded as a floating-point number W, and the floating-point number W can be represented as S × M1 × Q^EW.
  • S is the sign bit
  • Q is the radix
  • EW is the exponent value
  • M1 is the mantissa
  • the value of each bit in M1 is between 0 and the radix Q, and the highest bit of M1 is not zero.
  • the mantissa M1 is a fixed-point number, in which the highest bit is the integer part, and the part other than the highest bit is the decimal part, and the integer part of the mantissa M1 of the normalized floating-point number is not 0.
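For the single-precision case with radix Q = 2, the decomposition into S, EW and M1, including normalization of a denormalized input, can be sketched as below. `decode_float32` is a hypothetical helper, the sign is treated as a factor (-1)**S, and zero, infinity and NaN are deliberately left out because this part of the text does not cover them.

```python
import struct

def decode_float32(z: float):
    """Return (S, EW, M1) for a finite, non-zero single-precision value.

    M1 is a fixed-point integer with 1 integer bit and 23 fractional bits,
    so the value equals (-1)**S * (M1 / 2**23) * 2**EW with M1 / 2**23 in [1, 2).
    """
    bits = struct.unpack(">I", struct.pack(">f", z))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp_field == 0xFF or (exp_field == 0 and frac == 0):
        raise ValueError("zero, infinity and NaN are not handled in this sketch")
    if exp_field == 0:
        # Denormalized input: shift left until the integer bit becomes 1.
        shift = 24 - frac.bit_length()
        return sign, -126 - shift, frac << shift
    return sign, exp_field - 127, (1 << 23) | frac
```

For example, -6.0 is -1.5 × 2², so the call returns S = 1, EW = 2 and M1 = 1.5 × 2²³.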
  • the pre-processing unit can determine that the exponent EW of the floating-point number W is an odd number based on the lowest bit of the exponent EW being 1.
  • the lowest bit of the exponent EW of the floating-point number W being 0 reflects that the exponent EW of the floating-point number W is an even number.
  • the pre-processing unit may perform a mantissa transformation process on the mantissa M1 of the floating-point number W and an exponent transformation process on the exponent EW of the floating-point number W based on the exponent EW of the floating-point number W being an odd number.
  • the pre-processing unit may perform a first mantissa conversion process on the mantissa M1 of the floating-point number W according to the fact that the exponent EW of the floating-point number W is an odd number, to obtain a mantissa M2.
  • the first mantissa conversion process may be a multiplication by the radix Q.
  • the pre-processing unit may multiply the mantissa M1 by Q according to the fact that the exponent EW of the floating-point number W is an odd number, to obtain a mantissa M2.
  • the mantissa M1 includes an integer part and a decimal part.
  • the black box shows the bits of the integer part
  • the white box shows the bits of the decimal part.
  • the integer part of the mantissa M1 of the floating-point number W is 1 bit
  • the integer part of the mantissa M1 is the bit shown in the s1th bit
  • the decimal part of the mantissa M1 is the bits shown in the 0th to vth bits.
  • the value range of the mantissa M1 is [1, 2).
  • the pre-processing unit multiplies the mantissa M1 by 2, that is, shifts each bit of the mantissa M1 to the left by one bit, to obtain the mantissa M2.
  • the integer part of the mantissa M2 is 2 bits
  • the integer part of the mantissa M2 is the bits shown in the s1th and s2th bits
  • the decimal part of the mantissa M2 is the bits shown in the 0th to vth bits.
  • the value range of the mantissa M2 is [2, 4).
  • the integer part of the mantissa M2, and therefore of the target mantissa X, has one more bit than the integer part of the mantissa M1, which fills in the default integer bit of the IEEE 754 format.
  • the target mantissa X is twice the mantissa M1 of the floating-point number W, that is, the target mantissa X is the same as the mantissa M2.
  • the integer part of the target mantissa X can include 2 bits. At this time, the numerical range of the target mantissa X is [2, 4).
  • the pre-processing unit performs the first mantissa transformation processing on the mantissa M1 to obtain the mantissa M2, thereby obtaining the target mantissa X, which is used to clarify the process of the pre-processing unit obtaining the target mantissa X from the mantissa M1 when the exponent EW of the floating-point number W is an odd number.
  • the pre-processing unit can directly obtain the target mantissa X from the mantissa M1 based on a preset mantissa transformation processing method, and output the target mantissa X.
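The mantissa transformation can be sketched on fixed-point integers. The function name `target_mantissa` and the integer encoding (M1 stored as an integer with v fractional bits) are illustrative assumptions; the even-exponent branch follows the pass-through behaviour the text describes for that case.

```python
def target_mantissa(m1: int, ew: int) -> int:
    """Return the target mantissa X as a fixed-point integer.

    m1 encodes M1 in [1, 2) with v fractional bits. X keeps the same v
    fractional bits but has two integer bits, so X lies in [1, 4).
    """
    if ew & 1:
        return m1 << 1   # odd EW: X = 2 * M1, range [2, 4)
    return m1            # even EW: X = M1, the extra integer bit is 0
```

With v = 4, the mantissa 1.5 is stored as 0b11000; an odd exponent yields 0b110000, which is 3.0 in the same scaling.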
  • the floating-point number calculation module may also include an exponent processing unit.
  • the pre-processing unit may perform an exponent conversion process on the exponent EW of the floating-point number W based on the exponent EW of the floating-point number W being an odd number, and subtract 1 from the lowest bit in the full bit width of the exponent EW to obtain the exponent EW-1, where the exponent EW-1 is an even number.
  • the pre-processing unit may output the exponent value EW-1 to the exponent processing unit based on the exponent EW of the floating-point number W being an odd number, so that the exponent processing unit can determine the exponent or exponent offset value of the square root of the floating-point number W.
  • the pre-processing unit can provide the exponent value EW-1 to the exponent processing unit according to the exponent EW of the floating point number W being an odd number.
  • the pre-processing unit can shift the exponent value EW-1, for example by one bit toward the low-order end of the exponent bit width, to obtain the exponent of the square root of the floating point number W, which is (EW-1)/2.
  • the exponent processing unit can calculate the sum of the exponent (EW-1)/2 of the square root of the floating point number W and the preset exponent offset, and output the result as the exponent offset value of the square root, where the exponent offset value is (EW-1)/2 + exponent offset.
  • the preset exponent offset is related to the type of floating point number Z. For example, if the floating point number Z is a single-precision floating point number, the exponent offset may be 127; if the floating point number is a double-precision floating point number, the exponent offset may be 1023.
  • the target mantissa X is the same as the mantissa M1 of the floating-point number W.
  • the pre-processing unit can output the target mantissa X, and the target mantissa is the same as the mantissa M1 of the floating-point number W.
  • the integer part of the mantissa M1 of the floating-point number W is 1 bit
  • the integer part of the mantissa M1 is the bit shown in the s1th position
  • the decimal part of the mantissa M1 is the bit shown in the 0th to vth positions.
  • the numerical range of the mantissa M1 is [1, 2).
  • the mantissa of the floating-point number W is the same as the mantissa M1 after the normalization of the floating-point number Z, then the numerical range of the mantissa of the floating-point number W is [1, 2).
  • the numerical range of the target mantissa X is [1, 2).
  • the integer part of the target mantissa X can be added with an extra bit compared to the integer part of the mantissa M1, which is configured as 0 to fill the default integer bit of the IEEE754 format. For example, the s2th bit is added, and the numerical value is configured as 0. Such an operation does not change the numerical range of the target mantissa X.
  • the pre-processing unit can provide the exponent value EW to the exponent processing unit based on the exponent EW of the floating point number W being an even number.
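  • the exponent processing unit can calculate the sum of the exponent EW/2 of the square root of the floating point number W and the preset exponent offset, and output the result as the exponent offset value of the square root, where the exponent offset value is EW/2 + exponent offset.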
  • the preset exponent offset is related to the type of floating point number Z. For example, if the floating point number Z is a single-precision floating point number, the exponent offset may be 127; if the floating point number is a double-precision floating point number, the exponent offset may be 1023.
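The exponent path for both parities can be sketched as follows. The function names are hypothetical, the default bias of 127 is the single-precision offset named in the text, and applying the same rule to negative exponents is an assumption of this sketch (the text's examples use positive EW).

```python
import math

def sqrt_exponent_field(ew: int, bias: int = 127) -> int:
    """Biased exponent of sqrt(W) given the unbiased exponent EW of W."""
    half = (ew - (ew & 1)) >> 1     # EW/2 if EW is even, (EW-1)/2 if odd
    return half + bias

def check_identity(m1: float, ew: int) -> bool:
    """Check sqrt(M1 * 2**EW) == sqrt(X) * 2**half numerically."""
    x = 2.0 * m1 if ew & 1 else m1
    half = (ew - (ew & 1)) >> 1
    return math.isclose(math.sqrt(m1 * 2.0 ** ew), math.sqrt(x) * 2.0 ** half)
```

Moving the odd unit of the exponent into the mantissa is what lets the exponent be halved with a plain one-bit shift.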
  • f is generally a fixed-point number. If the full bit width of the decimal part of the target mantissa X is p bits, the full bit width of f is p+1 bits, where the highest bit is the pth bit and the lowest bit is the 0th bit, and the p-1th bit to the 0th bit are the decimal part after the decimal point of the fixed-point number.
  • the floating point calculation module provided in the embodiment of the present application can determine the high m bits of data of f (referred to as the high part of f) and the low n bits of data (referred to as the low part of f) respectively.
  • the floating point calculation module can determine f based on the determined high part and low part of f.
  • the high m bits may refer to the first m bits in the direction from the highest bit to the lowest bit in the full bit width of f.
  • the low n bits may refer to the last n bits in the direction from the highest bit to the lowest bit in the full bit width of f.
  • the high m bits of f and the low n bits of f may have overlapping bits, as shown in (a) in FIG. 6.
  • the floating point number Z is a single-precision floating point number and the full bit width of the mantissa is 23 bits
  • the full bit width of f is 24 bits
  • the highest bit is the 23rd bit
  • the lowest bit is the 0th bit.
  • the high m bits of f are the first m bits in the direction from the 23rd bit to the 0th bit
  • the low n bits of f are the last n bits in the direction from the 23rd bit to the 0th bit.
  • the high m bits of f may not have overlapping bits with the last n bits of f.
  • the values of m and n may be pre-configured.
  • m and n may be configured according to the type of the floating point number Z.
  • the high w bits of "A” may refer to the first w bits of data from the highest bit to the lowest bit of "A”.
  • the low w bits of "A” may refer to the last w bits of data from the highest bit to the lowest bit of "A”.
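The "high w bits" and "low w bits" conventions map directly onto shifts and masks. The helper names below are illustrative only.

```python
def high_bits(a: int, width: int, w: int) -> int:
    """First w bits of a, counted from the highest bit of a width-bit value."""
    return a >> (width - w)

def low_bits(a: int, w: int) -> int:
    """Last w bits of a, counted from the highest bit toward the lowest."""
    return a & ((1 << w) - 1)
```

When m + n exceeds the full bit width, the two slices overlap: for the 6-bit value 0b101100 with m = n = 4, the bottom two bits of the high slice equal the top two bits of the low slice.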
  • the high m bits of f are represented by fu
  • the low n bits of f are represented by fl
  • the relationship between the high part fu and the low part fl of f is (fu + fl)² = X. To simplify the calculation process of f, the relationship between the high-order part fu and the low-order part fl can be approximated as fl ≈ (X - fu²)/(2×fu). In the embodiment of the present application, the high part fu represents the approximate value of √X, and (fu + fl) can be expressed as the exact value of √X, that is, the exact value of f.
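The split into a high part and a low correction can be illustrated numerically. This sketch assumes the simplified relationship is the first-order one, fl ≈ (X − fu²)/(2×fu), obtained by dropping fl² from (fu + fl)² = X; the formula is an inference rather than a quotation of the source. It uses exact rationals in place of the fixed bit widths of the hardware, and `split_root` is a hypothetical name.

```python
from fractions import Fraction
from math import isqrt

def split_root(x: Fraction, d: int):
    """Return (f_u, f_l) for sqrt(x).

    f_u is sqrt(x) truncated to d fractional bits; f_l is the assumed
    first-order correction (x - f_u**2) / (2 * f_u).
    """
    f_u = Fraction(isqrt(int(x * 4 ** d)), 2 ** d)
    f_l = (x - f_u * f_u) / (2 * f_u)
    return f_u, f_l
```

Because (fu + fl)² − X equals fl² exactly, a high part accurate to d fractional bits leaves a residual of order 2^(−2d), which is why the high part only needs roughly half of the final bit width.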
  • the pre-processing unit can output the target mantissa X so that other units can use the full bit width or partial bit width of the target mantissa X.
  • the target mantissa X represents the mantissa of the floating-point number Z after pre-processing, hereinafter referred to as the target mantissa X of the floating-point number Z.
  • EW represents the exponent of the floating-point number Z after normalization processing, hereinafter referred to as the exponent EW of the floating-point number Z.
  • the high-order calculation unit can be connected to the pre-processing unit.
  • the high-order calculation unit can receive all or part of the bit width of the target mantissa X output by the pre-processing unit.
  • the high-order calculation unit can receive all or part of the bit width of the mantissa M1 of the floating-point number W output by the pre-processing unit, and all or part of the bit width of the exponent EW of the floating-point number W.
  • the high-order calculation unit may determine the high-order part fu of the square root f of the target mantissa X based on the first polynomial fitting equation and all or part of the bit width of the target mantissa X by using a polynomial approximation method.
  • the high-order calculation unit can determine the target first query parameter r1 based on the mantissa M1 of the floating-point number W, and the high-order calculation unit can determine the target second query parameter r2 based on the exponent EW of the floating-point number W.
  • the target first query parameter can be the first part (the first part bit width) of the decimal part of the mantissa M1 of the floating-point number W.
  • the target first query parameter can be the high nr1 bit (or low nr1 bit) of the decimal part of the mantissa M1 of the floating-point number W, nr1 is a positive integer, and nr1 is less than or equal to the full bit width of the decimal part of the mantissa M1 of the floating-point number W.
  • the target second query parameter is a partial bit width of the exponent EW of the floating-point number W, and includes the lowest bit width of the exponent EW of the floating-point number W.
  • the target second query parameter r2 can be the low nr2 bit of the exponent EW of the floating-point number W, nr2 is a positive integer, and nr2 is less than or equal to the full bit width of the exponent EW. It can be seen that the low nr2 bit of the exponent EW includes the lowest bit of the exponent EW.
  • the target second query parameter r2 may be data of the lower 1 bit of the exponent EW of the floating point number W, and the data may reflect whether the exponent EW is an odd number or an even number.
  • the high-order calculation unit can determine the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2 based on the target first query parameter r1 and the target second query parameter r2.
  • the coefficients of the first polynomial fitting equation may include the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1.
  • the high-order calculation unit can calculate the high-order part fu of f according to the coefficients of the first polynomial fitting equation and all or part of the bit width in the decimal part of the mantissa M1 of the floating-point number W.
  • X1 is the second part (the second part bit width) of the decimal part of the mantissa M1 of the floating-point number W, and the bit width corresponding to the second part in the decimal part of the mantissa M1 of the floating-point number W does not overlap with the bit width corresponding to the first part in the decimal part of the mantissa M1 of the floating-point number W.
  • X1 is the high t1 bits of the other bit widths in the decimal part of the mantissa M1 of the floating-point number W except the bit width of the aforementioned first part, and t1 is a positive integer.
  • the high-order calculation unit uses the first part of the decimal part of the mantissa M1 of the floating-point number W to determine the coefficients of the polynomial fitting equation, and uses the second part of the decimal part of the mantissa M1 of the floating-point number W to participate in the calculation of the polynomial fitting equation to determine the approximate solution of the square root of the target mantissa X, that is, the high-order part fu of f.
  • the bit width of the fractional part of the target mantissa X is N bits
  • the high-order calculation unit can receive the high t1 bits of the bit width of the fractional part of the target mantissa X excluding the bit width of the aforementioned first part, and receive the target first query parameter r1, for calculating the high-order part f u of f, wherein the full bit width of the aforementioned X1 is t1 bits, and the full bit width of the target first query parameter r1 is nr1 bits.
  • the relationship between t1, nr1 and N is:
  • the floating point number Z is a DP floating point number
  • the high-order calculation unit can use the high 29 bits of the fractional part of the target mantissa X to calculate the high-order part fu of f.
  • the high nr1 bits of the high 29 bits of the decimal part of the target mantissa X can be used as the target first query parameter r1, and the other parts are used as the aforementioned X1, and the value of nr1 can be flexibly configured.
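Extracting the index bits and the calculation bits can be sketched as below. The name `split_fields` is hypothetical. In the usage note, nr1 = 7 is borrowed from the double-precision table example elsewhere in the text, and t1 = 22 is an assumption chosen so that nr1 + t1 = 29.

```python
def split_fields(frac: int, n: int, ew: int, nr1: int, t1: int):
    """Split an n-bit mantissa fraction into (r1, r2, x1).

    r1: high nr1 bits of the fraction (table index).
    r2: lowest bit of the exponent EW (parity).
    x1: the next t1 bits below r1 (polynomial variable).
    """
    r2 = ew & 1
    r1 = frac >> (n - nr1)
    x1 = (frac >> (n - nr1 - t1)) & ((1 << t1) - 1)
    return r1, r2, x1
```

For a double-precision fraction (n = 52) with nr1 = 7 and t1 = 22, the lowest 23 bits of the fraction are not used by the high-order calculation at all.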
  • the floating point number Z is a DP floating point number
  • the full bit width of the high-order part f u of f can be 29 bits.
  • the high-order computing unit may acquire or be configured with a first polynomial coefficient lookup table.
  • the first polynomial coefficient lookup table may characterize the correspondence between multiple first fitting parameter combinations and multiple first query parameter combinations.
  • each first query parameter combination has a corresponding first fitting parameter combination.
  • Each first fitting parameter combination may include a first fitting parameter a1, a second fitting parameter b1, and a third fitting parameter c1.
  • Each first query parameter combination may include a first query parameter and a second query parameter.
  • the target first query parameter r1 and the target second query parameter r2 may constitute a target first query parameter combination.
  • the high-order calculation unit can find the first fitting parameter combination corresponding to the target first query parameter combination from the first polynomial coefficient lookup table, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.
  • the floating point number Z is a DP floating point number
  • the target second query parameter can be the lowest bit of the exponent EW of the floating point number W (that is, the floating point number Z after normalization)
  • the target first query parameter can be the upper 7 bits of the decimal part of the target mantissa X, that is, the upper 7 bits of the part of the target mantissa X except the highest bit, that is, the 11th to 18th bits in the decimal part of the target mantissa X.
  • the high-order calculation unit can use 8-bit data for table lookup.
  • the first polynomial coefficient lookup table can have 256 entries, which are 256 first query parameter combinations and the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 corresponding to each first query combination.
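A 256-entry table of this shape can be sketched in floating point. Two things here are assumptions and not the source's method: the fitting equation is taken to be the quadratic a1×X1² + b1×X1 + c1, and the coefficients are second-order Taylor coefficients at the left edge of each segment instead of the fitted values an actual design would store. `build_table` and `high_part` are hypothetical names.

```python
import math

NR1 = 7  # index bits from the top of the mantissa fraction

def build_table(nr1: int = NR1):
    """Map (r2, r1) -> (a1, b1, c1) for sqrt(X) on each mantissa segment."""
    table = {}
    for r2 in (0, 1):                    # lowest bit of EW: 0 even, 1 odd
        scale = 2.0 if r2 else 1.0       # odd EW: X = 2*M1, even EW: X = M1
        for r1 in range(1 << nr1):
            x0 = 1.0 + r1 / (1 << nr1)   # left edge of the M1 segment
            c1 = math.sqrt(scale * x0)
            b1 = scale / (2.0 * c1)
            a1 = -scale * scale / (8.0 * c1 ** 3)
            table[(r2, r1)] = (a1, b1, c1)
    return table

def high_part(table, r2: int, r1: int, x1: float) -> float:
    """Evaluate the quadratic for offset x1 in [0, 2**-NR1) within segment r1."""
    a1, b1, c1 = table[(r2, r1)]
    return (a1 * x1 + b1) * x1 + c1      # Horner form
```

One parity bit plus seven index bits gives the 256 entries mentioned above; with Taylor coefficients the approximation error stays below about 2^(−25) on every segment.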
  • the first polynomial coefficient lookup table may include multiple first fitting parameter combinations corresponding to each first query parameter.
  • a first query parameter may have a first fitting parameter combination corresponding to the case where the second query parameter is an odd number, and a first fitting parameter combination corresponding to the case where the second query parameter is an even number.
  • the first polynomial coefficient lookup table may include a first odd number lookup sub-table and a first even number lookup sub-table.
  • the first odd number lookup sub-table includes the first fitting parameter combination corresponding to the first query parameter when the second query parameter is an odd number.
  • the first even number lookup sub-table includes the first fitting parameter combination corresponding to the first query parameter when the second query parameter is an even number.
  • the high-order calculation unit may look up the first fitting parameter combination corresponding to the target first query parameter r1 in the first even number lookup sub-table according to the target second query parameter r2 being an even number.
  • the high-order calculation unit may look up the first fitting parameter combination corresponding to the target first query parameter r1 in the first odd number lookup sub-table according to the target second query parameter r2 being an odd number. This determines the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.
  • the floating point number Z is a DP floating point number
  • the target second query parameter can be the lowest bit of the exponent EW of the floating point number W (that is, the floating point number Z after normalization)
  • the target first query parameter can be the upper 7 bits of the decimal part of the target mantissa X, that is, the upper 7 bits of the part of the target mantissa X except the highest bit, that is, the 11th to 18th bits in the decimal part of the target mantissa X.
  • the high-order calculation unit can use 8-bit data to look up the table.
  • the first polynomial coefficient lookup table can include a first odd number lookup subtable and a first even number lookup subtable.
  • the first odd number lookup subtable can include 128 table items, each of which represents the coefficient of the first polynomial fitting equation corresponding to each target first query parameter when the exponent of the floating point number W is an odd number.
  • the first even number lookup subtable can include 128 table items, each of which represents the coefficient of the first polynomial fitting equation corresponding to each target first query parameter when the exponent of the floating point number W is an even number.
  • the high-order calculation unit may query the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 from the first odd-number search subtable based on the target second query parameter r2 being an odd number.
  • the high-order calculation unit may query the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 from the first even-number search subtable based on the target second query parameter r2 being an even number.
  • the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 in the pre-configured first polynomial coefficient lookup table (that is, in the correspondence between these three parameters and the first query parameter combinations) can be stored in the same first storage module, as shown in (a) in FIG7.
  • the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 can be stored in three first storage modules, respectively, as shown in (b) in FIG7.
  • any two of the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 can be stored in the same first storage module, and the remaining parameter in another first storage module. For example, as shown in (c) in FIG7, the first fitting parameter a1 and the second fitting parameter b1 are stored in the same first storage module, and the third fitting parameter c1 is stored in another first storage module.
  • the first polynomial coefficient lookup table may include the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 corresponding to a preset number of first query parameter combinations.
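As a rough illustration of the odd/even sub-table lookup in the DP case, the following Python sketch derives r1 (the upper 7 fraction bits) and r2 (the lowest exponent bit) from a double and selects one of two 128-entry sub-tables. Several things here are assumptions rather than details from the text: the table contents are second-order Taylor coefficients of the square root, standing in for the fitted a1, b1, c1; EW is treated as the unbiased exponent; X1 is taken to be all remaining fraction bits; and the names `decode`, `build_subtable` and `high_part` are hypothetical.

```python
import math
import struct

FRAC_BITS = 52   # fraction width of an IEEE 754 double
R1_BITS = 7      # width of r1 in the DP example (128 entries per sub-table)

def decode(z: float) -> tuple[int, int]:
    """Split a positive normal double into (fraction bits, unbiased exponent)."""
    bits = struct.unpack("<Q", struct.pack("<d", z))[0]
    fraction = bits & ((1 << FRAC_BITS) - 1)
    exponent = ((bits >> FRAC_BITS) & 0x7FF) - 1023
    return fraction, exponent

def build_subtable(odd: bool) -> list[tuple[float, float, float]]:
    """Assumed coefficients: Taylor expansion of sqrt(s * (x0 + d)) around d = 0."""
    s = 2.0 if odd else 1.0      # an odd exponent folds a factor 2 into the radicand
    table = []
    for r1 in range(1 << R1_BITS):
        x0 = 1.0 + r1 / (1 << R1_BITS)   # left edge of the mantissa segment
        c = math.sqrt(s * x0)
        b = s / (2.0 * c)
        a = -s * s / (8.0 * c ** 3)
        table.append((a, b, c))
    return table

EVEN_TABLE = build_subtable(odd=False)
ODD_TABLE = build_subtable(odd=True)

def high_part(z: float) -> float:
    """Approximate sqrt(M * 2**r2), where z = M * 2**exponent and 1 <= M < 2."""
    fraction, exponent = decode(z)
    r1 = fraction >> (FRAC_BITS - R1_BITS)          # upper 7 bits of the fraction
    r2 = exponent & 1                               # parity of the exponent
    a, b, c = (ODD_TABLE if r2 else EVEN_TABLE)[r1]
    low_mask = (1 << (FRAC_BITS - R1_BITS)) - 1
    d = (fraction & low_mask) / (1 << FRAC_BITS)    # offset inside the segment
    return a * d * d + b * d + c
```

Multiplying `high_part(z)` by 2 raised to `(exponent - r2) // 2` recovers an approximation of the square root of `z`; the parity bit only decides which 128-entry sub-table supplies the coefficients.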
  • the low-order calculation unit may include a first high-order reciprocal calculation circuit and a low-order operation circuit, wherein the first high-order reciprocal calculation circuit may run in parallel with the high-order calculation unit, so that the low-order calculation unit may run in parallel with the high-order calculation unit.
  • the first high-order reciprocal calculation circuit can be connected to the pre-processing unit.
  • the first high-order reciprocal calculation circuit can receive all or part of the bit width of the target mantissa X output by the pre-processing unit.
  • the first high-order reciprocal calculation circuit can receive all or part of the bit width of the exponent EW output by the pre-processing unit.
  • the first high-order reciprocal calculation circuit can determine the target third query parameter h1 based on the mantissa M1 of the floating-point number W, and the first high-order reciprocal calculation circuit can determine the target fourth query parameter h2 based on the exponent EW of the floating-point number W.
  • the target third query parameter is the third part (the third part bit width) of the mantissa M1 of the floating-point number W.
  • the target third query parameter h1 may refer to the high nh1 bits (or low nh1 bits) of the decimal part of the mantissa M1 of the floating-point number W, nh1 is a positive integer, and nh1 is less than or equal to the full bit width of the decimal part of the mantissa M1 of the floating-point number W.
  • the target fourth query parameter h2 is a partial bit width of the exponent EW of the floating-point number W, and includes the lowest bit width of the exponent EW of the floating-point number W.
  • the target fourth query parameter may refer to the high nh2 bits (or low nh2 bits) of the exponent EW of the floating-point number W, nh2 is a positive integer, and nh2 is less than or equal to the full bit width of the exponent EW of the floating-point number W.
  • the target fourth query parameter h2 may be data of the lower 1 bit of the exponent EW of the floating point number W, and the data may reflect whether the exponent EW is an odd number or an even number.
  • the first high-order reciprocal calculation circuit can determine the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2 based on the target third query parameter h1 and the target fourth query parameter h2.
  • the coefficients of the second polynomial fitting equation may include the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2.
  • the first high-order reciprocal calculation circuit can calculate the reciprocal of the high-order part fu of f according to the coefficients of the second polynomial fitting equation and all or part of the bit width of the decimal part of the floating-point number W.
  • the first high-order reciprocal calculation circuit can calculate the reciprocal of the high-order part fu of f according to the coefficients of the second polynomial fitting equation and the fourth part (the fourth part bit width) of the decimal part of the floating-point number W.
  • the first high-order reciprocal calculation circuit can output the reciprocal of the high-order part fu of f. X2 is the fourth part (the fourth part bit width) of the decimal part of the mantissa of the floating-point number W, and the bit width corresponding to the fourth part does not overlap with the bit width corresponding to the third part of the decimal part of the floating-point number W.
  • X2 is the high t2 bits of the bit width of the decimal part of the floating-point number W except the aforementioned third part bit width, and t2 is a positive integer.
  • the full bit width of the decimal part of the target mantissa X is N bits
  • the first high-order reciprocal calculation circuit can receive the target third query parameter h1, and receive the high t2 bits of the bit width of the decimal part of the target mantissa X except the aforementioned third part bit width, for calculating the reciprocal of the high-order part fu of f.
  • the full bit width of the target third query parameter h1 is nh1
  • the full bit width of X2 is t2 bits.
  • because X2 is taken from the bits that remain after the third part, the relationship between nh1, t2 and N is: $nh1 + t2 \le N$.
  • the floating point number Z is a DP floating point number
  • the first high-order reciprocal calculation circuit can use the high 29 bits of the decimal part of the target mantissa X to calculate the reciprocal of the high-order part fu of f.
  • the high nh1 bits of the high 29 bits of the decimal part of the target mantissa X can be used as the target third query parameter h1, and the other parts are used as the aforementioned X2, and the value of nh1 can be flexibly configured.
  • the floating point number Z is a DP floating point number
  • the first high-order reciprocal calculation circuit can output the reciprocal of the high-order part f u of f
  • the full bit width is 29 bits.
  • the first high-order reciprocal calculation circuit can be configured with a second polynomial coefficient lookup table, and the second polynomial coefficient lookup table can characterize the correspondence between multiple second fitting parameter combinations and multiple second query parameter combinations.
  • each second query parameter combination has a corresponding second fitting parameter combination.
  • Each second fitting parameter combination may include a fourth fitting parameter a2, a fifth fitting parameter b2, and a sixth fitting parameter c2.
  • the first high-order reciprocal calculation circuit can find out the second fitting parameter combination corresponding to the target second query parameter combination from the second polynomial coefficient lookup table, wherein the fitting parameters in the second fitting parameter combination corresponding to the target second query parameter combination are used as coefficients of the second polynomial fitting equation corresponding to the target second query parameter combination.
  • the floating point number Z is a DP floating point number
  • the fourth query parameter may refer to the lower 1 bit (least bit) of the exponent EW of the floating point number W
  • the third query parameter may refer to the upper 8 bits of the decimal part of the target mantissa X.
  • the fourth query parameter may refer to the lower 2 bits of the exponent EW of the floating point number W
  • the third query parameter may refer to the upper 7 bits of the decimal part of the target mantissa X. It can be seen that the first high-order reciprocal calculation circuit can use the third query parameter and the fourth query parameter, a total of 9 bits of data, for table lookup.
  • the second polynomial coefficient lookup table may have $2^9$, i.e., 512 entries, which are respectively 512 second query parameter combinations and the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 corresponding to each second query parameter combination.
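To make the 9-bit table address concrete, the sketch below slices the high 29 fraction bits of a DP mantissa into h1 and X2 and combines h1 with the exponent bits h2. The helper names and the choice to place h2 in the most significant address bits are assumptions made for illustration; the text above only fixes the total of 9 bits and the 512 entries.

```python
FRAC_BITS = 52    # fraction width of a DP mantissa
USED_BITS = 29    # high fraction bits used by the reciprocal circuit in the DP example

def split_fraction(fraction: int, nh1: int) -> tuple[int, int]:
    """Return (h1, X2): the high nh1 bits and the following USED_BITS - nh1 bits."""
    high = fraction >> (FRAC_BITS - USED_BITS)   # high 29 bits of the fraction
    t2 = USED_BITS - nh1                         # width of X2
    h1 = high >> t2
    x2 = high & ((1 << t2) - 1)
    return h1, x2

def table_address(h1: int, exponent: int, nh1: int, nh2: int) -> int:
    """Concatenate the low nh2 exponent bits (h2) with h1 into one table address."""
    h2 = exponent & ((1 << nh2) - 1)
    return (h2 << nh1) | h1
```

Both splits mentioned above, (nh1 = 8, nh2 = 1) and (nh1 = 7, nh2 = 2), yield addresses in the range 0 to 511, which is why either can index a 512-entry table.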
  • the second polynomial coefficient lookup table may include multiple second fitting parameter combinations corresponding to each third query parameter.
  • a third query parameter may have a second fitting parameter combination corresponding to the case where the fourth query parameter is an odd number, and a second fitting parameter combination corresponding to the case where the fourth query parameter is an even number.
  • the second polynomial coefficient lookup table may include a second odd number lookup sub-table and a second even number lookup sub-table.
  • the second odd number lookup sub-table represents the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an odd number.
  • the second even number lookup sub-table includes the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an even number.
  • the first high-order reciprocal calculation circuit can search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second even-number search subtable according to the target fourth query parameter h2 being an even number.
  • the first high-order reciprocal calculation circuit can search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second odd-number search subtable according to the target fourth query parameter h2 being an odd number.
  • in this way, the first high-order reciprocal calculation circuit searches for the second fitting parameter combination corresponding to the target second query parameter combination from the second polynomial coefficient lookup table, thereby determining the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2.
  • the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 can be stored in the same second storage module.
  • the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 can be stored in three second storage modules respectively.
  • any two of the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 can be stored in the same second storage module, and the remaining parameter is stored in another second storage module.
  • the second polynomial coefficient lookup table may include a fourth fitting parameter a2, a fifth fitting parameter b2, and a sixth fitting parameter c2 corresponding to a preset number of second query parameter combinations.
  • the low-order operation circuit can be connected to the first high-order reciprocal calculation circuit, to the pre-processing unit, and to the high-order calculation unit.
  • the low-order operation circuit can receive the reciprocal of the high-order part fu of f output by the first high-order reciprocal calculation circuit, the target mantissa X output by the pre-processing unit, and the high-order part fu of f output by the high-order calculation unit.
  • the low-order calculation unit can calculate the low-order part fl of f from these inputs according to the relationship between the high-order part fu and the low-order part fl.
  • the low-order operation circuit can output the low-order part fl of f, so that the low-order calculation unit outputs the low-order part fl of f.
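The formula relating fu and fl did not survive in the text above. One relationship consistent with the expansion of the squared sum of fu and fl used for the rounding discrimination parameter, and with the subtractor and multiplier described for the low-order operation circuit, is the first-order correction below. It is offered as an assumption, not as the source's own equation.

```latex
X = (f_u + f_l)^2 = f_u^2 + 2 f_u f_l + f_l^2
\quad\Longrightarrow\quad
f_l = \frac{X - f_u^2 - f_l^2}{2 f_u} \approx \frac{X - f_u^2}{2 f_u}
```

The approximation drops the term $f_l^2$, which is small when fu already carries the upper bits of f.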
  • the precise rounding unit may receive the high-order part fu of f output by the high-order calculation unit and the low-order part fl of f output by the low-order calculation unit.
  • the precise rounding unit may obtain the rounding mode configuration parameters in advance, and calculate the square root of the target mantissa X according to the rounding mode corresponding to the rounding mode configuration parameters.
  • the RH mode, the RP mode, and the RZ mode may be the rounding modes specified in IEEE 754.
  • the precise rounding unit may determine a plurality of candidate calculation results according to fu output by the high-order calculation unit and fl output by the low-order calculation unit.
  • the precise rounding unit may calculate a plurality of candidate results based on a preconfigured rounding method.
  • the plurality of candidate results may be calculated based on the acquired rounding method configuration parameters.
  • the plurality of candidate calculation results may include at least two of the following: a first candidate result f1, a second candidate result f2, and a third candidate result f3.
  • the precise rounding unit may calculate the first candidate result f1 and the second candidate result f2 based on the RP method.
  • the precise rounding unit may calculate the first candidate result f1 and the third candidate result f3 based on the RZ method.
  • the precise rounding unit may calculate the first candidate result f1, the second candidate result f2 and the third candidate result f3 based on the RH method.
  • the process of determining f1 is introduced below, that is, the meaning of the addition of fu and fl (fu + fl) is explained.
  • the upper m bits of the square root f of the target mantissa X and the lower n bits of the square root f of the target mantissa X do not overlap, and the sum of m and n is equal to the full bit width of the calculation result of the square root f of the target mantissa X.
  • after the precise rounding unit receives fu output by the high-order calculation unit and fl output by the low-order calculation unit, it concatenates fu and fl to obtain f1.
  • the upper m bits of f1 obtained by the precise rounding unit are the same as fu
  • the lower n bits of f1 are the same as fl .
  • the high m bits of the square root f of the target mantissa X overlap with the low n bits of the square root f of the target mantissa X.
  • the lowest q bits of the high m bits of the square root f of the target mantissa X overlap with the highest q bits of the low n bits of the square root f of the target mantissa X.
  • the dotted ellipse in (b) of FIG8 shows the overlapping bits of the high m bits and the low n bits, wherein the lowest 2 bits of the high m bits overlap with the highest 2 bits of the low n bits.
  • the precise rounding unit after receiving fu output by the high-bit calculation unit and fl output by the low-bit calculation unit, performs sum processing on fu and fl .
  • the low n-q bits of the first candidate result f1 obtained by the precise rounding unit are the same as the low n-q bits of fl.
  • the result of the sum processing of fu and the high q bits of fl is the high m+q bits of the first candidate result f1 obtained by the precise rounding unit.
  • Ulp represents the smallest valid number that can be expressed in the full bit width of the square root f of the target mantissa X. As shown in (a) of Figure 9, assuming that the full bit width of the square root f of the target mantissa X is 24 bits, the lowest bit is the 0th bit, and the highest bit is the 23rd bit, then the number represented by ulp is that the 0th bit is 1, and the other bits are all 0.
  • if the full bit width of the decimal part of the target mantissa X is Nt, then ulp is $Q^{-Nt}$, where Q is the base of the floating-point number.
  • the base Q of the floating-point number is generally 2.
  • the precise rounding unit can add the first candidate result f1 and ulp to obtain the second candidate result f2.
  • the precise rounding unit can perform subtraction processing on the first candidate result f1 and ulp to obtain the third candidate result f3.
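The candidate construction can be sketched with plain integers, assuming fu and fl are unsigned bit fields and results are measured in units of ulp (so ulp is the integer 1). The function names are hypothetical.

```python
def first_candidate(fu: int, fl: int, n: int, q: int = 0) -> int:
    """Combine an m-bit high part and an n-bit low part that overlap in q bits.

    With q == 0 this is plain concatenation; with q > 0 the top q bits of fl
    are added onto the bottom q bits of fu.
    """
    return (fu << (n - q)) + fl

def candidates(fu: int, fl: int, n: int, q: int = 0) -> tuple[int, int, int]:
    """Return (f1, f2, f3) with f2 = f1 + ulp and f3 = f1 - ulp."""
    f1 = first_candidate(fu, fl, n, q)
    return f1, f1 + 1, f1 - 1
```

In the overlapping case the low n-q bits of the result come straight from fl, while the addition can carry into the bits contributed by fu.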
  • the precise rounding unit can calculate the first rounding discrimination parameter ie according to fu and fl.
  • the floating point number Z is a DP floating point number
  • the value of t3 can be 3.
  • the precise rounding unit can calculate the first rounding discrimination parameter ie using the low-order part of $(f_u^2 + f_l^2 + 2 \cdot f_u \cdot f_l)$ and the low-order part of the target mantissa X, which can reduce circuit overhead and chip area.
  • when the precise rounding unit executes the RP mode, it can output the first candidate result f1 if the first rounding discrimination parameter ie is greater than or equal to 0; that is, the calculation result of the square root of the target mantissa X is the first candidate result f1. If ie is less than 0, it can output the second candidate result f2; that is, the calculation result is the second candidate result f2.
  • when the precise rounding unit executes the RZ mode, it can output the first candidate result f1 if the first rounding discrimination parameter ie is less than or equal to 0; that is, the calculation result of the square root of the target mantissa X is the first candidate result f1. If ie is greater than 0, it can output the third candidate result f3; that is, the calculation result is the third candidate result f3.
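A minimal sketch of the RP and RZ selection follows, under assumptions that the surviving text does not spell out: the first rounding discrimination parameter is taken as ie = f1² − X, RP is read as rounding toward positive infinity and RZ as rounding toward zero (which matches the sign tests above), and all values are integers in units of ulp so that the comparison is exact. The function name is hypothetical.

```python
def select_rp_rz(x: int, f1: int, mode: str) -> int:
    """Pick the rounded square root of x from f1 and its neighbours.

    Assumes f1 is within one ulp of the true square root and ulp == 1.
    """
    ie = f1 * f1 - x                 # assumed definition of the discrimination parameter
    if mode == "RP":                 # read here as round toward positive infinity
        return f1 if ie >= 0 else f1 + 1    # f1 or f2
    if mode == "RZ":                 # read here as round toward zero
        return f1 if ie <= 0 else f1 - 1    # f1 or f3
    raise ValueError(f"unsupported mode: {mode}")
```

Only the sign of ie is inspected, which is in line with the remark above that the low-order parts of the operands can be enough to form it.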
  • the precise rounding unit can determine the second rounding discrimination parameter ien according to the first candidate result f1 and the first rounding discrimination parameter ie.
  • the circuit for calculating the first rounding discrimination parameter ie can be reused, reducing circuit overhead and optimizing circuit chip area.
  • the deviation between the aforementioned f1 and the real value fr of the square root of the target mantissa X is recorded as the first distance (f1-fr), and the deviation between the aforementioned f3 and the real value fr is recorded as the second distance (fr-f3).
  • the second rounding discrimination parameter ien can represent the deviation between the square of the first distance and the square of the second distance.
  • the precise rounding unit can align the highest bit of the significant digit of the first candidate result f1 with the highest bit of the significant digit of the first rounding discrimination parameter ie, which can be achieved by multiplying ulp with the first candidate result f1.
  • the data after the highest bit of the significant digit of the first candidate result f1 is aligned with the highest bit of the significant digit of the first rounding discrimination parameter ie can be expressed as $ulp \times f1$.
  • the precise rounding unit may determine the third rounding determination parameter iep according to the first candidate result f1 and the first rounding determination parameter ie.
  • the deviation between the aforementioned f2 and the real value fr of the square root of the target mantissa X is recorded as the third distance (f2-fr), and the deviation between the aforementioned f1 and the real value fr is recorded as the fourth distance (fr-f1).
  • the third rounding discriminant parameter iep can represent the deviation between the square of the third distance and the square of the fourth distance.
  • the precise rounding unit can output the second candidate result f2 if the third rounding discrimination parameter iep is less than 0; that is, the calculation result of the square root of the target mantissa X is the second candidate result f2.
  • the precise rounding unit can output the third candidate result f3 if the second rounding discrimination parameter ien is greater than or equal to 0; that is, the calculation result is the third candidate result f3.
  • the precise rounding unit can output the first candidate result f1 if the third rounding discrimination parameter iep is greater than or equal to 0, or the second rounding discrimination parameter ien is less than 0; that is, the calculation result is the first candidate result f1.
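The three-way RH decision can be sketched as follows. The formulas for iep and ien are an assumption chosen to reproduce the sign behaviour described above: with ie = f1² − X, they test X against the squares of the midpoints f1 + ulp/2 and f1 − ulp/2, which is where the product ulp × f1 enters. Values are integers in units of ulp, and both parameters are scaled by 4 to avoid fractions. The function name is hypothetical.

```python
def select_rh(x: int, f1: int) -> int:
    """Pick the candidate nearest to the square root of x (ulp == 1).

    Assumes f1 is within one ulp of the true square root.
    """
    ie = f1 * f1 - x                  # assumed first discrimination parameter
    iep4 = 4 * ie + 4 * f1 + 1        # 4 * ((f1 + 1/2)**2 - x)
    ien4 = 4 * ie - 4 * f1 + 1        # 4 * ((f1 - 1/2)**2 - x)
    if iep4 < 0:                      # root lies above the upper midpoint
        return f1 + 1                 # second candidate f2
    if ien4 >= 0:                     # root lies at or below the lower midpoint
        return f1 - 1                 # third candidate f3
    return f1                         # first candidate f1
```

Both iep4 and ien4 are formed from the same ie, which reflects the point made above that the circuit computing ie can be reused.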
  • the floating point number calculation module can output the calculation result of the square root of the floating point number Z. In this calculation result, the sign bit is the same as the sign bit of the floating point number Z, and the mantissa is the output of the precise rounding unit.
  • the exponent offset value is the exponent offset output by the exponent processing unit. If the exponent EW of the floating point number W (the floating point number Z after normalization) is an even number, the exponent processing unit can output + Exponent offset. If the exponent EW of the floating point number W (the floating point number Z after normalization) is an odd number, the exponent processing unit can output (EW-1) + exponent offset.
  • Fig. 10 exemplarily shows a schematic diagram of the specific structure of some units in the floating point calculation module.
  • the high-order calculation unit may include a first table lookup circuit, a first square operation circuit and a first polynomial summation circuit.
  • the first table lookup circuit can receive the target first query parameter r1 and the target second query parameter r2 output by the pre-processing unit, and output the first fitting parameter a1, the second fitting parameter b1, and the third fitting parameter c1 corresponding to the target first query parameter r1 and the target second query parameter r2.
  • the first table lookup circuit can be implemented in many ways.
  • the first table lookup circuit may be connected to a first storage module storing a plurality of first fitting parameters a1, a plurality of second fitting parameters b1, and a plurality of third fitting parameters c1.
  • the first table lookup circuit may be connected to a pre-processing unit, and the first table lookup circuit may receive a target first query parameter r1 and a target second query parameter r2 output by the pre-processing unit.
  • the target first query parameter may be the first part (first part bit width) of the decimal part of the mantissa M1 of the floating point number W.
  • the target second query parameter is a partial bit width of the exponent EW of the floating point number W, and includes the lowest bit width of the exponent EW of the floating point number W.
  • the first polynomial coefficient lookup table may include a first odd-number lookup sub-table and a first even-number lookup sub-table.
  • the first odd-number lookup sub-table represents the first fitting parameter combination corresponding to the first query parameter when the second query parameter is an odd number.
  • the first even-number lookup sub-table includes the first fitting parameter combination corresponding to the first query parameter when the second query parameter is an even number.
  • the first table lookup circuit may search for the first fitting parameter combination corresponding to the target first query parameter r1 from the first even-number lookup sub-table based on the target second query parameter r2 being an even number.
  • the first table lookup circuit may search for the first fitting parameter combination corresponding to the target first query parameter r1 from the first odd-number lookup sub-table based on the target second query parameter r2 being an odd number.
  • in this way, the first table lookup circuit searches for the first fitting parameter combination corresponding to the target first query parameter combination from the first polynomial coefficient lookup table, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.
  • the first square operation circuit can be connected to the pre-processing unit, and can receive the high t1 bits (also the partial bit width of the target mantissa X) of the bit width of the decimal part of the mantissa M1 of the floating-point number W output by the pre-processing unit except the first partial bit width.
  • the first square operation circuit can calculate the square $(X1)^2$ of the second part X1 of the mantissa M1 of the floating-point number W, and output $(X1)^2$.
  • the first polynomial summation circuit can be connected to the first table lookup circuit, to the pre-processing unit, and to the first square operation circuit.
  • the first polynomial summation circuit can receive the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2 output by the first table lookup circuit.
  • the first polynomial summation circuit can receive the second part X1 of the mantissa M1 of the floating point number W output by the pre-processing unit.
  • the first polynomial summation circuit can receive $(X1)^2$ output by the first square operation circuit.
  • the first polynomial summation circuit may include a multiplier and an adder, or the first polynomial summation circuit may include a multiplier-adder.
  • the low-order calculation unit may include a first high-order reciprocal calculation circuit and a low-order operation circuit.
  • the first high-order reciprocal calculation circuit may include a second table lookup circuit, a second square operation circuit, and a second polynomial summation circuit.
  • the second table lookup circuit can receive the target third query parameter h1 and the target fourth query parameter h2 output by the pre-processing unit, and output the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 corresponding to the target third query parameter h1 and the target fourth query parameter h2.
  • the second table lookup circuit can have multiple implementations.
  • the second table lookup circuit may be connected to a second storage module storing a plurality of fourth fitting parameters a2, a plurality of fifth fitting parameters b2, and a plurality of sixth fitting parameters c2.
  • the second table lookup circuit may be connected to the pre-processing unit, and the second table lookup circuit may receive the target third query parameter h1 and the target fourth query parameter h2 output by the pre-processing unit.
  • the target third query parameter is the third part (third part bit width) of the mantissa M1 of the floating point number W.
  • the target fourth query parameter h2 is the partial bit width of the exponent EW of the floating point number W, and includes the lowest bit width of the exponent EW of the floating point number W.
  • the second polynomial coefficient lookup table may include a second odd number lookup sub-table and a second even number lookup sub-table.
  • the second odd number lookup sub-table represents the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an odd number.
  • the second even number lookup sub-table includes the second fitting parameter combination corresponding to the third query parameter when the fourth query parameter is an even number.
  • the second table lookup circuit can search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second even number lookup subtable according to the target fourth query parameter h2 being an even number.
  • the second table lookup circuit can search for the second fitting parameter combination corresponding to the target third query parameter h1 from the second odd number lookup subtable according to the target fourth query parameter h2 being an odd number.
  • the second table lookup circuit is implemented to search for the second fitting parameter combination corresponding to the target second query parameter combination from the second polynomial coefficient lookup table, thereby determining the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2.
  • the second square operation circuit can be connected to the pre-processing unit, and can receive the fourth part X2 of the mantissa M1 of the floating-point number W output by the pre-processing unit.
  • X2 is the high t2 bits of the bit width of the decimal part of the mantissa M1 of the floating-point number W except the bit width of the third part, and t2 is a positive integer.
  • the second square operation circuit can calculate the square $(X2)^2$ of the fourth part X2 of the mantissa M1 of the floating-point number W, and output $(X2)^2$.
  • the second polynomial summation circuit can be connected to the second table lookup circuit, to the pre-processing unit, and to the second square operation circuit.
  • the second polynomial summation circuit can receive the coefficients of the second polynomial equation corresponding to the target second query parameter combination output by the second table lookup circuit.
  • the second polynomial summation circuit can receive the fourth part X2 of the mantissa M1 of the floating point number W output by the pre-processing unit.
  • the second polynomial summation circuit may receive $(X2)^2$ output by the second square operation circuit.
  • the second polynomial summation circuit may calculate the reciprocal of the high part fu of f according to the received fourth fitting parameter a2, fifth fitting parameter b2, and sixth fitting parameter c2 corresponding to the target second query parameter combination, the fourth part X2 of the target mantissa X, and the square $(X2)^2$ of X2.
  • the second polynomial summing circuit may include a multiplier and an adder.
  • the low-order operation circuit in the low-order calculation unit may include a third square operation circuit, a subtractor, a first multiplier, and a rounding circuit.
  • the third square operation circuit can be connected to the first polynomial summation circuit.
  • the third square operation circuit can calculate the square $f_u^2$ of the high part fu of f, and output $f_u^2$.
  • the subtractor may be connected to the third square operation circuit and to the pre-processing unit.
  • the subtractor may receive the square $f_u^2$ of the high part fu of f output by the third square operation circuit.
  • the subtractor may receive the target mantissa X output by the pre-processing unit.
  • the subtractor may calculate the difference $X - f_u^2$ between the target mantissa X and the square $f_u^2$ of the high part fu of f.
  • an adder may be used in the low-bit operation circuit to implement the function of the subtractor, and this application does not impose too many restrictions on this.
  • the first multiplier can be connected to the second polynomial summing circuit and to the subtractor.
  • the first multiplier can receive the output X − fu² of the subtractor.
  • the first multiplier can receive the reciprocal 1/fu of the high part fu of f output by the second polynomial summation circuit.
  • the first multiplier can multiply its two inputs to obtain (X − fu²) × (1/fu).
  • the rounding circuit can be connected to the first multiplier and can receive the output of the first multiplier. The rounding circuit performs a rounding operation on the output of the first multiplier to obtain the low-order part fl of f. Exemplarily, the rounding circuit can add "1" to the high n+1 bits of the output of the first multiplier and retain the high n bits of the sum as the low-order part fl of f.
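  • The data path of the low-bit operation circuit can be illustrated with a minimal Python sketch. It is not the circuit of the application: the function name `split_sqrt`, the bit counts, and the use of `math.isqrt` as a stand-in for the polynomial-fit high part are assumptions, and the halving of the product (a one-bit shift that follows from X = (fu + fl)² when fl² is neglected) is placed after the multiplier as a further assumption.

```python
from fractions import Fraction
from math import isqrt

def split_sqrt(x: Fraction, n_high: int, n_low: int):
    """Return (fu, fl) such that fu + fl approximates sqrt(x), for 1 <= x < 4."""
    # High part: sqrt(x) truncated to n_high fractional bits
    # (stand-in for the table lookup and polynomial summation).
    fu = Fraction(isqrt(int(x * 4**n_high)), 2**n_high)
    # Subtractor and multiplier: (x - fu^2) * (1 / fu), then halve (assumed shift).
    prod = (x - fu * fu) * (1 / fu) / 2
    # Rounding circuit: keep one extra bit, add 1, drop the extra bit.
    total = n_high + n_low
    scaled = int(prod * 2**(total + 1))      # prod >= 0, so int() truncates downwards
    fl = Fraction((scaled + 1) >> 1, 2**total)
    return fu, fl

fu, fl = split_sqrt(Fraction(2), 12, 12)
print(float(fu), float(fl), float(fu + fl))
```

  • With n_low no larger than n_high, the neglected fl² term stays below half a unit in the last place, so in this sketch fu + fl lands within one unit in the last place of the true square root and only a final exact-rounding decision remains.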
  • the precise rounding unit may include a rounding determination parameter calculation circuit, a selected result calculation circuit, and a calculation result selection circuit.
  • the rounding discrimination parameter calculation circuit may be connected to the first polynomial summation circuit, to the rounding circuit, and to the third square operation circuit.
  • the rounding discrimination parameter calculation circuit may receive the high part fu of f output by the first polynomial summation circuit.
  • the rounding discrimination parameter calculation circuit may receive the low part fl of f output by the rounding circuit.
  • the rounding discrimination parameter calculation circuit may receive the square fu² of the high part fu of f output by the third square operation circuit.
  • the rounding discrimination parameter calculation circuit may have the ability to calculate the rounding discrimination parameter corresponding to at least one rounding mode.
  • the rounding discrimination parameter calculation circuit may calculate the first rounding discrimination parameter ie according to the low-order part fl of f, the high-order part fu of f, the square fu² of the high-order part fu of f, and the target mantissa X.
  • the rounding discrimination parameter calculation circuit may calculate the second rounding discrimination parameter ien and the third rounding discrimination parameter iep according to the low-order part fl of f, the high-order part fu of f, and the square fu² of the high-order part fu of f.
  • the rounding discrimination parameter calculation circuit outputs the sign bit of the first rounding discrimination parameter ie, which can characterize the positive or negative nature of the first rounding discrimination parameter ie.
  • the sign bit of the first rounding discrimination parameter ie can indicate that ie is a positive number (ie is greater than 0), or ie is a negative number (ie is less than 0), or ie is equal to 0.
  • the rounding discrimination parameter calculation circuit can output the sign bit of the second rounding discrimination parameter ien, which is used to indicate the positive or negative nature of the second rounding discrimination parameter ien.
  • the rounding discrimination parameter calculation circuit can output the sign bit of the third rounding discrimination parameter iep, which is used to indicate the positive or negative nature of the third rounding discrimination parameter iep.
  • the candidate result calculation circuit can be connected to the rounding circuit and the first polynomial summation circuit.
  • the candidate result calculation circuit can receive the low-order part fl of f output by the rounding circuit.
  • the candidate result calculation circuit can receive the high-order part fu of f.
  • the candidate result calculation circuit can calculate and output multiple candidate results based on the low-order part fl of f and the high-order part fu of f.
  • the multiple candidate results include a first candidate result f1 and a second candidate result f2.
  • the multiple candidate results include the first candidate result f1 and a third candidate result f3.
  • the multiple candidate results include the first candidate result f1, the second candidate result f2, and the third candidate result f3.
  • the candidate result calculation circuit can output the first candidate result f1, the second candidate result f2, and the third candidate result f3.
  • the calculation result selection circuit can be connected to the rounding discrimination parameter calculation circuit and the candidate result calculation circuit.
  • the calculation result selection circuit can receive the sign bit of the rounding discrimination parameter output by the rounding discrimination parameter calculation circuit.
  • the calculation result selection circuit can receive the multiple candidate results output by the candidate result calculation circuit.
  • the calculation result selection circuit can select one candidate result from the received candidate results according to the pre-configured rounding mode and the received rounding discrimination parameter, and output it as the calculation result.
  • the pre-configured rounding mode may be any one of the RH mode, the RP mode, and the RZ mode.
  • the calculation result selection circuit can receive a rounding mode configuration parameter and, according to the rounding mode corresponding to that parameter and the rounding discrimination parameter corresponding to that rounding mode, select one candidate result from the multiple candidate results as the square root of the target mantissa X and output it.
  • the rounding mode represented by the first rounding configuration parameter is the RP mode.
  • the rounding mode represented by the second rounding configuration parameter is the RZ mode.
  • the rounding mode represented by the third rounding configuration parameter is the RH mode.
  • when the rounding mode configuration parameter received by the calculation result selection circuit is the first rounding mode configuration parameter (RP mode), the following applies.
  • the calculation result selection circuit can output the first candidate result f1 when the first rounding discrimination parameter ie is greater than or equal to 0.
  • the calculation result selection circuit can output the second candidate result f2 when the first rounding discrimination parameter ie is less than 0.
  • when the rounding mode configuration parameter received by the calculation result selection circuit is the second rounding mode configuration parameter (RZ mode), the following applies.
  • the calculation result selection circuit can output the first candidate result f1 when the first rounding discrimination parameter ie is less than or equal to 0.
  • the calculation result selection circuit can output the second candidate result f2 when the first rounding discrimination parameter ie is greater than 0.
  • when the rounding mode configuration parameter received by the calculation result selection circuit is the third rounding mode configuration parameter (RH mode), the following applies.
  • the calculation result selection circuit can output the second candidate result f2 when the third rounding discrimination parameter iep is less than 0.
  • the calculation result selection circuit can output the third candidate result f3 when the second rounding discrimination parameter ien is greater than or equal to 0.
  • the calculation result selection circuit can output the first candidate result f1 when the third rounding discrimination parameter iep is greater than or equal to 0 and the second rounding discrimination parameter ien is less than 0.
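  • The sign-based selection can be sketched in Python on scaled integers. Everything below is an illustrative assumption rather than the circuit of the application: RP is read as rounding toward positive infinity, RZ as rounding toward zero, and RH as rounding to nearest; the neighbouring candidates are named `up` (f1 plus one unit in the last place) and `down` (f1 minus one unit) instead of f2 and f3; and iep and ien are taken as the deviations of (f1 + half a unit)² and (f1 − half a unit)² from X, scaled by 4 to stay in integers.

```python
from math import isqrt

def select_root(x_int: int, f1: int, mode: str) -> int:
    """Pick the correctly rounded integer square root of x_int.

    f1 must be at least 1 and strictly within one unit of sqrt(x_int).
    """
    up, down = f1 + 1, f1 - 1
    ie = f1 * f1 - x_int                      # sign tells whether f1 is above or below the root
    if mode == "RP":
        return f1 if ie >= 0 else up
    if mode == "RZ":
        return f1 if ie <= 0 else down
    if mode == "RH":
        iep = (2 * f1 + 1) ** 2 - 4 * x_int   # sign of (f1 + 1/2)^2 - x_int
        ien = (2 * f1 - 1) ** 2 - 4 * x_int   # sign of (f1 - 1/2)^2 - x_int
        if iep < 0:
            return up
        if ien >= 0:
            return down
        return f1
    raise ValueError(f"unknown rounding mode: {mode}")

x = 2 << 40                                   # the value 2.0 with 40 fractional bits
f1 = isqrt(x) + 1                             # a candidate that is slightly too large
print([select_root(x, f1, m) for m in ("RP", "RZ", "RH")])
```

  • Only the signs of ie, iep, and ien are consulted, which matches the description above in which the rounding discrimination parameter calculation circuit forwards sign bits rather than full values.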
  • FIG. 11 exemplarily shows a specific structural diagram of a precise rounding unit.
  • the rounding discrimination parameter calculation circuit may include a second multiplier, a first adder, a second adder, a third adder, and a fourth square operation circuit.
  • the second multiplier may be connected to the first polynomial summation circuit and the rounding circuit.
  • the second multiplier may receive the high part fu of f output by the first polynomial summation circuit.
  • the second multiplier may receive the low part fl of f output by the rounding circuit.
  • the fourth square operation circuit may be connected to the rounding circuit.
  • the fourth square operation circuit may receive the low-order part fl of f output by the rounding circuit.
  • the fourth square operation circuit may calculate and output the square fl² of the received low-order part fl of f.
  • the first adder can be connected to the pre-processing unit, to the second multiplier, to the third square operation circuit, and to the fourth square operation circuit.
  • the first adder can receive the high t3 bits X3 of the mantissa of the floating point number Z output by the pre-processing unit.
  • the first adder can receive the first intermediate parameter k1 output by the second multiplier.
  • the first adder can receive the square fu² of the high part fu of f output by the third square operation circuit.
  • the first adder can receive the square fl² of the low part fl of f output by the fourth square operation circuit.
  • the first adder can output the sign bit of the first rounding discrimination parameter ie.
  • the second adder can be connected to the pre-processing unit, to the second multiplier, to the third square operation circuit, and to the fourth square operation circuit.
  • the second adder can receive the high t3 bits X3 of the mantissa of the floating point number Z output by the pre-processing unit.
  • the second adder can receive the first intermediate parameter k1 output by the second multiplier.
  • the second adder can receive the square fu² of the high part fu of f output by the third square operation circuit.
  • the second adder can receive the square fl² of the low part fl of f output by the fourth square operation circuit.
  • the second adder can output the sign bit of the second rounding discrimination parameter ien.
  • the third adder can be connected to the pre-processing unit, to the second multiplier, to the third square operation circuit, and to the fourth square operation circuit.
  • the third adder can receive the high t3 bits X3 of the mantissa of the floating point number Z output by the pre-processing unit.
  • the third adder can receive the first intermediate parameter k1 output by the second multiplier.
  • the third adder can receive the square fu² of the high part fu of f output by the third square operation circuit.
  • the third adder can receive the square fl² of the low part fl of f output by the fourth square operation circuit.
  • the third adder can output the sign bit of the third rounding discrimination parameter iep.
  • the candidate result calculation circuit may include a fourth adder, a fifth adder, and a sixth adder.
  • the candidate result calculation circuit may have multiple implementations.
  • the fifth adder may be connected to the first polynomial summation circuit and to the rounding circuit.
  • the fifth adder may receive the high part fu of f output by the first polynomial summation circuit.
  • the fifth adder may receive the low part fl of f output by the rounding circuit.
  • the sixth adder may be connected to the first polynomial summation circuit and the rounding circuit.
  • the sixth adder may receive the high part fu of f output by the first polynomial summation circuit.
  • the sixth adder may receive the low part fl of f output by the rounding circuit.
  • the fourth adder can be connected to the first polynomial summation circuit and the rounding circuit.
  • the fourth adder can receive the high part fu of f output by the first polynomial summation circuit.
  • the fourth adder can receive the low part fl of f output by the rounding circuit.
  • the fifth adder may be connected to the first polynomial summation circuit and to the rounding circuit.
  • the fifth adder may receive the high part fu of f output by the first polynomial summation circuit.
  • the fifth adder may receive the low part fl of f output by the rounding circuit.
  • the sixth adder may be connected to the fourth adder.
  • the sixth adder may receive the first candidate result f1 output by the fourth adder.
  • the calculation result selection circuit in the precise rounding unit can be connected to the first adder, the second adder, the third adder, the fourth adder, the fifth adder, and the sixth adder.
  • the calculation result selection circuit can receive the sign bit of the first rounding discrimination parameter ie output by the first adder.
  • the calculation result selection circuit can receive the sign bit of the second rounding discrimination parameter ien output by the second adder.
  • the calculation result selection circuit can receive the sign bit of the third rounding discrimination parameter iep output by the third adder.
  • the calculation result selection circuit can receive the first candidate result f1 output by the fourth adder.
  • the calculation result selection circuit can receive the second candidate result f2 output by the fifth adder.
  • the calculation result selection circuit can receive the third candidate result f3 output by the sixth adder.
  • the calculation result selection circuit may receive a rounding mode configuration parameter.
  • the process of the calculation result selection circuit outputting the calculation result of the square root of the target mantissa X can be referred to the relevant introduction in the above embodiment, which will not be repeated here.
  • FIG. 13 shows a floating point calculation module according to an exemplary embodiment.
  • the floating point calculation module may include a pre-processing unit, a high-order calculation unit, a low-order calculation unit, and a precise rounding unit.
  • the low-order calculation unit includes a second high-order reciprocal calculation circuit and the aforementioned low-order operation circuit.
  • the second high-order reciprocal calculation circuit can be connected to the pre-processing unit and the high-order calculation unit.
  • the second high-order reciprocal calculation circuit can receive all or part of the bit width of the high-order part fu of f output by the high-order calculation unit.
  • the second high-order reciprocal calculation circuit can receive all or part of the bit width of the exponent EZ of the normalized floating-point number Z output by the pre-processing unit.
  • the second high-order reciprocal calculation circuit may determine the target fifth query parameter g1 based on the high-order part fu of the square root f of the target mantissa X.
  • the target fifth query parameter g1 is a partial bit width (recorded as the fifth partial bit width) of the high-order part fu of f.
  • the target fifth query parameter g1 may be the high g1 bits (or low g1 bits) of the fractional part of the high-order part fu of f, where g1 is a positive integer and is less than or equal to the full bit width of the fractional part of the high-order part fu of f.
  • the second high-order reciprocal calculation circuit can determine the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1, and these coefficients can include the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3.
  • the second high-order reciprocal calculation circuit can calculate and output the reciprocal of the high-order part fu of f according to the coefficients of the third polynomial fitting equation and a partial bit width of the high-order part fu of the square root f of the target mantissa X. Here, g2 is the high g2 bits of the fractional part of the high-order part fu of f, excluding the aforementioned fifth partial bit width, and g2 is a positive integer.
  • the second high-order reciprocal calculation circuit may obtain or be configured with a third polynomial coefficient lookup table.
  • the third polynomial coefficient lookup table may characterize the correspondence between multiple third fitting parameter combinations and multiple third query parameter combinations.
  • each third query parameter combination has a corresponding third fitting parameter combination.
  • Each third fitting parameter combination may include a seventh fitting parameter a3, an eighth fitting parameter b3, and a ninth fitting parameter c3.
  • the third query parameter combination may include a fifth query parameter.
  • the target fifth query parameter g1 received by the second high-order reciprocal calculation circuit may constitute a target third query parameter combination, and the second high-order reciprocal calculation circuit may find the third fitting parameter combination corresponding to the target third query parameter combination in the third polynomial coefficient lookup table, thereby determining the coefficients of the third polynomial fitting equation corresponding to the target fifth query parameter g1.
  • the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 can be stored in the same third storage module.
  • the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 can be stored in three third storage modules respectively.
  • any two of the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 are stored in the same third storage module, and the remaining parameter is stored in another third storage module.
  • the third polynomial coefficient lookup table may include the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 corresponding to a preset number of third query parameter combinations.
  • the low-order operation circuit can be connected to the second high-order reciprocal calculation circuit, to the pre-processing unit, and to the high-order calculation unit.
  • the low-order operation circuit can receive the reciprocal of the high-order part fu of f (i.e., 1/fu) output by the second high-order reciprocal calculation circuit, the target mantissa X output by the pre-processing unit, and the high-order part fu output by the high-order calculation unit.
  • the low-order operation circuit can calculate the low-order part fl of f according to the first relation among the high-order part fu, the target mantissa X, and the low-order part fl.
  • the low-order operation circuit can output the low-order part fl of f, and the low-order calculation unit can also output the low-order part fl of f.
  • the introduction of the precise rounding unit can refer to the above embodiment, which will not be repeated here.
  • the second high-order reciprocal calculation circuit uses all or part of the bit width of fu to obtain the reciprocal of the high-order part fu. Compared with the first high-order reciprocal calculation circuit, which uses all or part of the bit width of the target mantissa X, its circuit scale is smaller and it occupies a smaller chip area.
  • FIG. 14 is a schematic diagram showing the specific structure of some units in the floating-point calculation module.
  • the specific structure of the high-order calculation unit can refer to the high-order calculation unit shown in FIG. 10, which will not be repeated here.
  • the second high-order reciprocal calculation circuit in the low-order calculation unit may include a third table lookup circuit, a fifth square operation circuit, and a third polynomial summation circuit.
  • the third table lookup circuit can receive the target fifth query parameter g1 and output the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 corresponding to the target fifth query parameter g1.
  • the third table lookup circuit can be implemented in a variety of ways.
  • the third table lookup circuit may be connected to a third storage module storing the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3.
  • the third table lookup circuit may receive the target fifth query parameter g1 output by the high-order calculation unit.
  • the third table lookup circuit may use the target third query parameter combination to query, in the connected third storage module, the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 corresponding to the target fifth query parameter g1.
  • the third table lookup circuit may output the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 found for the target fifth query parameter g1.
  • the fifth square operation circuit can be connected to the high-order calculation unit, and can receive the high g2 bits (i.e., the aforementioned g2) of the fractional part of the high-order part fu of f output by the high-order calculation unit.
  • the fifth square operation circuit can calculate the square (g2)² of the high g2 bits of the fractional part of the high-order part fu, and output (g2)².
  • the third polynomial summation circuit can be connected to the third table lookup circuit, to the high-order calculation unit, and to the fifth square operation circuit.
  • the third polynomial summation circuit can receive the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 corresponding to the target third query parameter combination output by the third table lookup circuit.
  • the third polynomial summation circuit can receive the high g2 bits (g2) of the fractional part of the high part fu of f output by the high-order calculation unit.
  • the third polynomial summation circuit can receive (g2)² output by the fifth square operation circuit.
  • the third polynomial summation circuit may include a multiplier and an adder, so that it can calculate the reciprocal of the high part fu of f according to the seventh fitting parameter a3, the eighth fitting parameter b3, and the ninth fitting parameter c3 corresponding to the received target third query parameter combination, the high g2 bits (g2) of the fractional part of the high part fu, and (g2)².
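  • The lookup-plus-quadratic scheme for the reciprocal of fu can be sketched in Python. This is a sketch under stated assumptions, not the fitting used in the application: the index width `G1_BITS`, the quadratic form a3 + b3·g2 + c3·(g2)², and the use of Taylor coefficients of 1/y at the start of each segment in place of fitted coefficients are all chosen here for illustration.

```python
G1_BITS = 6  # assumed width of the lookup index g1

# One (a3, b3, c3) triple per segment: Taylor coefficients of 1/y at the segment start.
TABLE = []
for g1 in range(1 << G1_BITS):
    y0 = 1.0 + g1 / (1 << G1_BITS)
    TABLE.append((1.0 / y0, -1.0 / y0**2, 1.0 / y0**3))

def reciprocal_fu(fu: float) -> float:
    """Approximate 1/fu for 1 <= fu < 2 with one table lookup and one quadratic."""
    frac = fu - 1.0
    g1 = int(frac * (1 << G1_BITS))      # high bits of the fraction: table index
    g2 = frac - g1 / (1 << G1_BITS)      # remaining low bits: offset inside the segment
    a3, b3, c3 = TABLE[g1]
    return a3 + b3 * g2 + c3 * g2 * g2   # squaring circuit, multipliers, and adders

print(reciprocal_fu(1.4142), 1 / 1.4142)
```

  • With a 6-bit index the offset g2 is below 2⁻⁶, so the cubic remainder of this particular expansion stays below 2⁻¹⁸; a wider index trades table size for accuracy.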
  • the specific structure of the low-bit operation circuit in the low-bit calculation unit can refer to the low-bit calculation unit shown in FIG. 10.
  • the low-bit operation circuit can include a third square operation circuit, a subtractor, a first multiplier, and a rounding circuit.
  • the first multiplier can be connected to the third polynomial summation circuit and to the subtractor.
  • the first multiplier can receive the output of the subtractor.
  • the first multiplier can receive the reciprocal 1/fu of the high part fu of f output by the third polynomial summation circuit.
  • the first multiplier can multiply its two inputs to obtain (X − fu²) × (1/fu).
  • the function of the subtractor in the low-bit operation circuit can be implemented by an adder.
  • the specific structure of the precise rounding unit can refer to the precise rounding unit provided in any of the aforementioned embodiments, and will not be repeated here.
  • FIG. 15 exemplarily shows a floating-point number calculation module.
  • the floating-point number calculation module may include a pre-processing unit, a high-order calculation unit, a low-order calculation unit, and a sum processing unit.
  • the low-order calculation unit may include the aforementioned first high-order reciprocal calculation circuit and a low-order operation circuit.
  • the floating-point number calculation module may also include an exponent processing unit.
  • the functions of the pre-processing unit, the high-order calculation unit, the low-order calculation unit, and the exponent processing unit can refer to the relevant introduction in any of the aforementioned embodiments, and will not be repeated here.
  • the sum processing unit may be connected to the high-order calculation unit to receive the high-order part fu of f output by the high-order calculation unit.
  • the sum processing unit may be connected to the low-order calculation unit to receive the low-order part fl of f output by the low-order calculation unit.
  • the sum processing unit may sum the high-order part fu of f and the low-order part fl of f to determine fu + fl and obtain the square root of the target mantissa X.
  • for the process by which the sum processing unit sums the high-order part fu of f and the low-order part fl of f, refer to the relevant introduction in FIG. 8.
  • the sum processing unit may include the fourth adder in the aforementioned precise rounding unit, or the precise rounding unit in the embodiment of the present application may perform the function of the sum processing unit.
  • FIG. 16 exemplarily shows a floating-point number calculation module.
  • the floating-point number calculation module may include a pre-processing unit, a high-order calculation unit, a low-order calculation unit, and a sum processing unit.
  • the low-order calculation unit may include the aforementioned second high-order reciprocal calculation circuit and the low-order operation circuit.
  • the floating-point number calculation module may also include an exponent processing unit.
  • the functions of the pre-processing unit, the high-order calculation unit, the low-order calculation unit, and the exponent processing unit can refer to the relevant introduction in any of the aforementioned embodiments, and will not be repeated here.
  • the sum processing unit may be connected to the high-order calculation unit to receive the high-order part fu of f output by the high-order calculation unit.
  • the sum processing unit may be connected to the low-order calculation unit to receive the low-order part fl of f output by the low-order calculation unit.
  • the sum processing unit may sum the high-order part fu of f and the low-order part fl of f to determine fu + fl and obtain the square root of the target mantissa X.
  • for the process by which the sum processing unit sums the high-order part fu of f and the low-order part fl of f, refer to the relevant introduction in FIG. 8.
  • the sum processing unit may include the fourth adder in the aforementioned precise rounding unit, or the precise rounding unit in the embodiment of the present application may perform the function of the sum processing unit.
  • the processor or calculator includes hardware structures and/or software modules corresponding to the execution of each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

Abstract

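Embodiments of the present application provide a floating point number square root calculation method and a floating point number calculation module that have a short calculation delay and high throughput, and that provide a correspondingly optimized rounding method capable of outputting exact results that satisfy the rounding modes of the IEEE 754 standard. In the method, the mantissa of the square root of a first floating point number is determined by determining the square root of a target mantissa, where the target mantissa includes the mantissa of the first floating point number and the first floating point number is a normalized floating point number. The high-order bit-width part and the low-order bit-width part of the square root of the target mantissa are calculated separately to determine the square root of the target mantissa. The fractional part of the square root of the target mantissa is the mantissa of the square root of the first floating point number, so that the mantissa of the square root of the first floating point number is determined.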

Description

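Floating point number square root calculation method and floating point number calculation module
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211250294.4, filed with the China National Intellectual Property Administration on October 13, 2022 and entitled "Floating point number square root calculation method and floating point number calculation module", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of electronic technology, and in particular to a floating point number square root calculation method and a floating point number calculation module.
Background
Floating point square root calculation has become a basic operation supported by processors and is widely used in processors that support floating point calculation, for example central processing units (CPU), graphics processing units (GPU), and artificial intelligence (AI) processors. It is widely applied in fields such as digital signal processing, graphics computing, and high-performance computing.
Existing methods for computing the square root of a floating point number, such as the Babylonian method or the Newton-Raphson method, use the square root equation and take an initial approximation as input; each iteration yields an inexact full-bit-width square root value. A full-bit-width result that meets high-precision requirements is obtained through multiple rounds of iteration. These existing methods suffer from a large number of iterations and slow convergence.
Summary
The present application provides a floating point number square root calculation method and a floating point number calculation module with a short calculation delay and high throughput.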
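In a first aspect, the present application provides a floating point number square root calculation method, which may be executed or implemented by a processor, a calculator, a processing device, a computing device, or the like. The following description takes a processor executing the method as an example. The processor may receive a floating point calculation instruction, which may carry the floating point number to be calculated (Z). The processor may obtain a target mantissa (X). The target mantissa (X) includes the mantissa of a first floating point number (W); the first floating point number (W) is a normalized floating point number, and the value of the floating point number to be calculated is the same as the value of the first floating point number (W). The mantissa and exponent of the floating point number to be calculated (Z) may be different from or the same as those of the first floating point number (W); that is, the representation format of Z and that of W may be different or the same. In some application scenarios, the format of the floating point number (Z) received by the processor differs from the format of the first floating point number (W), and the processor may convert Z into W; this process changes only the format of Z, not its value. The relationship between the target mantissa (X) and the mantissa of the first floating point number (W) may be as follows: if the exponent of W is even, X is the same as the mantissa of W; if the exponent of W is odd, X is Q times the mantissa of W, where Q is the base of the floating point number, Q is a positive number, and Q is even. For example, the target mantissa (X) can be obtained by shifting the mantissa of the first floating point number (W) left by one bit.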
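The even/odd exponent rule for forming the target mantissa can be illustrated with a short Python sketch. It assumes base Q = 2 and uses ordinary Python floats; the function name `target_mantissa` and the returned halved exponent are illustrative choices, not names from the application.

```python
import math

def target_mantissa(z: float):
    """Return (X, half_exp) with z == X * 2**(2 * half_exp) and 1 <= X < 4."""
    m, e = math.frexp(z)        # z = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1       # renormalise so that 1 <= m < 2
    if e & 1:                   # odd exponent: shift the mantissa left by one bit
        return m * 2.0, (e - 1) // 2
    return m, e // 2            # even exponent: the mantissa is used unchanged

X, half_exp = target_mantissa(8.0)
print(X, half_exp, math.sqrt(X) * 2.0 ** half_exp)   # square root of 8 rebuilt from sqrt(X)
```

Making the remaining exponent even is what allows it to be halved exactly, so only the square root of X, a value in [1, 4), has to be computed.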
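The processor may determine, according to all or part of the bit width of the target mantissa (X), a first bit-width part (fu) of the square root of the target mantissa (X), where the first bit-width part (fu) contains the most significant bit of the square root of the target mantissa (X). The processor may calculate a second bit-width part (fl) of the square root of the target mantissa (X) based on a first relation, the first bit-width part (fu), and all or part of the bit width of the target mantissa (X), where the first relation characterizes the relationship among the first bit-width part (fu) of the square root of the target mantissa (X), the target mantissa (X), and the second bit-width part (fl) of the square root of the target mantissa (X). The processor may determine the square root of the target mantissa (X) based on the first bit-width part (fu) and the second bit-width part (fl), and determine the fractional part of the square root of the target mantissa (X) as the mantissa of the square root of the floating point number to be calculated (Z). For example, the most significant bit of the fractional part of the square root of the target mantissa (X) may be determined as the integer part of the mantissa of the square root of the floating point number (Z), and the bits of that fractional part other than the most significant bit may be determined as the fractional part of the mantissa of the square root of the floating point number (Z).
In the embodiments of the present application, the processor calculates the mantissa of the square root of the floating point number to be calculated (Z), which is also the mantissa of the square root of the first floating point number (W), and this can be achieved by determining the square root of the target mantissa (X). The processor may separately determine the high-order part and the low-order part of the square root of the target mantissa (X), namely the first bit-width part (fu) and the second bit-width part (fl), and use them to determine the square root of the target mantissa (X). The processor therefore needs no iteration to determine the square root of the target mantissa (X), so the calculation delay is short and the throughput is high. Optionally, the processor may determine the first bit-width part (fu) and the second bit-width part (fl) in parallel. Alternatively, the processor may determine them serially, for example determining the second bit-width part (fl) after the first bit-width part (fu).
It can be understood that, in the embodiments of the present application, when a partial bit width of the mantissa of a floating point number includes multiple bits, those bits are contiguous; that is, a partial bit width also means a partial contiguous bit width. A part of a floating point number may be a part of its mantissa, and when a part of the mantissa includes multiple bits, those bits are contiguous.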
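In a possible implementation, the first relation satisfies the following relation (the formula is missing from the source text), where X is the target mantissa, fu is the first bit-width part, and fl is the second bit-width part. In the embodiments of the present application, the processor may implement the operation of determining the second bit-width part (fl) using the first relation in software or in hardware. The embodiments of the present application do not limit this.
In a possible implementation, the second bit-width part (fl) contains part of the bit width of the square root of the target mantissa (X), including the least significant bit of the square root of the target mantissa (X), where the sum of the bit-width length of the first bit-width part (fu) and the bit-width length of the second bit-width part (fl) is greater than or equal to the full bit-width length of the square root of the target mantissa (X).
In the embodiments of the present application, the first bit-width part (fu) may refer to a contiguous partial bit width that contains the most significant bit of the square root of the target mantissa (X). The second bit-width part (fl) may refer to a contiguous partial bit width that contains the least significant bit of the square root of the target mantissa (X). The sum of the bit width of the first bit-width part (fu) and the bit width of the second bit-width part (fl) is greater than or equal to the full bit width of the square root of the target mantissa (X).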
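The exact form of the first relation is not reproduced in this text. As a hedged illustration only, one relation that is consistent with the rounding discrimination parameter ie = fu² + fl² + 2×fu×fl − X given elsewhere in this description follows from setting the square of fu + fl equal to X and neglecting the small term fl²; whether the application states the relation in exactly this form is an assumption.

```latex
\begin{aligned}
X &= (f_u + f_l)^2 = f_u^2 + 2 f_u f_l + f_l^2, \\
f_l &\approx \frac{X - f_u^2}{2 f_u} \qquad \text{when } f_l^2 \text{ is negligible.}
\end{aligned}
```

Because fl is smaller than one unit in the last place of fu, the neglected term fl² is smaller still, which is why a single evaluation of this expression can suffice where an iterative method would need several rounds.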
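In a possible implementation, the floating point number calculation method provided in the embodiments of the present application further includes the following. When determining the first bit-width part (fu) of the square root of the target mantissa (X) according to all or part of the bit width of the target mantissa (X), the processor may determine the coefficients of a preset first polynomial fitting equation based on a target first query parameter (r1) and a target second query parameter (r2), where the target first query parameter (r1) is a first part of the mantissa of the first floating point number (W), and the target second query parameter (r2) is a partial bit width of the exponent of the first floating point number (W) that includes the least significant bit of that exponent. The processor may calculate the first bit-width part (fu) according to the coefficients of the first polynomial fitting equation and a second part of the mantissa of the first floating point number (W), where the bits corresponding to the second part of the mantissa do not overlap the bits corresponding to the first part of the mantissa.
In the embodiments of the present application, when the processor determines the first bit-width part (fu) according to the coefficients of the first polynomial fitting equation and a partial bit width of the first floating point number (W), a part of the first floating point number (W) may refer to a partial bit width of W, the bits of that partial bit width, or the data of that partial bit width. The target first query parameter (r1) may be the first part of the mantissa of the first floating point number (W). The second part of the mantissa of the first floating point number (W) may be used to calculate the first bit-width part (fu). Optionally, the second part of the mantissa of the first floating point number (W) is the bits or data of W other than the first part. The target second query parameter (r2) is a partial bit width of the exponent of the first floating point number (W) that includes the least significant bit of the exponent, so the target second query parameter (r2) can reflect the parity of the exponent of the first floating point number (W).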
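In some examples, when determining the coefficients of the preset first polynomial fitting equation based on the target first query parameter (r1) and the target second query parameter (r2), the processor may proceed as follows. If the target second query parameter (r2) is odd, the processor looks up, in a first odd lookup sub-table, the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1), where the first odd lookup sub-table includes the correspondence between multiple first query parameters and coefficients of the first polynomial fitting equation for the case where the exponent of the first floating point number (W) is odd. If the target second query parameter (r2) is even, the processor looks up, in a first even lookup sub-table, the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1), where the first even lookup sub-table includes the correspondence between multiple first query parameters and coefficients of the first polynomial fitting equation for the case where the exponent of the first floating point number (W) is even. Optionally, the processor may obtain or be configured with the first odd lookup sub-table and the first even lookup sub-table. This design can reduce the processing overhead of the processor.
In other examples, the processor may obtain or be configured with a first polynomial lookup table, which may include the correspondence between multiple first query parameter combinations and multiple first fitting parameter combinations. One first query parameter combination may serve as one index. One index corresponds to one first fitting parameter combination, and one first fitting parameter combination includes one set of coefficients of the first polynomial fitting equation. The processor may use the target first query parameter (r1) and the target second query parameter (r2) as an index and look up, in the first polynomial coefficient lookup table, the first fitting parameter combination corresponding to that index, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter (r1) and the target second query parameter (r2).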
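A table indexed by the pair (r1, r2) can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the application: base 2, a 6-bit r1 (`R1_BITS`), a single parity bit for r2, a quadratic polynomial, and Taylor coefficients of the square root at the start of each segment standing in for fitted coefficients. The result approximates the square root of X without truncating it to the width of fu.

```python
import math

R1_BITS = 6  # assumed width of the first query parameter r1

def build_table():
    """Map (r1, r2) to assumed quadratic coefficients for sqrt(X) on one mantissa segment."""
    table = {}
    for r2 in (0, 1):                      # parity bit of the exponent
        scale = 2.0 if r2 else 1.0         # odd exponent: X is twice the mantissa
        for r1 in range(1 << R1_BITS):
            x0 = scale * (1.0 + r1 / (1 << R1_BITS))
            root = math.sqrt(x0)
            table[(r1, r2)] = (root, scale / (2.0 * root), -scale * scale / (8.0 * root**3))
    return table

TABLE = build_table()

def high_part(mantissa: float, exponent: int) -> float:
    """Approximate sqrt(X) from a mantissa in [1, 2) and the parity of the exponent."""
    frac = mantissa - 1.0
    r1 = int(frac * (1 << R1_BITS))        # first part of the mantissa: table index
    r2 = exponent & 1                      # lowest exponent bit
    rest = frac - r1 / (1 << R1_BITS)      # second, non-overlapping part of the mantissa
    a1, b1, c1 = TABLE[(r1, r2)]
    return a1 + b1 * rest + c1 * rest * rest

print(high_part(1.5, 0), math.sqrt(1.5))
print(high_part(1.5, 1), math.sqrt(3.0))
```

Keying the table on the parity bit lets the doubling of the mantissa for odd exponents be absorbed into the stored coefficients, so the evaluation path is the same in both cases.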
一种可能的实施方式中,处理器可以利用第一浮点数(W)计算第一位宽部分(fu)的倒数。例如采用牛顿-拉夫森方法、斯维尼-罗伯森-托切尔算法(SRT算法)等计算第一位宽部分(fu)的倒数。为提升计算速度,减少计算开销,本申请还提供几种计算第一位宽部分(fu)的倒数的设计方案。
一种可能的设计中,处理器可以基于目标第三查询参数(h1)、目标第四查询参数(h2),确定预设 的第二多项式拟合方程的系数,其中,所述目标第三查询参数(h1)为所述第一浮点数(W)的尾数的第三部分,所述目标第四查询参数(h2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽。根据所述第二多项式拟合方程的系数和所述第一浮点数(W)的尾数的第四部分,确定所述第一位宽部分(fu)的倒数,所述第一浮点数(W)的尾数的第三部分对应的位宽与所述第一浮点数(W)的尾数的第四部分对应的位宽不重叠。本设计中,处理器可以并行计算所述第一位宽部分(fu)、所述第二位宽部分(fl)。处理器可以利用目标尾数(X)的平方根的倒数逼近所述第一位宽部分(fu)的倒数。
一些示例中,处理器若所述目标第四查询参数(h2)为奇数,从第二奇数查找子表中查询所述目标第三查询参数(h1)对应的第二多项拟合方程的系数,其中所述第二奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系。处理器若所述目标第四查询参数(h2)为偶数,从第二偶数查找子表中查询所述目标第三查询参数(h1)对应的第二多项式拟合方程的系数,其中所述第二偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系。
另一些示例中,处理器可以获取或者配置第二多项式查找表,第二多项式查找表可以包括多个第二查询参数组合和多个第二拟合参数组合的对应关系。其中,一个第二查询参数组合可以作为一个索引。一个索引对应一个第二拟合参数组合,一个第二拟合参数组合包括一组第二多项式拟合方程的系数。处理器可以将目标第三查询参数(h1)和目标第四查询参数(h2)作为一个索引。从第二多项式系数查找表中,查找该索引对应的第二拟合参数组合。从而实现确定目标第三查询参数(h1)和目标第四查询参数(h2)对应的第二多项式拟合方程的系数。
另一种可能的设计中,处理器可以基于目标第五查询参数(g1),确定预设的第三多项式拟合方程的系数,其中,所述目标第五查询参数(g1)为所述第一位宽部分(fu)的第五部分。根据所述第三多项式拟合方程的系数和所述第一位宽部分(fu)的第六部分,确定所述第一位宽部分(fu)的倒数,所述第一位宽部分(fu)的第五部分对应的位宽与所述第一位宽部分(fu)的第六部分对应的位宽不重叠。
示例性的,处理器可以获取或配置第三多项式查找表,第三多项式查找表可以包括多个第五查询参数和多个第三拟合参数组合的对应关系。处理器可以利用目标第五查询参数(g1)作为索引,在第三多项式查找表中,查找目标第五查询参数(g1)对应的第三拟合参数组合。从而实现确定目标第五查询参数(g1)对应的第三多项式拟合方程的系数。
一种可能的实施方式中,处理器基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根时,处理器可以对所述第一位宽部分(fu)和所述第二位宽部分(fl)加和处理,将加和处理后的结果确定为所述目标尾数(X)的平方根。处理器可以将所述目标尾数(X)的平方根的小数部分确定为所述第一浮点数(W)平方根的尾数。
一种可能的实施方式中,处理器基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根时,可以基于配置的舍入方式,确定目标尾数(X)的平方根。
处理器可以根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定两个待选结果;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第一舍入判别参数(ie),其中,所述第一舍入判别参数(ie)表征第一数值与所述目标尾数(X)之间的偏差,所述第一数值为所述目标尾数(X)的平方根的平方;根据所述第一舍入判别参数(ie)与预设数值的比较结果,从所述两个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
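In a possible design, the processor may determine the first rounding discrimination parameter (ie) from the first bit-width part (fu) and the second bit-width part (fl) using the formula $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$, where ie is the first rounding discrimination parameter, fu is the first bit-width part, fl is the second bit-width part, and X is the target mantissa.
In practical scenarios, the first rounding discrimination parameter (ie) may be a very small positive number or a very small negative number. The processor may select among the candidate results using the effective sign bit of ie and all bits after the effective sign bit. Optionally, the processor may compute ie from the low-order part of $(f_u^2 + f_l^2 + 2 \times f_u \times f_l)$ and the low-order part of the target mantissa (X), which reduces circuit overhead and chip area.
In one example, the processor may apply the round toward positive (RP) mode. The processor may determine the first rounding discrimination parameter (ie) based on the first bit-width part (fu) and the second bit-width part (fl). The processor may also determine multiple candidate results based on fu and fl, including a first candidate result f1 and a second candidate result f2, where f1 = fu + fl and f2 = f1 + ulp. Here ulp denotes the smallest significant value that can be expressed in the full bit width of the square root of the target mantissa (X). The processor may select one candidate result according to the comparison between ie and a preset value, and determine the selected candidate as the square root of the target mantissa (X). For example, the preset value may be configured as 0: when ie is greater than or equal to 0, the processor may determine f1 as the square root of X; when ie is less than 0, the processor may determine f2 as the square root of X.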
另一个示例中,处理器可以执行向零舍入(round toward zero,RZ)方式。处理器可以基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定第一舍入判别参数(ie)。处理器可以基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定多个待选结果,多个待选结果可以包括第一待选结果f1和第三待选结果f3。其中,f1=fu+fl,f3=f1-ulp。ulp表征目标尾数(X)的平方根的全位宽中能够表达的最小的有效的数。处理器可以根据第一舍入判别参数(ie)与预设数值的比较结果,从多个待选结果中选择一个待选结果,并将选择的待选结果确定为目标尾数(X)的平方根。示例性的,预设数值可以配置为0。其中,处理器可以根据第一舍入判别参数(ie)小于或等于0,确定第一待选结果f1为目标尾数(X)的平方根。处理器可以根据第一舍入判别参数(ie)大于0,确定第三待选结果f3为目标尾数(X)的平方根。
一种可能的设计中,处理器基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根时,可以根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定多个待选结果,所述多个待选结果包括第一待选结果、第二待选结果以及第三待选结果,其中,所述第二待选结果大于所述第一待选结果,所述第一待选结果大于所述第三待选结果;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第二舍入判别参数(ien),其中,所述第二舍入判别参数(ien)表征第一距离的平方与第二距离的平方之间的偏差,所述第一距离为所述第一待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第二距离表征所述目标尾数(X)的平方根的实数与所述第三待选结果之间的距离;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第三舍入判别参数(iep),其中,所述第三舍入判别参数(iep)表征第三距离的平方与第四距离的平方之间的偏差,所述第三距离为所述第二待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第四距离表征所述目标尾数(X)的平方根的实数与所述第一待选结果之间的距离;根据所述第二舍入判别参数(ien)与预设数值的比较结果,以及所述第三舍入判别参数(iep)与预设数值的比较结果,从所述多个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
可选的,所述第二待选结果与所述第一待选结果的差值小于或等于一个最小精度单位。所述第一待选结果与所述第三待选结果的差值小于或等于一个最小精度单位。
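In the embodiments of the present application, the processor may apply the round to nearest (round half, RH) mode. Optionally, the relation between the second rounding discrimination parameter (ien) and the first rounding discrimination parameter (ie) may be expressed as $ien = ie - ulp \times f_1$, and the relation between the third rounding discrimination parameter (iep) and ie may be expressed as $iep = ie + ulp \times f_1$, where $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$. In RH mode, the processor may determine the square root f of the target mantissa (X) based on the formula $f = \begin{cases} f_2, & iep < 0 \\ f_3, & ien \ge 0 \\ f_1, & \text{else} \end{cases}$, where else refers to the case iep ≥ 0 or the case ien < 0.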
可选的,处理器可以计算前述第一舍入判别参数(ie),并利用第一舍入判别参数(ie)计算第二舍入判别参数(ien)和第三舍入判别参数(iep),可以减少电路开销,优化电路占片面积。
第二方面,本申请实施例还提供一种浮点数计算模块,可以用于浮点数计算场景中,例如计算浮点数平方根。本申请实施例提供的浮点数计算模块可以应用在处理器或者计算器中,实现处理器或者计算器执行浮点数计算的功能。所述浮点数计算模块用于接收浮点数计算指令,所述指令携带待计算浮点数(Z);获取目标尾数(X),所述目标尾数(X)包括第一浮点数(W)的尾数,所述第一浮点数(W)为正常值化的浮点数,所述第一浮点数(W)的数值与所述待计算浮点数(Z)的数值相同;所述浮点数计算模块包括:高位计算单元,用于根据所述目标尾数(X)的全部或部分位宽,确定所述目标尾数(X)的平方根的第一位宽部分(fu),所述第一位宽部分(fu)包含所述目标尾数(X)的平方根的最 高位;低位计算单元,用于基于第一关系,所述第一位宽部分(fu)和所述目标尾数(X)的全部或部分位宽,计算所述目标尾数(X)的平方根的第二位宽部分(fl),其中,所述第一关系表征所述目标尾数(X)的平方根的第一位宽部分(fu)、所述目标尾数(X)以及所述目标尾数(X)的平方根的第二位宽部分(fl)之间的关系;精确舍入单元,用于基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根,并将所述目标尾数(X)的平方根的小数部分确定为所述待计算浮点数(Z)的平方根的尾数。
本申请实施例中,浮点数计算模块计算第一浮点数(W)的平方根的尾数,可以通过确定目标尾数(X)的平方根实现。浮点数计算模块可以分别确定目标尾数(X)的平方根的第一位宽部分(fu)和第二位宽部分(fl),利用确定出的第一位宽部分(fu)和第二位宽部分(fl),确定目标尾数(X)的平方根。可见浮点数计算模块确定目标尾数(X)的平方根过程中不需要迭代,从而计算延迟较短,具有高吞吐率。可选的,高位计算单元和低位计算单元可以并行工作,或者高位计算单元和低位计算单元之间串行工作。
一种可能的实施方式中,若所述第一浮点数(W)的阶码为偶数,所述目标尾数(X)与所述第一浮点数(W)的尾数相同;若所述第一浮点数(W)的阶码为奇数,所述目标尾数(X)为所述第一浮点数(W)的尾数的Q倍,其中Q为浮点数的基数,Q为正数,且Q为偶数。
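In a possible implementation, the first relation satisfies $f_l = \frac{1}{2}\left(\frac{X}{f_u} - f_u\right)$, where X is the target mantissa, fu is the first bit-width part, and fl is the second bit-width part.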
一种可能的实施方式中,所述第二位宽部分(fl)包含所述目标尾数(X)的平方根的部分位宽,且包含所述目标尾数(X)的平方根的最低位,其中,所述第一位宽部分(fu)的位宽长度与所述第二位宽部分(fl)的位宽长度的总和大于或等于所述目标尾数(X)的平方根的全位宽长度。
一种可能的实施方式中,所述高位计算单元具体用于:基于目标第一查询参数(r1)、目标第二查询参数(r2),确定预设的第一多项式拟合方程的系数,其中,所述目标第一查询参数(r1)为所述第一浮点数(W)的尾数的第一部分,所述目标第二查询参数(r2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽;根据所述第一多项式拟合方程的系数和所述第一浮点数(W)的尾数的第二部分,计算所述第一位宽部分(fu),所述第一浮点数(W)的尾数的第二部分对应的位宽与所述第一浮点数(W)的尾数的第一部分对应的位宽不重叠。
示例性的,所述高位计算单元可以包括第一查表电路、第一平方运算电路、第一多项式求和电路。第一查表电路可以与存储模块耦合。存储模块或者存储电路,用于存储第一奇数查找子表和第一偶数查找子表,其中所述第一奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系,所述第一偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系。
第一查表电路可以在所述目标第二查询参数(r2)为奇数的情形下,从第一奇数查找子表中查询所述目标第一查询参数(r1)对应的第一多项拟合方程的系数。或者第一查表电路可以在所述目标第二查询参数(r2)为偶数的情形下,从第一偶数查找子表中查询所述目标第一查询参数(r1)对应的第一多项式拟合方程的系数。
第一平方运算电路可以基于第一浮点数(W)的尾数的第二部分,确定第一浮点数(W)的尾数的第二部分的平方。第一多项式求和电路可以利用第一查表电路查询到的第一多项式拟合方程的系数、第一浮点数(W)的尾数的第二部分、第一浮点数(W)的尾数的第二部分的平方,计算目标尾数(X)的第一位宽部分(fu)。
一种可能的实施方式中,低位计算单元可以利用第一浮点数(W)计算第一位宽部分(fu)的倒数。例如采用牛顿-拉夫森方法、斯维尼-罗伯森-托切尔算法(SRT算法)等计算第一位宽部分(fu)的倒数。为提升计算速度,减少计算开销,本申请还提供几种计算第一位宽部分(fu)的倒数的设计方案。
一种可能的设计中,低位计算单元可以包括第一高位倒数计算电路和低位运算电路。第一高位倒数计算电路可以基于目标第三查询参数(h1)、目标第四查询参数(h2),确定预设的第二多项式拟合方程的系数,其中,所述目标第三查询参数(h1)为所述第一浮点数(W)的尾数的第三部分,所述目标第四查询参数(h2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽。根据所述第二多项式拟合方程的系数和所述第一浮点数(W)的尾数的第四部分,确定所述第一位宽部分(fu)的倒数,所述第一浮点数(W)的尾数的第三部分对应的位宽与所述第一浮点数(W) 的尾数的第四部分对应的位宽不重叠。本设计中,第一高位倒数计算电路可以并行计算所述第一位宽部分(fu)、所述第二位宽部分(fl)。第一高位倒数计算电路可以利用目标尾数(X)的平方根的倒数逼近所述第一位宽部分(fu)的倒数。低位运算电路用于利用所述第一位宽部分(fu)、所述第一位宽部分(fu)的倒数、所述目标尾数(X)之间的关系,确定所述第二位宽部分(fl)。
本申请实施例中,低位计算单元计算第一位宽部分(fu)的倒数的过程可以与高位计算单元计算第一位宽部分(fu)的过程是并行的。实现高位计算单元和低位计算单元并行工作。
一些示例中,第一高位倒数计算电路在所述目标第四查询参数(h2)为奇数的情形中,从第二奇数查找子表中查询所述目标第三查询参数(h1)对应的第二多项拟合方程的系数,其中所述第二奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系。第一高位倒数计算电路在所述目标第四查询参数(h2)为偶数的情形中,从第二偶数查找子表中查询所述目标第三查询参数(h1)对应的第二多项式拟合方程的系数,其中所述第二偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系。
另一些示例中,第一高位倒数计算电路可以获取或者配置第二多项式查找表,第二多项式查找表可以包括多个第二查询参数组合和多个第二拟合参数组合的对应关系。其中,一个第二查询参数组合可以作为一个索引。一个索引对应一个第二拟合参数组合,一个第二拟合参数组合包括一组第二多项式拟合方程的系数。第一高位倒数计算电路可以将目标第三查询参数(h1)和目标第四查询参数(h2)作为一个索引。从第二多项式系数查找表中,查找该索引对应的第二拟合参数组合。从而实现确定目标第三查询参数(h1)和目标第四查询参数(h2)对应的第二多项式拟合方程的系数。
可见,第一高位倒数计算电路的执行过程可以与高位计算单元并行的,从而低位计算单元可以与高位计算单元并行。
一种可能的设计中,低位计算单元可以包括第二高位倒数计算电路和低位运算电路。第二高位倒数计算电路可以基于目标第五查询参数(g1),确定预设的第三多项式拟合方程的系数,其中,所述目标第五查询参数(g1)为所述第一位宽部分(fu)的第五部分。根据所述第三多项式拟合方程的系数和所述第一位宽部分(fu)的第六部分,确定所述第一位宽部分(fu)的倒数,所述第一位宽部分(fu)的第五部分对应的位宽与所述第一位宽部分(fu)的第六部分对应的位宽不重叠。低位运算电路用于利用所述第一位宽部分(fu)、所述第一位宽部分(fu)的倒数、所述目标尾数(X)之间的关系,确定所述第二位宽部分(fl)。
示例性的,第二高位倒数计算电路可以获取或配置第三多项式查找表,第三多项式查找表可以包括多个第五查询参数和多个第三拟合参数组合的对应关系。处理器可以利用目标第五查询参数(g1)作为索引,在第三多项式查找表中,查找目标第五查询参数(g1)对应的第三拟合参数组合。从而实现第二高位倒数计算电路确定目标第五查询参数(g1)对应的第三多项式拟合方程的系数。
可见,第二高位倒数计算电路与高位计算单元为串行关系,从而低位计算单元可以与高位计算单元串行。
一种可能的实施方式中,精确舍入单元可以对所述第一位宽部分和所述第二位宽部分加和处理,所述加和处理的结果为所述目标尾数(X)的平方根。
一种可能的实施方式中,精确舍入单元可以根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定两个待选结果;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第一舍入判别参数(ie),其中,所述第一舍入判别参数(ie)表征第一数值与所述目标尾数(X)之间的偏差,所述第一数值为所述目标尾数(X)的平方根的平方;根据所述第一舍入判别参数(ie)与预设数值的比较结果,从所述两个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
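In a possible design, the precise rounding unit may determine the first rounding discrimination parameter (ie) from the first bit-width part (fu) and the second bit-width part (fl) using the formula $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$, where ie is the first rounding discrimination parameter, fu is the first bit-width part, fl is the second bit-width part, and X is the target mantissa.
In practical scenarios, the first rounding discrimination parameter (ie) may be a very small positive number or a very small negative number. The precise rounding unit may select among the candidate results using the effective sign bit of ie and all bits after the effective sign bit. Optionally, the precise rounding unit may compute ie from the low-order part of $(f_u^2 + f_l^2 + 2 \times f_u \times f_l)$ and the low-order part of the target mantissa (X), which reduces circuit overhead and chip area.
In one example, the precise rounding unit may apply the round toward positive (RP) mode. The precise rounding unit may determine the first rounding discrimination parameter (ie) based on the first bit-width part (fu) and the second bit-width part (fl). It may also determine multiple candidate results based on fu and fl, including a first candidate result f1 and a second candidate result f2, where f1 = fu + fl and f2 = f1 + ulp. Here ulp denotes the smallest significant value that can be expressed in the full bit width of the square root of the target mantissa (X). The precise rounding unit may select one candidate result according to the comparison between ie and a preset value, and determine the selected candidate as the square root of the target mantissa (X). For example, the preset value may be configured as 0: when ie is greater than or equal to 0, the precise rounding unit may determine f1 as the square root of X; when ie is less than 0, it may determine f2 as the square root of X.
In another example, the precise rounding unit may apply the round toward zero (RZ) mode. The precise rounding unit may determine the first rounding discrimination parameter (ie) based on the first bit-width part (fu) and the second bit-width part (fl). It may also determine multiple candidate results based on fu and fl, including a first candidate result f1 and a third candidate result f3, where f1 = fu + fl and f3 = f1 - ulp. Here ulp denotes the smallest significant value that can be expressed in the full bit width of the square root of the target mantissa (X). The precise rounding unit may select one candidate result according to the comparison between ie and a preset value, and determine the selected candidate as the square root of the target mantissa (X). For example, the preset value may be configured as 0: when ie is less than or equal to 0, the precise rounding unit may determine f1 as the square root of X; when ie is greater than 0, it may determine f3 as the square root of X.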
一种可能的设计中,精确舍入单元基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根时,可以根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定多个待选结果,所述多个待选结果包括第一待选结果、第二待选结果以及第三待选结果,其中,所述第二待选结果大于所述第一待选结果,所述第一待选结果大于所述第三待选结果;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第二舍入判别参数(ien),其中,所述第二舍入判别参数(ien)表征第一距离的平方与第二距离的平方之间的偏差,所述第一距离为所述第一待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第二距离表征所述目标尾数(X)的平方根的实数与所述第三待选结果之间的距离;基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第三舍入判别参数(iep),其中,所述第三舍入判别参数(iep)表征第三距离的平方与第四距离的平方之间的偏差,所述第三距离为所述第二待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第四距离表征所述目标尾数(X)的平方根的实数与所述第一待选结果之间的距离;根据所述第二舍入判别参数(ien)与预设数值的比较结果,以及所述第三舍入判别参数(iep)与预设数值的比较结果,从所述多个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
可选的,所述第二待选结果与所述第一待选结果的差值小于或等于一个最小精度单位。所述第一待选结果与所述第三待选结果的差值小于或等于一个最小精度单位。
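In the embodiments of the present application, the precise rounding unit may apply the round to nearest (RH) mode. Optionally, the relation between the second rounding discrimination parameter (ien) and the first rounding discrimination parameter (ie) may be expressed as $ien = ie - ulp \times f_1$, and the relation between the third rounding discrimination parameter (iep) and ie may be expressed as $iep = ie + ulp \times f_1$, where $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$. In RH mode, the precise rounding unit may determine the square root f of the target mantissa (X) based on the formula $f = \begin{cases} f_2, & iep < 0 \\ f_3, & ien \ge 0 \\ f_1, & \text{else} \end{cases}$, where else refers to the case iep ≥ 0 or the case ien < 0.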
可选的,精确舍入单元可以计算前述第一舍入判别参数(ie),并利用第一舍入判别参数(ie)计算第二舍入判别参数(ien)和第三舍入判别参数(iep),可以减少电路开销,优化电路占片面积。
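In some application scenarios, the precise rounding unit is preconfigured with multiple rounding modes. The precise rounding unit may obtain a rounding-mode configuration parameter; determine multiple candidate results from the first bit-width part (fu) and the second bit-width part (fl) according to the rounding mode that corresponds to the configuration parameter; compute a rounding discrimination parameter from fu, fl and the target mantissa (X) according to the same rounding mode; and, based on the comparison between the rounding discrimination parameter and a preset value, select one of the candidate results as the square root of the target mantissa (X). In this example the precise rounding unit may apply any of the rounding modes provided in the foregoing embodiments; details are not repeated here.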
第三方面,本申请实施例还提供一种处理装置,可以包括第一寄存器、第二寄存器以及如第二方面及其任一设计中的浮点数计算模块。所述第一寄存器存储有待计算浮点数。所述浮点数计算模块用于从所述第一寄存器中获取所述待计算浮点数;以及计算所述待计算浮点数的平方根的尾数。所述第二寄存器用于存储所述待计算浮点数的平方根的尾数。
一种可能的设计中,所述处理装置还包括第三寄存器;所述第三寄存器存储有舍入方式配置参数;所述浮点数计算模块,还用于获取所述舍入方式配置参数,执行所述舍入方式配置参数对应的舍入方式。
可以理解的是,为了实现上述方法实施例中功能,处理器或者计算器包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的模块及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
附图说明
图1为一种浮点数的格式示意图;
图2A为一种五级流水线的示意图;
图2B为本申请实施例提供的一种处理装置的结构示意图;
图3为本申请实施例提供的浮点数平方根计算方法;
图4为一种对浮点数处理的示意图;
图5为一种浮点数计算模块的结构示意图;
图6为一种尾数的高位部分和低位部分的示意图;
图7为一种存储模块与拟合参数的关系示意图;
图8为加和处理的示意图;
图9为多个待选结果间关系的示意图;
图10为一种浮点数计算模块的具体结构示意图;
图11为一种精确舍入单元的具体结构示意图;
图12为另一种精确舍入单元的具体结构示意图;
图13为另一种浮点数计算模块的结构示意图;
图14为另一种浮点数计算模块的具体结构示意图;
图15为又一种浮点数计算模块的结构示意图;
图16为又一种浮点数计算模块的结构示意图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请作进一步地详细描述。方法实施例中的具体操作方法也可以应用于装置实施例或系统实施例中。
在计算机中,以近似表示任意某个实数称之为浮点数。浮点数通常由尾数(mantissa)和指数偏移值(也称阶码,exponent)作为组合来表示。例如,浮点数可以通过尾数与某个基数的整数次指数的乘积进行标识。
电子与电气工程师协会(institute of electrical and electronics engineers,IEEE)754标准定义浮点运算标准和表达形式,也是目前最为广泛支持和使用的二进制浮点数算数标准。其中,标准IEEE 754中规定了浮点数由符号位(sign bit),指数偏移值(exponent bits)和尾数值(mantissa bits)组合而成。标准IEEE 754中,浮点数具有多种类型,如单精确度(single precision,SP)浮点数、双精确度(double precision,DP)浮点数、延伸单精确度浮点数、延伸双精确度浮点数等。不同类型的浮点数具有不同的位宽,浮点数的全位宽可包括符号位、指数偏移值的全部位宽、尾数值的全部位宽。示例性的,单精确度浮点数的位宽为32位(bit),双精确度浮点数的位宽为64位,延伸单精确度浮点数的位宽为43位,延伸双精确度浮点数的位宽为79位。作为示例,图1中示出单精确度浮点数,单精确度浮点数的全位宽为32位。其中,第0位至第22位表征尾数数值,第23位至第30位表征指数偏移值,第31位表征符号位。
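The single-precision layout described for Fig. 1 (bits 0 to 22 for the mantissa, bits 23 to 30 for the biased exponent, bit 31 for the sign) can be illustrated with a short Python sketch. The function name `decode_single` is an illustrative assumption and is not part of the application.

```python
import struct

def decode_single(value):
    """Split an IEEE 754 single-precision value into (sign, biased_exponent, fraction)."""
    # Reinterpret the 32-bit float encoding as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # bit 31
    biased_exponent = (bits >> 23) & 0xFF  # bits 23..30
    fraction = bits & 0x7FFFFF             # bits 0..22
    return sign, biased_exponent, fraction
```

For a normalized value the true exponent is `biased_exponent - 127`, and the mantissa is `1 + fraction / 2**23`.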
目前计算浮点数的平方根的方法可以包括两类方法,一类方法为按方程迭代方法求解浮点数的平方根,如巴比伦方法或者牛顿-拉夫森方法。利用平方根求解方程,以初始近似值作为输入,每次迭代计算后可以得到非精确的全位宽平方根值。通过多轮迭代获得满足高精度要求的全位宽的计算结果。但是有着迭代次数较多,收敛速度较慢的缺陷。
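As a point of comparison, the equation-iteration class of methods can be sketched as follows. This is a minimal illustration of the Babylonian (Newton-Raphson) update in floating point, assuming a caller-supplied initial estimate and round count; it is not the method proposed by the application.

```python
def babylonian_sqrt(x, initial, rounds):
    """Refine an estimate of sqrt(x) with the Babylonian update, `rounds` times."""
    estimate = initial
    for _ in range(rounds):
        # Each round produces a new full-width but inexact estimate.
        estimate = 0.5 * (estimate + x / estimate)
    return estimate
```

Every round depends on the result of the previous one, which is why such schemes are serial and their latency grows with the number of rounds.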
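The other class of methods computes the square root of a floating-point number by bitwise (digit-recurrence) iteration. In this class, each iteration yields a non-full-width result that is exact in a fixed number of bits. That fixed number of bits depends on the radix selected for the basic iteration unit of the Sweeney-Robertson-Tocher (SRT) algorithm: if the radix is 4 (= 2^2), each iteration of the unit yields 2 exact bits. If the final result must meet double-precision requirements, at least 26 iterations are needed. Bitwise iteration therefore has low throughput and is difficult to pipeline.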
有鉴于此,本申请实施例提供一种浮点数平方根计算方法及浮点数计算模块,计算延迟较短,具有高吞吐率。
本申请实施例提供一种浮点数平方根计算方法可以由浮点数计算模块实施。浮点数计算模块可以应用于处理器(或者计算器)中。例如,应用于CPU、GPU或者数字信号处理器(digital signal processor,DSP)中。
示例性的,通常CPU采用五级流水线100执行计算任务。如图2A所示,五级流水线可以包括取指101、译码102、执行103、访存104和写回105,共五个阶段。取指101阶段可以提取指令。译码102阶段可以将提取的指令转译为可识别运算的指令和参数。执行103阶段可指逻辑运算和数学运算阶段。一些场景中,由CPU实施本申请实施例提供的浮点数平方根计算方法的情形下,CPU可以在执行103阶段实施本申请实施例提供的浮点数平方根计算方法。访存104阶段中CPU可以与存储模块之间交互指令,例如从存储模块中读取或者由存储模块存储数据。写回105阶段可以将最终的输出结果更新到寄存器中。图2B示出一种本申请实施例处理装置的结构示意图,所述处理装置可以实施为CPU、GPU、AI处理器等。处理装置200可以包括寄存器组201和本申请提供的浮点数计算模块202。寄存器组201可以包括多个寄存器。多个寄存器中第一寄存器可以存储有待计算浮点数。浮点数计算模块202可以从第一寄存器中获取待计算浮点数,以及计算待计算浮点数的平方根。多个寄存器中第二寄存器可以存储待计算浮点数的平方根。
一种可能的设计中,多个寄存器中可以包括第三寄存器。第三寄存器可以存储舍入配置方式。浮点数计算模块202可以从第三寄存器获取舍入方式配置参数,执行舍入方式配置参数对应的舍入方式,得到待计算浮点数的平方根。
可选的,处理装置200可以包括控制模块203。控制模块203可以执行前述取指101、译码102等过程。可选的,处理装置200可以包括存储模块204,存储模块204可以包括缓存,用于存储数据。可选的,处理装置200可以包括整数计算模块205,用于处理整数运算。可选的,处理装置200还可以包括其它运算模块206,可以执行逻辑运算,例如逻辑移位运算。一些场景中,其它运算模块206可以实施为图像专用计算模块。或者其它运算模块206可以执行大型阵列的乘加法运算。本申请对此不作具体限定。可选的,处理装置200还可以包括I/O接口207。
图3示出本申请实施例提供的浮点数平方根计算方法,可以由处理器(或者计算器)执行。本申请实施例提供的浮点数平方根计算方法,可以包括如下步骤:
步骤S100,接收浮点数计算指令,所述指令携带待计算浮点数。
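Step S101: obtain a target mantissa, where the target mantissa includes the mantissa of a first floating-point number, the first floating-point number is a normalized floating-point number, and the value of the first floating-point number is the same as the value of the floating-point number to be calculated.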
待计算浮点数与第一浮点数仅表达形式不同,也即浮点数的尾数和阶码不同。由此可见,计算浮点数Z的平方根,也是计算浮点数W的平方根。处理器可以获取或者接收待计算平方根结果的浮点数Z(也即前述待计算浮点数),浮点数Z可以为正常值化的浮点数,也可以为非正常值化的浮点数。便于介绍,将目标尾数记为目标尾数X,第一浮点数记为浮点数W。本申请实施例中,一个“尾数”的全位宽可以包括整数部分和小数部分。便于介绍,尾数的整数部分和小数部分依次排列。其中,在整数部分中,由最高位至最低位排列;在小数部分中,由最高位至最低位排列。
处理器可以对浮点数Z进行正常值化(或者称规格化),得到第一浮点数,也即浮点数W。浮点数W与浮点数Z是相同数值,仅表达形式不同。通常对一个浮点数正常值化处理,可使该浮点数的尾数的整数部分不为0。其中,第一浮点数W的尾数记为尾数M1,阶码记为E0。
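Generally, the sign of the square root of floating-point number W is the same as the sign of W. Computing the square root of W consists of two parts: computing the mantissa of the square root and computing the exponent of the square root. The exponent of the square root of W is related to the exponent EW of W as follows: it is $E_W/2$ when EW is even and $(E_W-1)/2$ when EW is odd. In computers, the exponent of a floating-point number is usually indicated by its biased exponent. The biased exponent of $\sqrt{W}$ is the exponent of $\sqrt{W}$ plus the exponent bias, where the exponent bias depends on the precision type of W. For example, if W is a single-precision floating-point number the exponent bias is 127, and if W is a double-precision floating-point number the exponent bias is 1023. It follows that the critical path in computing the square root of W is computing the square root of the mantissa of W.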
本申请实施例中,处理器获取的目标尾数X与浮点数W的关系为目标尾数X包括浮点数W的尾数,目标尾数X可以包括浮点数W的尾数的全部数据。若所述第一浮点数W的阶码为偶数,所述目标尾数X与所述第一浮点数W尾数相同;若所述第一浮点数W的阶码为奇数,所述目标尾数X为所述第一浮点数W的尾数的Q倍,其中Q为浮点数的基数,Q为正数,且Q为偶数。例如,通过对第一浮点数W的尾数进行左移一位可以得到目标尾数X。
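The following takes a radix of 2 as an example. If the exponent EW of floating-point number W is even and positive, the exponent of the square root of W is $E_W/2$. In this case the target mantissa X is the same as the mantissa of W: the integer part of X is the same as the integer part of the mantissa of W, and the fractional part of X is the same as the fractional part of the mantissa of W. The fractional part of the square root of X is the mantissa of the square root of W.
If the exponent EW of floating-point number W is odd and positive, the exponent of the square root of W is $(E_W-1)/2$. In this case the target mantissa X is 2 times the mantissa of W, where 2 is the radix. X can be obtained by shifting the mantissa of W left by one bit. The fractional part of the square root of X is the mantissa of the square root of W.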
处理器可以对浮点数W进行阶码奇偶判定处理、阶码变换处理、尾数变换处理中的一种或多种,得到目标尾数X。下面举例介绍,处理器可以对接收的待计算浮点数也即浮点数Z进行正常值化处理,转化为浮点数W。浮点数W的阶码记为阶码EW和尾数M1。便于介绍,下面以处理器执行浮点数计算处理时,浮点数的基数为2的场景作为举例进行说明。
如图4中的(a)所示,处理器可以对浮点数W的阶码EW进行阶码奇偶判定处理。一种可能的情形中,若阶码EW为奇数,处理器可以对尾数M1进行第一尾数变换处理,如对尾数M1进行乘Q操作,得到尾数M2。示例性的,Q为2。如图4中的(b)所示,尾数M1包括整数部分和小数部分。黑色方框示出整数部分的比特,白色方框示出小数部分的比特。浮点数W的尾数M1的整数部分为1个比特,尾数M1的整数部分为第s1位示出的比特,尾数M1的小数部分为第0位至第v位示出的比特。此时尾数M1的数值范围为[1,2)。
处理器对尾数M1进行乘2操作,也即将尾数M1的各比特的向左移一位,得到尾数M2。此时尾数M2的整数部分为2个比特,尾数M2的整数部分为第s1位和第s2位示出的比特,尾数M2的小数部分为第0位至第v位示出的比特。此时尾数M2的数值范围为[2,4)。可以理解的是,尾数M2的整数部分相比于尾数M1的整数部分额外增添一个比特,补齐IEEE754格式缺省的整数位。
在此情形中,也即浮点数W的阶码EW为奇数的情形中,目标尾数X为浮点数W的尾数M1的2倍,也即目标尾数X与尾数M2相同。目标尾数X的整数部分可以包括2个比特。此时目标尾数X的数值范围为[2,4)。
另一种可能的情形中,浮点数W的阶码EW为偶数的情形中,目标尾数X与浮点数W的尾数M1相同。如图4中的(c),浮点数W的尾数M1的整数部分为1个比特,尾数M1的整数部分为第s1位示出的比特,尾数M1的小数部分为第0位至第v位示出的比特。此时尾数M1的数值范围为[1,2)。浮点数W的尾数与浮点数Z正常值化后的尾数M1相同,则浮点数W的尾数的数值范围为[1,2)。从而目标尾数X的数值范围为[1,2)。可选的,目标尾数X的整数部分相比于尾数M1的整数部分可以额外增添一个比特,配置为0,补齐IEEE754格式缺省的整数位。如增加第s2位比特,且数值配置为0。这样的操作不改变目标尾数X的数值范围。
可见,在浮点数W的阶码EW为奇数时,或者浮点数W的阶码EW为偶数时,目标尾数X均包括浮点数W的尾数M1。
通过上述介绍可以明晰处理器执行浮点数计算处理时,浮点数的基数为2的场景中,目标尾数X可以为预设集合内的任意一个数值。示例性的,所述预设集合可以为[1,4)。所述预设集合的最小数值可以为1,预设集合的最大数值可以接近4,但预设集合不包括4。
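The parity-dependent construction of the target mantissa X can be sketched in Python for radix 2. This is an illustration under stated assumptions: the function name, the fixed-point scaling by `2**frac_width`, and the handling of negative exponents (which the text above does not walk through) are all hypothetical choices.

```python
def target_mantissa(m1_fraction, frac_width, exponent):
    """Return (X scaled by 2**frac_width, exponent of the square root).

    m1_fraction holds the fractional bits of the normalized mantissa M1,
    so M1 = 1 + m1_fraction / 2**frac_width lies in [1, 2).
    """
    m1 = (1 << frac_width) | m1_fraction  # restore the implicit integer bit
    if exponent & 1:
        # Odd exponent: X = 2 * M1 in [2, 4); the remaining exponent is even.
        return m1 << 1, (exponent - 1) // 2
    # Even exponent: X = M1 in [1, 2).
    return m1, exponent // 2
```

In both branches the scaled X stays within [1, 4) times `2**frac_width`, matching the preset set described above.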
本申请实施例提供的浮点数平方根计算方法中,目标尾数X的平方根的小数部分为浮点数W的平方根中的尾数。通过确定目标尾数X的平方根,实现确定浮点数W的平方根的尾数,从而得到浮点数W的平方根的计算结果。便于介绍,将目标尾数X的平方根也即记为f。f可以为定点数,包括整数部分和小数部分。处理器可以通过确定出的目标尾数X的平方根的小数部分,即f的小数部分,实现确定浮点数W的平方根的尾数部分。
虽然图3中示出步骤S102和步骤S103为并行关系,但并不意味着处理器只能并行执行步骤S102和步骤S103中的操作。在一些应用场景中,处理器可以串行地执行步骤S102和步骤S103中的操作。在另一些应用场景中,处理器可以并行执行步骤S102和步骤S103中的操作。可以理解的是并行执行可以包括但不限于同时开始执行、同步开始执行。在预设时长内,同步或者异步地开始执行步骤S102中的操作和步骤S103中的操作,也可以视为并行执行步骤S102和步骤S103中的操作。
步骤S102,根据所述目标尾数的全部或部分位宽,确定所述目标尾数的平方根的第一位宽部分,所述第一位宽部分包含所述目标尾数的平方根的最高位。
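To distinguish the real-valued $\sqrt{X}$ from the square root of the target mantissa X as determined by the processor, this application denotes the latter by f. The first bit-width part of f includes the most significant bit of f. Because f is a fixed-point number with an integer part and a fractional part arranged in order, the most significant bit of f is also the most significant bit of its integer part.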
示例性的,目标尾数X的数值范围为[1,4),目标尾数X的平方根f的数值范围为[1,2),目标尾数X的平方根f的整数部分小于2,则目标尾数X的平方根f的整数部分的全位宽可以为1。本申请中,目标尾数X的平方根f的全位宽可以理解为平方根f的有效数据部分的全位宽。目标尾数X的平方根f的整数部分的最高位也可以理解为目标尾数X的平方根f的整数部分有效部分的最高位。所述第一位宽部分可以称为目标尾数X的平方根f的高位部分,也可以简称为f的高位部分fu。f的高位部分fu可以指f的高m位,高m位可指f的全位宽中由最高位向最低位方向上的前m位,也可以理解为f的全位宽中最高m位。可见,目标尾数X的平方根的第一位宽部分的位宽为m。f的高位部分fu可以理解为目标尾数X的平方根的近似计算结果。可选的,m为正整数,且m为小于或等于f的全位宽的数值。
一种可能的实现方式中,处理器可以配置有预设的尾数全部或部分位宽与尾数的平方根的第一位宽部分的对应关系。处理器可以基于目标尾数X的全部或部分位宽,预设的尾数全部或部分位宽与第一位宽部分的对应关系,将所述目标尾数X的全部或部分位宽所对应的第一位宽部分,作为所述目标尾数X的平方根f的第一位宽部分,也即f的高位部分fu。但预设尾数全部或部分位宽与尾数的平方根的第一位宽部分的对应关系通常需要占用较大存储资源,处理器查询速度较慢。
另一种可能的实现方式中,处理器可以基于多项式近似的方式,根据目标尾数X的全部或部分位宽,确定目标尾数X的平方根f的高位部分fu。由于目标尾数X包括浮点数W的尾数M1的全部位宽,处理器可以根据浮点数W的尾数M1的全部位宽或部分位宽,确定目标尾数X的平方根f的高位部分fu
一种可能的设计中,处理器可以基于浮点数W的尾数M1确定目标第一查询参数r1,处理器可以基于浮点数W的阶码EW确定目标第二查询参数r2。本申请实施例中,目标第一查询参数可以为浮点数W的尾数M1小数部分中的第一部分(第一部分位宽)。示例性的,目标第一查询参数可以为浮点数W的尾数M1小数部分的高nr1位(或者低nr1位),nr1为正整数,且nr1小于或等于浮点数W的尾数M1小数部分的全位宽。所述目标第二查询参数为浮点数W阶码EW的部分位宽,且包括浮点数W的阶码EW的最低位宽。目标第二查询参数r2可以为浮点数W的阶码EW的低nr2位,nr2为正整数,且nr2为小于或等于阶码EW的全部位宽,可见阶码EW的低nr2位包括阶码EW的最低位。可选的,目标第二查询参数r2可以为浮点数W的阶码EW的低1位比特的数据,该数据可以反映出阶码EW为奇数或者偶数。
处理器可以基于目标第一查询参数r1和目标第二查询参数r2,确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数,第一多项式拟合方程的系数可以包括第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。
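The processor may compute the high part fu of f from the coefficients of the first polynomial fitting equation and all or part of the bit width of the fractional part of the mantissa M1 of floating-point number W. For example, the processor may compute fu from the coefficients and the second part (second partial bit width) of the fractional part of M1, where $f_u = a_1 \times (X_1)^2 + b_1 \times X_1 + c_1$. X1 is the second part of the fractional part of M1, and the bit positions of the second part do not overlap the bit positions of the first part. Optionally, X1 is the high t1 bits of the bit positions of the fractional part of M1 other than the first part, where t1 is a positive integer. In the embodiments of the present application, the processor uses the first part of the fractional part of M1 to determine the coefficients of the polynomial fitting equation and uses the second part to evaluate the equation, which yields an approximate solution of the square root of the target mantissa X, namely the high part fu of f.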
处理器可以获取或者配置有第一多项式系数查找表。第一多项式系数查找表可以包括多个第一拟合参数组合与多个第一查询参数组合的对应关系。每个第一拟合参数组合可以包括第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。处理器可以采用包括但不限于如下示例A1、A2中任一示例中方式,确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数。
示例A1、
一种可能的实施方式中,第一多项式系数查找表可以包括第一奇数查找子表和第一偶数查找子表。第一奇数查找子表中表征第二查询参数为奇数时第一查询参数对应的第一拟合参数组合。第一偶数查找子表中包括第二查询参数为偶数时第一查询参数对应的第一拟合参数组合。
处理器可以根据目标第二查询参数r2为偶数,从第一偶数查找子表中,查找目标第一查询参数r1对应的第一拟合参数组合。或者,处理器可以根据目标第二查询参数r2为奇数,从第一奇数查找子表中,查找目标第一查询参数r1对应的第一拟合参数组合。实现处理器从第一多项式系数查找表中查找出目标第一查询参数组合对应的第一拟合参数组合,从而实现确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数。
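It can be seen that, in this implementation, the processor may use the target second query parameter r2 as a first index to select the first even lookup sub-table or the first odd lookup sub-table, and may use the target first query parameter r1 as a second index to look up the corresponding first fitting parameter combination in the selected sub-table.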
示例A2、
另一种可能的实施方式中,第一多项式系数查找表可以包括多个第一拟合参数组合与多个第一查询参数组合的对应关系,其中,一个第一查询参数组合可以作为一个索引。一个索引对应一个第一拟合参数组合。处理器可以将目标第一查询参数r1和目标第二查询参数r2作为一个索引,从第一多项式系数查找表中,查找该索引对应的第一拟合参数组合。从而实现确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数。
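The table-driven evaluation of fu can be sketched as below. This is only an illustration: the application does not state how the fitting coefficients are produced, so the sketch assumes second-order Taylor coefficients of the square root at the start of each segment, a hypothetical index width `NR1 = 6`, and floating-point arithmetic instead of fixed-point hardware. The table is keyed by the combined index (r1, r2), as in Example A2.

```python
import math

NR1 = 6  # hypothetical number of mantissa bits used as the index r1

def build_table():
    """Map (r1, r2) to coefficients (a1, b1, c1) of fu = a1*x1**2 + b1*x1 + c1."""
    table = {}
    for r2 in (0, 1):                  # r2: lowest exponent bit (parity)
        scale = 2.0 if r2 else 1.0     # X = 2*M1 for odd exponents, else X = M1
        for r1 in range(1 << NR1):
            x0 = scale * (1.0 + r1 / (1 << NR1))  # X at the segment start
            c = math.sqrt(x0)
            b = scale / (2.0 * c)                 # d sqrt(X) / d x1
            a = -scale * scale / (8.0 * x0 * c)   # half of the second derivative
            table[(r1, r2)] = (a, b, c)
    return table

TABLE = build_table()

def high_part(m1_fraction, frac_width, exponent):
    """Approximate sqrt(X) from the fractional bits of M1 and the exponent parity."""
    r1 = m1_fraction >> (frac_width - NR1)                 # index bits (first part)
    r2 = exponent & 1
    x1_bits = m1_fraction & ((1 << (frac_width - NR1)) - 1)  # remaining bits (second part)
    x1 = x1_bits / (1 << frac_width)                       # offset inside the segment
    a, b, c = TABLE[(r1, r2)]
    return a * x1 * x1 + b * x1 + c
```

A wider index shortens each segment and shrinks the approximation error at the cost of a larger table; a hardware design would trade these against each other.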
步骤S103,基于第一关系,所述第一位宽部分和所述目标尾数的全部或部分位宽,计算所述目标尾数的平方根的第二位宽部分,其中,所述第一关系表征所述目标尾数的平方根的第一位宽部分、所述目标尾数以及所述目标尾数的平方根的第二位宽部分之间的关系。
本申请实施例中,所述第二位宽部分可以包含所述目标尾数X的平方根尾数的最低位,且所述第二位宽部分的位数与所述第一位宽部分的位数的总和,大于或等于所述目标尾数X的平方根的位数,其中,第一位宽部分的位宽与第二位宽部分的位宽的总和大于或等于目标尾数X的平方根的全位宽。
本申请实施例中,目标尾数X的平方根的第二位宽部分,也可以称为目标尾数X的平方根的低位部分fl。f的低位部分fl可指f的低n位,低n位可指f的全位宽中由最高位开始向最低位方向上后n位。可见,目标尾数X的平方根的第二位宽部分的位宽为n。可选的,n为正整数,且n为小于f的全位宽的数值。m与n的总和大于或等于f的全位宽。
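The low part fl of f, the high part fu of f, and f are related by $f^2 = (f_u + f_l)^2$. Combined with $f^2 = X$, the relation among fl, fu and the target mantissa X is $X = f_u^2 + 2 f_u f_l + f_l^2$. To simplify the computation of fl so that it can be solved from known quantities and a finite number of variables, the first relation in the embodiments of the present application, that is, the first relation among fl, fu and X, may be configured as $f_l = \frac{1}{2}\left(\frac{X}{f_u} - f_u\right)$, which follows from neglecting the small term $f_l^2$.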
处理器可以基于f的低位部分fl、f的高位部分fu以及目标尾数X的第一关系,以及步骤S101中的目标尾数X,步骤S102中确定出的f的高位部分fu,计算得到f的低位部分fl
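Optionally, if the value of fl computed by the processor using the first relation is wider than n bits, the processor may keep its high n bits as the low part fl of f, that is, round the high n+1 bits of the computed value to n bits. For example, the processor may add "1" to the high n+1 bits of the computed value and keep the high n bits of the sum as the low part fl of f.
In the embodiments of the present application, the reciprocal $1/f_u$ of the high part fu of f may be obtained by approximation. The following describes, by way of example, how the processor determines $1/f_u$. The processor may determine $1/f_u$ using, without limitation, either of mode one and mode two below.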
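The effect of the first relation can be checked with exact rational arithmetic. The sketch below assumes the relation has the form fl = (X/fu - fu)/2, which is the form implied by using fu, its reciprocal and X; `truncated_sqrt` is a hypothetical stand-in for the high-part computation of step S102.

```python
from fractions import Fraction
from math import isqrt

def truncated_sqrt(x, frac_bits):
    """Square root of x truncated to frac_bits fractional bits (stand-in for fu)."""
    scaled = x * (1 << (2 * frac_bits))
    return Fraction(isqrt(scaled.numerator // scaled.denominator), 1 << frac_bits)

def low_part(x, fu):
    """Low part from the assumed first relation: fl = (x / fu - fu) / 2."""
    return (x / fu - fu) / 2
```

With this form, (fu + fl)^2 - X equals fl^2 exactly, so a high part that is accurate to about 2^-12 already gives a sum that is accurate to about 2^-25 after a single non-iterative correction.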
方式一、
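To increase computation speed, the processor may determine the high part fu of f and its reciprocal $1/f_u$ in parallel. For how the processor determines fu, see the description of step S102; details are not repeated here. The processor may use polynomial approximation to determine the reciprocal $1/f_u$ of the high part fu of the square root f of the target mantissa X from all or part of the bit width of X.
The processor may determine the target third query parameter h1 from the mantissa M1 of floating-point number W, and the target fourth query parameter h2 from the exponent EW of W. In the embodiments of the present application, h1 is the third part (third partial bit width) of the mantissa M1 of W. For example, h1 may be the high nh1 bits (or the low nh1 bits) of the fractional part of M1, where nh1 is a positive integer less than or equal to the full bit width of the fractional part of M1. The target fourth query parameter h2 is a partial bit width of the exponent EW of W that includes the lowest bit of EW. For example, h2 may be the low nh2 bits of EW, where nh2 is a positive integer less than or equal to the full bit width of EW. Optionally, h2 may be the lowest bit of EW, which reflects whether EW is odd or even.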
处理器可以基于目标第三查询参数h1和目标第四查询参数h2,确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数,第二多项式拟合方程的系数可以包括第四拟合参数a2、第五拟合参数b2、第六拟合参数c2。
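The processor may compute the reciprocal $1/f_u$ of the high part fu of f from the coefficients of the second polynomial fitting equation and all or part of the bit width of the fractional part of floating-point number W. For example, the processor may compute $1/f_u$ from the coefficients and the fourth part (fourth partial bit width) of the fractional part of W, where $\frac{1}{f_u} = a_2 \times (X_2)^2 + b_2 \times X_2 + c_2$. The processor may output $1/f_u$. X2 is the fourth part of the fractional part of the mantissa of W, and the bit positions of the fourth part do not overlap the bit positions of the third part. Optionally, X2 is the high t2 bits of the bit positions of the fractional part of W other than the third part, where t2 is a positive integer. In the embodiments of the present application, the processor uses the third part of the fractional part of W to determine the coefficients of the polynomial fitting equation and uses the fourth part to evaluate the equation, which yields the reciprocal $1/f_u$ of the high part fu of f.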
处理器可以获取或者配置有第二多项式系数查找表。第二多项式系数查找表可以表征多个第二拟合参数组合与多个第二查询参数组合的对应关系。每个第二拟合参数组合可以包括第四拟合参数a2、第五拟合参数b2、第六拟合参数c2。处理器可以采用包括但不限于如下示例B1、B2中任一示例中方式,确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数。
示例B1、
一种可能的实施方式中,第二多项式系数查找表可以包括第二奇数查找子表和第二偶数查找子表。第二奇数查找子表中表征第四查询参数为奇数时第三查询参数对应的第二拟合参数组合。第二偶数查找子表中包括第四查询参数为偶数时第三查询参数对应的第二拟合参数组合。
处理器可以根据目标第四查询参数h2为偶数,从第二偶数查找子表中,查找目标第三查询参数h1对应的第二拟合参数组合。或者,处理器可以根据目标第四查询参数h2为奇数,从第二奇数查找子表中,查找目标第三查询参数h1对应的第二拟合参数组合。实现处理器从第二多项式系数查找表中查找出目标第二查询参数组合对应的第二拟合参数组合,从而实现确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数。
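It can be seen that, in this implementation, the processor may use the target fourth query parameter h2 as a third index to select the second even lookup sub-table or the second odd lookup sub-table, and may use the target third query parameter h1 as a fourth index to look up the corresponding second fitting parameter combination in the selected sub-table.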
示例B2、
另一种可能的实施方式中,第二多项式系数查找表可以包括多个第二拟合参数组合与多个第二查询 参数组合的对应关系,其中,一个第二查询参数组合可以作为一个索引,且一个索引对应一个第二拟合参数组合。处理器可以将目标第三查询参数h1和目标第四查询参数h2作为一个索引,从第二多项式系数查找表中,查找该索引对应的第二拟合参数组合。从而实现确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数。
方式二、
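To simplify the computation circuits in the processor, the processor may determine the high part fu of f and its reciprocal $1/f_u$ serially. For how the processor determines fu, see the description of step S102; details are not repeated here. The processor may use polynomial approximation to determine the reciprocal $1/f_u$ of the high part fu of the square root f of the target mantissa X from all or part of the bit width of fu.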
处理器可以基于目标尾数X的平方根f的高位部分fu,确定目标第五查询参数g1。处理器可以基于目标第五查询参数g1,确定预设的第三多项式拟合方程的系数,其中,所述目标第五查询参数g1为f的高位部分fu的第五部分(第五部分位宽)。示例性的,目标第五查询参数g1可以为f的高位部分fu的小数部分的高g1位(或者低g1位),g1为正整数,且g1小于或等于f的高位部分fu的小数部分的全位宽。
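The processor may determine the reciprocal $1/f_u$ of the high part fu of f from the coefficients of the third polynomial fitting equation and the sixth part (sixth partial bit width) of the first bit-width part. The bit positions of the fifth part of fu do not overlap the bit positions of the sixth part of fu. For example, the processor may compute $1/f_u$ from the coefficients of the third polynomial fitting equation and all or part of the bit width of fu, where $\frac{1}{f_u} = a_3 \times (g_2)^2 + b_3 \times g_2 + c_3$. Optionally, g2 consists of a positive number of the high-order bits of the fractional part of fu, taken from the bit positions other than the fifth part.
The processor may determine, based on the target fifth query parameter g1, the coefficients of the third polynomial fitting equation that correspond to g1. These coefficients may include a seventh fitting parameter a3, an eighth fitting parameter b3, and a ninth fitting parameter c3. This case is used as the example in the description below.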
处理器可以采用包括但不限于如下示例C1中方式,确定目标第五查询参数g1对应的第三多项式拟合方程的系数。
示例C1、
一种可能的实施方式中,处理器可以获取或者配置有第三多项式系数查找表。第三多项式系数查找表可以表征多个第三拟合参数组合与多个第五查询参数的对应关系。其中,各第五查询参数具有对应的第三拟合参数组合。每个第三拟合参数组合可以包括第七拟合参数a3、第八拟合参数b3、第九拟合参数c3。处理器可以从第三多项式系数查找表中查找出目标第五查询参数对应的第三拟合参数组合,从而实现目标第五查询参数g1对应的第三多项式拟合方程的系数。
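The relation among the low part fl of f, the high part fu of f, and the target mantissa X may be configured as $f_l = \frac{1}{2}\left(\frac{X}{f_u} - f_u\right)$. The processor may compute the low part fl of f from this relation, the target mantissa X obtained in step S101, and the high part fu of f determined in step S102.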
可以理解的是,处理器可以采用包括但不限于前述方式一、方式二中提供的方式计算f的高位部分fu的倒数,还可以采用现有技术计算f的高位部分fu的倒数。例如,处理器可以采用SRT方法计算f的高位部分fu的倒数。本申请对此不作过多限定。
步骤S104,基于所述第一位宽部分和所述第二位宽部分,确定所述目标尾数的平方根,并将所述目标尾数的平方根的小数部分确定为所述待计算浮点数的平方根的尾数。
一种可能的实施方式中,处理器可以将第一位宽部分和第二位宽部分进行加和操作,得到目标尾数X的平方根,并将所述目标尾数X的平方根的小数部分确定为所述浮点数W的平方根的尾数。
另一种可能的实施方式中,处理器可以基于配置的舍入方式、所述目标尾数X、所述第一位宽部分、所述第二位宽部分,计算所述舍入方式对应的舍入判别参数;以及基于所述第一位宽部分、所述第二位宽部分,计算所述舍入方式对应的多个待选结果;并根据所述舍入判别参数与预设数值的比较结果,从所述多个待选结果中选择一个待选结果作为所述目标尾数X的平方根。
一些场景中,处理器可以配置一种舍入方式,其中,所述配置的舍入方式可以为如下中的任意一种:舍入到最接近值RH,向正值舍入RP,向零舍入RZ中任意一种舍入方式。可选的,RH方式、RP方式、RZ方式可以为IEEE 754中规定的舍入方式。
另一些场景中,处理器可以配置多种舍入方式,其中,所述多个舍入方式可以为RP方式、RH方式、RZ方式中的至少两种。所述多种舍入方式与多个舍入方式配置参数一一对应,例如,第一舍入配置参数表征(或者对应)的舍入方式为RP方式。第二舍入配置参数对应的舍入方式为RZ方式。第三舍入配置参数对应的舍入方式为RH方式。
处理器可以根据接收的舍入方式配置参数,执行接收的舍入方式配置参数对应的舍入方式。一个示例中,处理器可以根据接收到的舍入配置参数为第一舍入配置参数,执行RP方式。另一种示例中,处理器可以根据接收到的舍入配置参数为第二舍入配置参数,执行RZ方式。又一种示例中,处理器可以根据接收到的舍入配置参数为第三舍入配置参数,执行RH方式。
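The following describes how the processor applies the RP mode. The processor may determine the first rounding discrimination parameter ie based on the first bit-width part (that is, fu), the second bit-width part (that is, fl), and part of the bit width of the target mantissa X. The first rounding discrimination parameter ie may represent the deviation between a first value and the target mantissa X, where the first value is the square of the determined square root of X, that is, $f^2$.
For example, ie may be computed using the formula $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$, where X is the target mantissa.
The processor may determine multiple candidate results based on the first bit-width part (that is, fu) and the second bit-width part (that is, fl). The candidate results may include a first candidate result f1 and a second candidate result f2, where f1 = fu + fl and f2 = f1 + ulp. Here ulp denotes the smallest significant value that can be expressed in the full bit width of the computed result of $\sqrt{X}$.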
处理器可以根据第一舍入判别参数ie与预设数值的比较结果,从多个待选结果中选择一个待选结果,并将选择的待选结果确定为目标尾数X的平方根。示例性的,预设数值可以配置为0。
处理器可以根据第一舍入判别参数ie大于或等于0,确定第一待选结果f1为目标尾数X的平方根。处理器可以根据第一舍入判别参数ie小于0,确定第二待选结果f2为目标尾数X的平方根。
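The following describes how the processor applies the RZ mode. The processor may determine the first rounding discrimination parameter ie based on the first bit-width part (that is, fu) and the second bit-width part (that is, fl), where $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$ and X is the target mantissa.
The processor may determine multiple candidate results based on the first bit-width part and the second bit-width part. The candidate results may include a first candidate result f1 and a third candidate result f3, where f1 = fu + fl and f3 = f1 - ulp. Here ulp denotes the smallest significant value that can be expressed in the full bit width of the computed result of $\sqrt{X}$.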
处理器可以根据第一舍入判别参数ie与预设数值的比较结果,从多个待选结果中选择一个待选结果,并将选择的待选结果确定为目标尾数X的平方根。示例性的,预设数值可以配置为0。
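When the first rounding discrimination parameter ie is less than or equal to 0, the processor may determine the first candidate result f1 as the square root of the target mantissa X. When ie is greater than 0, the processor may determine the third candidate result f3 as the square root of the target mantissa X.
Let fr denote the real-valued square root of the target mantissa X without any loss of precision. In the RZ and RP modes, one candidate result is normally selected according to the comparison between f and fr. For convenience of computation, and because only the ordering matters, comparing $f^2$ with $f_r^2$ gives the same result. In other words, it suffices to compute the difference between $f^2$ and the target mantissa X, so only $f^2$ needs to be computed. The first rounding discrimination parameter ie may represent the difference between $f^2$ and X.
In RP mode, the processor may determine the square root f of the target mantissa X based on the formula $f = \begin{cases} f_1, & ie \ge 0 \\ f_2, & ie < 0 \end{cases}$. In RZ mode, the processor may determine f based on the formula $f = \begin{cases} f_1, & ie \le 0 \\ f_3, & ie > 0 \end{cases}$.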
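The RP and RZ selections can be sketched with integer arithmetic. This is a minimal illustration, assuming f1 and X are both held as integers scaled by 2**n (so one ulp equals 1) and that f1 is already within one ulp of the true root; the function name and the mode strings are hypothetical.

```python
def round_sqrt_directed(f1, x, n, mode):
    """Select the directed-rounding result among f1 and one neighbour.

    f1 and x are fixed-point values scaled by 2**n. Only the sign of ie is used.
    """
    ie = f1 * f1 - (x << n)  # ie = f1**2 - X, scaled by 2**(2*n)
    if mode == "RP":         # round toward positive: never return less than the root
        return f1 if ie >= 0 else f1 + 1
    if mode == "RZ":         # round toward zero: never return more than the root
        return f1 if ie <= 0 else f1 - 1
    raise ValueError("unsupported mode: " + mode)
```

Because the decision uses only the sign of ie, no square root of X is ever taken during rounding.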
下面介绍处理器执行RH方式的过程。处理器可以根据第一位宽部分(也即fu)和所述第二位宽部分(fl),确定多个待选结果,所述多个待选结果包括前述第一待选结果f1、第二待选结果f2以及第三待选结果f3。通过前述介绍可知f1=fu+fl,f2=f1+ulp,f3=f1-ulp。可见第二待选结果f2大于第一待选结果f1,第一待选结果f1大于第三待选结果f3。
假设目标尾数X无精度损失的实数平方根为fr,在RH方式时,可以通过比较可能取到两个待选结果分别到fr之间的距离,将这两个待选结果到fr之间的距离最小的待选结果确定为目标尾数X的平方根。
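When the first rounding parameter ie is greater than 0, f1 > fr > f3. Denote the deviation between fr and f1 as the first distance (f1 - fr), and the deviation between f3 and fr as the second distance (fr - f3). The deviation between the first distance and the second distance is (f1 - fr) - (fr - f3), denoted the first deviation. The rounding discrimination parameter ie1 may be the difference between the square of the first distance and the square of the second distance, so that $ie_1 = [(f_1 - f_r) - (f_r - f_3)] \times [(f_1 - f_r) + (f_r - f_3)]$, that is, $ie_1 = (f_1 - f_r)^2 - (f_r - f_3)^2$.
Because (f1 - fr) + (fr - f3) is positive, the sign of the first deviation (positive, zero or negative) is the same as the sign of ie1 (positive, zero or negative). Transforming ie1 gives $ie_1 = [(2 f_1 - ulp)^2 - 4 f_r^2]/4 = ie - ulp \times f_1 + ulp^2/4$. It can be seen that ie1 can be computed from the first rounding discrimination parameter ie and f1. Considering the bit width of ie and the bit width of the product ulp × f1, their least significant digit lies 2N places after the radix point, where N is the bit width of the fractional part of the target mantissa X. The significant digit of $ulp^2/4$ lies 2N+2 places after the radix point, so $ulp^2/4$ falls outside the significant range of both ie and ulp × f1. Dropping $ulp^2/4$ from ie1 therefore does not change the sign decision, and ie1 itself is never zero. The processor may thus use a second rounding discrimination parameter ien for the RH mode, where the relation between ien and ie may be expressed as $ien = ie - ulp \times f_1$. The second rounding discrimination parameter ien may represent the deviation between the square of the first distance and the square of the second distance.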
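When the first rounding parameter ie is less than 0, f2 > fr > f1. Denote the deviation between fr and f2 as the third distance (f2 - fr), and the deviation between f1 and fr as the fourth distance (fr - f1). The deviation between the third distance and the fourth distance is (f2 - fr) - (fr - f1), denoted the second deviation. The rounding discrimination parameter ie2 may be the difference between the square of the third distance and the square of the fourth distance, so that $ie_2 = [(f_2 - f_r) - (f_r - f_1)] \times [(f_2 - f_r) + (f_r - f_1)]$, that is, $ie_2 = (f_2 - f_r)^2 - (f_r - f_1)^2$.
Because (f2 - fr) + (fr - f1) is positive, the sign of the second deviation (positive, zero or negative) is the same as the sign of ie2 (positive, zero or negative). Transforming ie2 gives $ie_2 = [(2 f_1 + ulp)^2 - 4 f_r^2]/4 = ie + ulp \times f_1 + ulp^2/4$. It can be seen that ie2 can be computed from the first rounding discrimination parameter ie and f1. Considering the bit width of ie and the bit width of the product ulp × f1, their least significant digit lies 2N places after the radix point, where N is the bit width of the fractional part of the target mantissa X. The significant digit of $ulp^2/4$ lies 2N+2 places after the radix point, so $ulp^2/4$ falls outside the significant range of both ie and ulp × f1. Dropping $ulp^2/4$ from ie2 therefore does not change the sign decision, and ie2 itself is never zero. The processor may thus use a third rounding discrimination parameter iep for the RH mode, where the relation between iep and ie may be expressed as $iep = ie + ulp \times f_1$. The third rounding discrimination parameter iep may represent the difference between the square of the third distance and the square of the fourth distance.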
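It can be seen that the processor may determine the second rounding discrimination parameter ien and the third rounding discrimination parameter iep based on the first bit-width part (that is, fu) and the second bit-width part (that is, fl), where $ien = ie - ulp \times f_1$, $iep = ie + ulp \times f_1$, and $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$. Here X is the target mantissa.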
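The sign argument behind ien and iep can be restated in terms of the midpoints between neighbouring candidates. The following is a sketch; the names $m_-$ and $m_+$ for those midpoints are assumptions introduced here for illustration, and f1 and X are assumed to carry N fractional bits.

```latex
\begin{aligned}
m_- &= \tfrac{f_1 + f_3}{2} = f_1 - \tfrac{ulp}{2}, &
(f_1 - f_r) - (f_r - f_3) &= 2\,(m_- - f_r),\\
m_+ &= \tfrac{f_1 + f_2}{2} = f_1 + \tfrac{ulp}{2}, &
(f_2 - f_r) - (f_r - f_1) &= 2\,(m_+ - f_r),\\
m_-^2 - f_r^2 &= ie - ulp \cdot f_1 + \tfrac{ulp^2}{4}, &
m_+^2 - f_r^2 &= ie + ulp \cdot f_1 + \tfrac{ulp^2}{4}.
\end{aligned}
```

Since the midpoints and $f_r$ are positive, $m^2 - f_r^2$ has the same sign as $m - f_r$, so each expanded expression carries the sign of the corresponding deviation even though it differs from the product form by a positive factor. Under the stated assumption, $ie$ and $ulp \cdot f_1$ are integer multiples of $2^{-2N}$ while $ulp^2/4 = 2^{-2N-2}$, so a value of the form $k \cdot 2^{-2N} + 2^{-2N-2}$ is positive exactly when $k \ge 0$ and is never zero. This is why testing $ien \ge 0$ and $iep < 0$ is sufficient and why no tie case arises.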
处理器可以根据第二舍入判别参数ien与预设数值的比较结果以及第三舍入判别参数iep与预设数值的比较结果,从多个待选结果中选择一个待选结果,并将选择的待选结果确定为目标尾数X的平方根。示例性的,预设数值可以配置为0。
处理器可以根据第三舍入判别参数iep小于0,确定第二待选结果f2为目标尾数X的平方根。处理器可以根据第二舍入判别参数ien大于或等于0,确定第三待选结果f3为目标尾数X的平方根。处理器可以根据第三舍入判别参数iep大于或等于0,或者第二舍入判别参数ien小于0,确定第一待选结果f1为目标尾数X的平方根。
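In the embodiments of the present application, the first rounding discrimination parameter ie, the second rounding discrimination parameter ien and the third rounding discrimination parameter iep determined by the processor can guarantee that the error between the determined square root of the target mantissa X and the real-valued $\sqrt{X}$ is less than 1 ulp ($2^{-N}$), where N is the bit width of the fractional part of X. Here ulp is the unit of least precision, that is, the smallest significant value that can be expressed in the full bit width of the square root of X (the computed result of $\sqrt{X}$). In RP mode, the determined square root f of X is selected from the first candidate result f1 and the second candidate result f2. In RZ mode, f is selected from the first candidate result f1 and the third candidate result f3. In RH mode, f is selected from the first candidate result f1, the second candidate result f2 and the third candidate result f3.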
在第一舍入参数ie=0的情形中,可以直接选择f1作为目标尾数X的平方根f。
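Based on the sign of the second rounding discrimination parameter ien, the processor may select one of the first candidate result f1 and the third candidate result f3 as the square root f of the target mantissa X, using the formula $f = \begin{cases} f_3, & ien \ge 0 \\ f_1, & ien < 0 \end{cases}$.
Based on the sign of the third rounding discrimination parameter iep, the processor may select one of the first candidate result f1 and the second candidate result f2 as the square root f of the target mantissa X, using the formula $f = \begin{cases} f_2, & iep < 0 \\ f_1, & iep \ge 0 \end{cases}$.
When iep < 0, ie is less than 0; when ien ≥ 0, ie is greater than 0. In RH mode the processor may therefore determine the square root f of the target mantissa X based on the formula $f = \begin{cases} f_2, & iep < 0 \\ f_3, & ien \ge 0 \\ f_1, & \text{else} \end{cases}$, where else refers to the case iep ≥ 0 or the case ien < 0.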
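The RH selection can be sketched in the same integer setting as the directed modes. This is an illustration under assumptions: f1 and X are integers scaled by 2**n (one ulp equals 1), f1 is within one ulp of the true root, and the function name is hypothetical.

```python
def round_sqrt_nearest(f1, x, n):
    """Select the candidate nearest to sqrt(x) among f1 - 1, f1 and f1 + 1."""
    ie = f1 * f1 - (x << n)  # ie = f1**2 - X, scaled by 2**(2*n)
    ien = ie - f1            # ie - ulp * f1 in the same scaling
    iep = ie + f1            # ie + ulp * f1 in the same scaling
    if iep < 0:
        return f1 + 1        # root lies above the midpoint of f1 and f2
    if ien >= 0:
        return f1 - 1        # root lies below the midpoint of f3 and f1
    return f1
```

The two conditions cannot hold at once, because iep < 0 implies ie < 0 while ien ≥ 0 implies ie > 0, so the order of the tests does not matter.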
本申请实施例中处理器确定出的第二舍入判别参数ien和第三舍入判别参数iep可以保障计算精度,并且具有更小的计算量。
通过上述对处理器可以执行的舍入方式的介绍可知,利用目标尾数X的平方根的高位部分和低位部分计算目标尾数X的平方根f时,不会出现IEEE754的舍入标准中的tie to even以及tie to away的情况。从而处理器可以不需要配置执行tie to even以及tie to away的舍入方式的电路,可以减小处理器电路面积。
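The processor may determine the biased exponent of the square root of floating-point number W based on the exponent EW. For example, if the exponent EW of W is even and positive, the processor may take $E_W/2$ plus the exponent bias as the biased exponent of the square root of W. If EW is odd and positive, the processor may take $(E_W-1)/2$ plus the exponent bias as the biased exponent of the square root of W.
The processor may output the square root $\sqrt{W}$ of floating-point number W, where the sign bit of $\sqrt{W}$ is the same as the sign bit of W, and the mantissa of $\sqrt{W}$ is the same as the fractional part of the square root of the target mantissa X.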
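Assembling the output can be sketched for single precision. This is an illustration only: the function name is hypothetical, the sign bit is assumed to be 0, and the sketch assumes the resulting biased exponent stays within the normal range.

```python
import struct

def pack_sqrt_single(ew, root_fraction):
    """Build the single-precision square root from EW and the fraction bits of sqrt(X)."""
    bias = 127
    # Exponent of the square root: EW / 2 for even EW, (EW - 1) / 2 for odd EW.
    half = (ew - 1) // 2 if ew & 1 else ew // 2
    bits = ((half + bias) << 23) | (root_fraction & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For instance, 9 = 1.125 × 2^3 has an odd exponent, so X = 2.25 and the square root of X is 1.5, whose fraction bits are 0x400000; the packed result is 1.5 × 2^1.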
为保障处理器确定的目标尾数X的平方根与的实数之间的误差小于1ulp,目标尾数X的小数部分的位宽可以被配置为N位,第一位宽部分的全位宽可以被配置为d+2位,此时第一位宽部分的小数部分的位宽为d+1位,其中,d与N的关系可以符合预设条件,预设条件可以为
下面结合确定目标尾数X的平方根过程中的误差进行说明。一种可能的实施方式中,处理器利用目标尾数X的部分位宽,例如,目标尾数X小数部分的高t4位(记为Xt),确定所述第一位宽部分(即fu),其中t4为正整数,且t4小于目标尾数X的全位宽。此时fu的小数部分的位宽为d+1位。此时目标尾数X与Xt的关系为,X=Xt+Xr,Xr表征目标尾数X除了高t4位的部分,由于X∈[1,4),此时Xr∈[0,2-(d+1))。处理器采用前述方式一中的操作,基于Xt确定所述第一位宽部分的倒数(即),此时的小数部分的位宽为d+1位。
处理器确定第一位宽部分(即fu)过程中产生的误差为其中,|ε1|<2-(d+1)。处理器确定第一位宽部分的倒数过程中产生的误差为其中,|ε2|<2-(d+1)。处理器确定第二位宽部分的过程中,基于f的低位部分fl、f的高位部分fu以及目标尾数X的关系,即在实际场景中,的位宽可能超过n位,而第二位宽部分的位宽为n,则处理器保留的高n位作为第二位宽部分(即fl),会产生误差eRH也即eRH为2-(N+1)。处理器在确定第二位宽部分的过程中产生的误差包括计算乘法过程产生的误差ec以及保留操作中产生的误差eRH
此时,确定的误差err可以表达为|e+eRH|,并且 为使确定的误差err小于1ulp,则有|err|<2-N。结合|err|<2-N,可以得到前述预设条件
基于前面的介绍,目标第一查询参数r1可以为目标尾数X的小数部分中的第一部分(第一部分位宽)。示例性的,目标第一查询参数可以为目标尾数X的小数部分的高nr1位(或者低nr1位),nr1为正整数,且nr1小于或等于目标尾数X的小数部分的全位宽。在一些可能的设计中,所述预设条件可以配置为处理器可以利用目标尾数X的小数部分中除前述第一部分位宽之外的位宽中的高t5位,以及目标尾数X的小数部分中的第一部分(即目标尾数X的小数部分的高nr1位),确定所述第一位宽部分(即fu)和所述第一位宽部分的倒数(即),其中,t5为正整数。可见,处理器利用目标尾数X的小数部分的高nr1+t5位,也即前述t4=nr1+t5。Xt的小数部分的位宽t4的数值为d+2位(即),并且第一位宽部分(即fu)的小数部分的位宽为d+1位,可以实现目标尾数X的平方根与的实数之间的误差小于1ulp(即)。可见,本申请实施例中,处理器可以利用目标尾数X的小数部分的高位的数据计算第一位宽部分(即fu)。其中,目标尾数X的小数部分的高位中的高nr1位可以作为索引位,用于确定目标第一查询参数r1,便于确定第一多项式拟合方程的系数。目标尾数X的小 数部分的高位中除了高nr1位之外的其它位可以作为计算位,用于作为第一多项式拟合方程中的变量值,参与计算第一位宽部分(即fu)。
可以理解的是,为了实现上述方法实施例中步骤(或功能),处理器可以包括执行各个步骤(或功能)相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的模块及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个步骤(或功能)究竟以硬件还是计算机软件驱动硬件的方式来执行,可以取决于技术方案的特定应用场景和设计约束条件。
基于相同发明构思,本申请还提供一种浮点数计算模块。下面结合浮点数计算模块的结构进行介绍。图5根据一示例性实施例示出一种浮点数计算模块。浮点数计算模块可以包括高位计算单元、低位计算单元、精确舍入单元。高位计算单元、低位计算单元、精确舍入单元可以获取目标尾数X的全部或部分位宽。
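Optionally, the floating-point calculation module may further include a preprocessing unit. The preprocessing unit may receive the floating-point number Z to be calculated and convert it into floating-point number W by preprocessing. The following description uses the floating-point representation. The floating-point number Z input to the preprocessing unit is represented as $Z = S \times M_0 \times Q^{E_k - \text{bias}}$, where S denotes the sign of the floating-point number, M0 denotes its mantissa, Q denotes its radix, Ek denotes its biased exponent, bias denotes the exponent bias, and (Ek - bias) denotes the exponent value EZ of Z. The exponent bias is a preset number that depends on the type of Z. For example, if Z is a single-precision floating-point number the exponent bias is 127; if it is a double-precision floating-point number the exponent bias is 1023.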
前处理单元可以通过多个信号线接收浮点数Z。每个信号线中,第一电平的信号可以表征“0”,第二电平的信号可以表征“1”。可选的,第一电平可以为高电平,第二电平可以为低电平。或者第一电平可以为低电平,第二电平可以为高电平。一个信号线可以对应浮点数Z的全位宽中的一个位宽。本申请实施例提供的浮点数计算模块中各单元之间的连线表征单元之间的交互,并不表征单元之间的实际连线方式。
前处理单元可以具有对浮点数Z进行前处理的能力。前处理单元可以包括但不限于如下功能,正常值化处理功能、阶码奇偶判定功能、阶码变换处理功能、尾数变换处理功能,实现支持前处理单元对浮点数Z进行前处理。
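The floating-point number Z received by the preprocessing unit may be a normalized floating-point number. It may also be a non-normalized (denormalized) floating-point number.
Part (a) of Fig. 4 shows, according to an exemplary embodiment, the preprocessing of floating-point number Z. The preprocessing unit is capable of normalizing a non-normalized floating-point number. The normalized form of Z is denoted floating-point number W, which can be expressed as $S \times M_1 \times Q^{E_W}$, where S is the sign, Q is the radix, EW is the exponent value and M1 is the mantissa. Each digit of M1 lies between 0 and the radix Q, and the most significant digit of M1 is not zero. The mantissa M1 is a fixed-point number whose most significant digit is the integer part and whose remaining digits are the fractional part; the integer part of the mantissa M1 of a normalized floating-point number is not 0.
In its exponent parity check, the preprocessing unit examines the exponent EW of floating-point number W (the normalized form of Z). If the lowest bit of EW is 1, the exponent EW of W is odd. If the lowest bit of EW is 0, the exponent EW of W is even.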
一种可能的情形中,如图4中的(b)所示,前处理单元可以根据浮点数W的阶码EW为奇数,对浮点数W的尾数M1进行尾数变换处理,以及对浮点数W的阶码EW进行阶码变换处理。
在浮点数W的阶码EW为奇数的情形中,前处理单元可以根据浮点数W的阶码EW为奇数,对浮点数W的尾数M1进行第一尾数变换处理,得到尾数M2。示例性的,第一尾数变换处理可以为乘Q操作。前处理单元可以根据浮点数W的阶码EW为奇数,对尾数M1进行乘Q操作,得到尾数M2。
示例性的,Q为2。如图4中的(b)所示,尾数M1包括整数部分和小数部分。黑色方框示出整数部分的比特,白色方框示出小数部分的比特。浮点数W的尾数M1的整数部分为1个比特,尾数M1的整数部分为第s1位示出的比特,尾数M1的小数部分为第0位至第v位示出的比特。此时尾数M1的数值范围为[1,2)。
前处理单元对尾数M1进行乘2操作,也即将尾数M1的各比特的向左移一位,得到尾数M2。此时尾数M2的整数部分为2个比特,尾数M2的整数部分为第s1位和第s2位示出的比特,尾数M2的小数部分为第0位至第v位示出的比特。此时尾数M2的数值范围为[2,4)。可以理解的是,尾数M2 的整数部分相比于尾数M1的整数部分额外增添一个比特,补齐IEEE754格式缺省的整数位。在浮点数W的阶码EW为奇数的情形中,目标尾数X为浮点数W的尾数M1的2倍,也即目标尾数X与尾数M2相同。目标尾数X的整数部分可以包括2个比特。此时目标尾数X的数值范围为[2,4)。
上述介绍中,前处理单元对尾数M1执行第一尾数变换处理得到尾数M2,从而得到目标尾数X,用于明晰浮点数W的阶码EW为奇数的情形中,前处理单元由尾数M1得到目标尾数X的过程。在一些应用场景中,前处理单元可以基于预设的尾数变换处理方式,由尾数M1直接得到目标尾数X,并输出目标尾数X。
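When the exponent EW of floating-point number W is odd, solving for the square root of Z (that is, obtaining the result of $\sqrt{Z}$) can be converted into computing $\sqrt{X}$, where $\sqrt{Z} = \sqrt{X} \times Q^{(E_W - 1)/2}$.
The floating-point calculation module may further include an exponent processing unit. As shown in part (b) of Fig. 4, when the exponent EW of W is odd, the preprocessing unit may apply exponent conversion to EW by subtracting 1 at the lowest bit of the full bit width of EW, which gives the exponent $E_W - 1$; $E_W - 1$ is even. The preprocessing unit may then output the exponent value $E_W - 1$ to the exponent processing unit, so that the exponent processing unit can determine the exponent or the biased exponent of the square root of W.
When the exponent EW of W is odd, the preprocessing unit may provide the exponent value $E_W - 1$ to the exponent processing unit. The preprocessing unit may shift the exponent value $E_W - 1$, for example by one bit toward the low-order end of the exponent bit width, which gives $(E_W - 1)/2$ as the exponent of $\sqrt{Z}$. The exponent processing unit may compute and output the biased exponent of $\sqrt{Z}$ as the sum of the exponent of $\sqrt{Z}$ and a preset exponent bias, so that the biased exponent of $\sqrt{Z}$ is $(E_W - 1)/2$ plus the exponent bias. The preset exponent bias depends on the type of Z. For example, if Z is a single-precision floating-point number the exponent bias may be 127; if it is a double-precision floating-point number the exponent bias may be 1023.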
另一种可能的情形中,浮点数W的阶码EW为偶数的情形中,目标尾数X与浮点数W的尾数M1相同。前处理单元可以输出目标尾数X,且目标尾数与浮点数W的尾数M1相同。如图4中的(c),浮点数W的尾数M1的整数部分为1个比特,尾数M1的整数部分为第s1位示出的比特,尾数M1的小数部分为第0位至第v位示出的比特。此时尾数M1的数值范围为[1,2)。浮点数W的尾数与浮点数Z正常值化后的尾数M1相同,则浮点数W的尾数的数值范围为[1,2)。从而目标尾数X的数值范围为[1,2)。可选的,目标尾数X的整数部分相比于尾数M1的整数部分可以额外增添一个比特,配置为0,补齐IEEE754格式缺省的整数位。如增加第s2位比特,且数值配置为0。这样的操作不改变目标尾数X的数值范围。
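When the exponent EW of floating-point number W is even, solving for the square root of Z (that is, obtaining the result of $\sqrt{Z}$) can be converted into computing $\sqrt{X}$, where $\sqrt{Z} = \sqrt{X} \times Q^{E_W/2}$. When EW is even, the preprocessing unit may provide the exponent value EW to the exponent processing unit. The exponent processing unit may compute and output the biased exponent of $\sqrt{Z}$ as the sum of the exponent of $\sqrt{Z}$ and a preset exponent bias, so that the biased exponent of $\sqrt{Z}$ is $E_W/2$ plus the exponent bias. The preset exponent bias depends on the type of Z. For example, if Z is a single-precision floating-point number the exponent bias may be 127; if it is a double-precision floating-point number the exponent bias may be 1023.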
下面对浮点数计算模块确定目标尾数X的平方根f的过程进行介绍。f一般为定点数。若目标尾数X的小数部分的全位宽是p位,则f的全位宽为p+1位,其中f的全位宽中,最高位为第p位,最低位为第0位,其中第p-1位至第0位,为定点数的小数点后的小数部分。
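The floating-point calculation module provided in the embodiments of this application may separately determine the data of the high m bits of f (the high-order part of f for short) and the data of the low n bits of f (the low-order part of f for short). The floating-point calculation module may determine the calculation result of $\sqrt{X}$ based on the determined high-order part and low-order part of f. The high m bits are the first m bits of the full bit width of f, counted from the most significant bit toward the least significant bit. The low n bits are the last n bits of the full bit width of f, counted in the same direction. Optionally, the high m bits and the low n bits of f may overlap. As shown in (a) of FIG. 6, assume that floating-point number Z is a single-precision floating-point number whose mantissa has a full bit width of 23 bits; the full bit width of f is then 24 bits, with bit 23 as the most significant bit and bit 0 as the least significant bit. The high m bits of f are the first m bits from bit 23 toward bit 0, and the low n bits of f are the last n bits from bit 23 toward bit 0. As shown in (b) of FIG. 6, the high m bits of f may also have no bits in common with the low n bits of f. The values of m and n may be preconfigured. In some application scenarios, m and n may be configured according to the type of floating-point number Z.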
本申请实施例中,便于介绍,“A”的高w位,可指由“A”的最高位向最低位方向上的前w个位的数据。“A”的低w位,可指“A”的最高位向最低位方向上的后w个位的数据。
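For ease of description, the high m bits of f are denoted fu and the low n bits of f are denoted fl, so that $X = f^2 = (f_u + f_l)^2$. The relationship between the high-order part fu and the low-order part fl of f is then $f_l = \frac{X - f_u^2 - f_l^2}{2 f_u}$. To simplify the calculation of f, this relationship can be approximated as $f_l \approx \frac{X - f_u^2}{2 f_u}$. In the embodiments of this application, the high-order part fu of f represents an approximate value of $\sqrt{X}$, that is, an approximate value of f. (fu+fl) may represent the precise value of $\sqrt{X}$, that is, the precise value of f.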
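As a numerical illustration of the high/low split, the sketch below obtains fu by truncating the square root to `m_bits` fractional bits and then applies the first-order correction fl ≈ (X - fu²)/(2·fu). This is a sketch under assumptions: exact rational arithmetic replaces the fixed-point hardware, fu comes from `math.isqrt` rather than from a polynomial table, and the function name and bit widths are hypothetical.

```python
from fractions import Fraction
import math

def split_sqrt(x: Fraction, m_bits: int, total_bits: int):
    """Return (fu, fl): a coarse square root and its first-order correction."""
    # fu: sqrt(x) truncated to m_bits fractional bits (stand-in for the table-based unit).
    fu = Fraction(math.isqrt(int(x * 4 ** m_bits)), 2 ** m_bits)
    # fl ~ (x - fu^2) / (2 * fu), rounded to the full result precision.
    fl = (x - fu * fu) / (2 * fu)
    fl = Fraction(round(fl * 2 ** total_bits), 2 ** total_bits)
    return fu, fl
```

The neglected term is fl²/(2·fu), so when fu is accurate to about m bits the sum fu + fl is accurate to roughly 2m bits. That is why a single correction step is enough to reach the full width in this sketch.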
前处理单元可以输出目标尾数X,以便于其它单元使用目标尾数X的全位宽或者部分位宽。本申请实施例中,目标尾数X表征浮点数Z前处理后的尾数,下面简称为浮点数Z的目标尾数X。EW表征浮点数Z正常值处理后的阶码,下面简称为浮点数Z的阶码EW。
请继续参见图5,高位计算单元可以与前处理单元连接。高位计算单元可以接收前处理单元输出的目标尾数X的全部或部分位宽。高位计算单元可以接收前处理单元输出的浮点数W的尾数M1的全部或部分位宽,浮点数W的阶码EW的全部或部分位宽。
高位计算单元可以利用多项式近似的方法,基于第一多项式拟合方程、目标尾数X的全部或部分位宽,确定目标尾数X的平方根f的高位部分fu
高位计算单元可以基于浮点数W的尾数M1确定目标第一查询参数r1,高位计算单元可以基于浮点数W的阶码EW确定目标第二查询参数r2。本申请实施例中,目标第一查询参数可以为浮点数W的尾数M1小数部分中的第一部分(第一部分位宽)。示例性的,目标第一查询参数可以为浮点数W的尾数M1小数部分的高nr1位(或者低nr1位),nr1为正整数,且nr1小于或等于浮点数W的尾数M1小数部分的全位宽。所述目标第二查询参数为浮点数W阶码EW的部分位宽,且包括浮点数W的阶码EW的最低位宽。目标第二查询参数r2可以为浮点数W的阶码EW的低nr2位,nr2为正整数,且nr2为小于或等于阶码EW的全部位宽,可见阶码EW的低nr2位包括阶码EW的最低位。可选的,目标第二查询参数r2可以为浮点数W的阶码EW的低1位比特的数据,该数据可以反映出阶码EW为奇数或者偶数。
高位计算单元可以基于目标第一查询参数r1和目标第二查询参数r2,确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数,第一多项式拟合方程的系数可以包括第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。
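The high-order calculation unit may calculate the high-order part fu of f from the coefficients of the first polynomial fitting equation and all or part of the bit width of the fractional part of mantissa M1 of floating-point number W. For example, the high-order calculation unit may calculate fu from the coefficients of the first polynomial fitting equation and a second part (second partial bit width) of the fractional part of mantissa M1 of floating-point number W, where $f_u = a_1 \times (X_1)^2 + b_1 \times X_1 + c_1$. X1 is the second part (second partial bit width) of the fractional part of mantissa M1 of floating-point number W, and the bits of this second part do not overlap the bits of the first part of the fractional part of mantissa M1. Optionally, X1 is the high t1 bits of the bits of the fractional part of mantissa M1 that remain after the first part is excluded, where t1 is a positive integer. In the embodiments of this application, the high-order calculation unit uses the first part of the fractional part of mantissa M1 of floating-point number W to determine the coefficients of the polynomial fitting equation, and uses the second part in evaluating the polynomial fitting equation, thereby determining an approximate solution of the square root of target mantissa X, namely the high-order part fu of f.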
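The table-plus-polynomial evaluation can be sketched in software as follows. Several things here are assumptions rather than details from this application: the coefficients are obtained by passing a quadratic through three points of each segment (the text does not say how the fitting parameters are derived), the index width `NR1` is an arbitrary illustrative value, and floating-point arithmetic stands in for fixed-point hardware.

```python
import math

NR1 = 5  # number of mantissa fraction bits used as query parameter r1 (illustrative)

def fit_quadratic(g, x0, h):
    """Coefficients (a, b, c) of a*t^2 + b*t + c matching g at x0, x0 + h/2, x0 + h (t = x - x0)."""
    y0, y1, y2 = g(x0), g(x0 + h / 2), g(x0 + h)
    a = 2 * (y0 - 2 * y1 + y2) / (h * h)
    b = (-3 * y0 + 4 * y1 - y2) / h
    return a, b, y0

def build_table():
    h = 2.0 ** -NR1
    table = {}
    for r2 in (0, 1):                      # r2 = 1 stands for an odd exponent, where X = 2*M1
        scale = 2.0 if r2 else 1.0
        for r1 in range(2 ** NR1):
            table[(r1, r2)] = fit_quadratic(lambda m: math.sqrt(scale * m), 1.0 + r1 * h, h)
    return table

TABLE = build_table()

def high_part(m1: float, exponent_is_odd: bool) -> float:
    """Approximate sqrt(X) as a1*X1^2 + b1*X1 + c1 with coefficients selected by (r1, r2)."""
    frac = m1 - 1.0
    r1 = int(frac * 2 ** NR1)              # top NR1 fraction bits select the table entry
    x1 = frac - r1 * 2.0 ** -NR1           # remaining fraction bits feed the polynomial
    a1, b1, c1 = TABLE[(r1, int(exponent_is_odd))]
    return a1 * x1 * x1 + b1 * x1 + c1
```

Because the parity bit is part of the table key, the same mantissa bits index directly into coefficients for either $\sqrt{M1}$ or $\sqrt{2 \times M1}$, so no separate mantissa shift is needed before the lookup in this sketch.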
在一些应用场景中,目标尾数X小数部分的位宽为N位,高位计算单元可以接收目标尾数X的小数部分除去前述第一部分位宽之外的位宽中高t1位,以及接收目标第一查询参数r1,用于计算f的高位部分fu,其中,前述X1的全位宽为t1位,目标第一查询参数r1的全位宽为nr1位。其中,t1,nr1与N的关系为示例性的,浮点数Z为DP浮点数,前处理后的浮点数Z的目标尾数X小数部分的位宽为N=52位。高位计算单元可以利用目标尾数X的小数部分高29(也即)位,计算f的高位部分fu。其中,目标尾数X的小数部分高29位中的高nr1位可以作为目标第一查询参数r1,其它部分作为前述X1,并且nr1的数值可以灵活配置。可选的,浮点数Z为DP浮点数,f的高位部分fu的全位宽可以为29位。
一种可能的设计中,高位计算单元可以获取或者配置有第一多项式系数查找表。第一多项式系数查找表可以表征多个第一拟合参数组合与多个第一查询参数组合的对应关系。其中,各第一查询参数组合具有对应的第一拟合参数组合。每个第一拟合参数组合可以包括第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。每个第一查询参数组合可以包括第一查询参数和第二查询参数。目标第一查询参数r1和目标第二查询参数r2可以构成目标第一查询参数组合。
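The high-order calculation unit may look up, in the first polynomial coefficient lookup table, the first fitting parameter combination corresponding to the target first query parameter combination, thereby determining the coefficients of the first polynomial fitting equation corresponding to the target first query parameter r1 and the target second query parameter r2.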
示例性的,浮点数Z为DP浮点数,目标第二查询参数可以为浮点数W(浮点数Z正常值化后的)的阶码EW的低1位(最小位),目标第一查询参数可以为目标尾数X的小数部分的高7位,也即目标尾数X除最高位之外的部分中的高7位,也即目标尾数X的小数部分中第11位至第18位。高位计算单元可以利用8位数据进行查表。可选的,第一多项式系数查找表中可以具有256个表项,分别为256个第一查询参数组合与各第一查询组合对应的第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。
另一种可能的设计中,第一多项式系数查找表可以包括每个第一查询参数对应的多个第一拟合参数组合。一个第一查询参数可以具有在第二查询参数为奇数情形下对应的第一拟合参数组合,以及在第二查询参数为偶数情形下对应的第一拟合参数组合。第一多项式系数查找表可以包括第一奇数查找子表和第一偶数查找子表。其中,第一奇数查找子表中包括在第二查询参数为奇数情形下第一查询参数对应的第一拟合参数组合。第一偶数查找子表中包括在第二查询参数为偶数情形下第一查询参数对应的第一拟合参数组合。
高位计算单元可以根据目标第二查询参数r2为偶数,从第一偶数查找子表中查找目标第一查询参数r1对应的第一拟合参数组合。或者,高位计算单元可以根据目标第二查询参数r2为奇数,从第一奇数查找子表中查找目标第一查询参数r1对应的第一拟合参数组合。从而实现确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数。
示例性的,浮点数Z为DP浮点数,目标第二查询参数可以为浮点数W(浮点数Z正常值化后的)的阶码EW的低1位(最小位),目标第一查询参数可以为目标尾数X的小数部分的高7位,也即目标尾数X除最高位之外的部分中的高7位,也即目标尾数X的小数部分中第11位至第18位。高位计算单元可以利用8位数据进行查表。第一多项式系数查找表可以包括第一奇数查找子表和第一偶数查找子表。第一奇数查找子表可以包括128个表项,各表项表征浮点数W的阶码为奇数情形下,各目标第一查询参数对应的第一多项式拟合方程的系数。类似地,第一偶数查找子表可以包括128个表项,各表项表征浮点数W的阶码为偶数的情形下,各目标第一查询参数对应的第一多项式拟合方程的系数。
高位计算单元可以根据目标第二查询参数r2为奇数,从第一奇数查找子表中查询目标第一查询参数r1对应的第一多项式拟合方程的系数。或者,高位计算单元可以根据目标第二查询参数r2为偶数,从第一偶数查找子表中查询目标第一查询参数r1对应的第一多项式拟合方程的系数。
一种可能的实施方式中,本申请实施例中,预先配置的第一多项式系数查找表,也即第一拟合参数a1、第二拟合参数b1、第三拟合参数c1、以及第一查询参数组合的对应关系中,第一拟合参数a1、第二拟合参数b1、第三拟合参数c1可以存储在同一个第一存储模块中,如图7中的(a)所示。或者,第一拟合参数a1、第二拟合参数b1、第三拟合参数c1可以分别存储在三个第一存储模块中,如图7中的(b)所示。又或者,第一拟合参数a1、第二拟合参数b1、第三拟合参数c1中的任意两种参数存储在同一个第一存储模块中,如图7中的(c)所示,第一拟合参数a1和第二拟合参数b1存储在同一个第一存储模块中,第三拟合参数c1存储在另一个第一存储模块中。其中,第一多项式系数查找表中可以包括预设数量个第一查询参数组合分别对应的第一拟合参数a1、第二拟合参数b1、以及第三拟合参数c1。
低位计算单元可以包括第一高位倒数计算电路和低位运算电路。其中,第一高位倒数计算电路可以与高位计算单元可以并行运行,从而低位计算单元可以与高位计算单元并行运行。
第一高位倒数计算电路可以与前处理单元连接。第一高位倒数计算电路可以接收前处理单元所输出的目标尾数X的全部或部分位宽。第一高位倒数计算电路可以接收前处理单元所输出的阶码EW的全部或部分位宽。
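The first high-order reciprocal calculation circuit may determine a target third query parameter h1 based on mantissa M1 of floating-point number W, and may determine a target fourth query parameter h2 based on exponent EW of floating-point number W. In the embodiments of this application, the target third query parameter is a third part (third partial bit width) of mantissa M1 of floating-point number W. For example, the target third query parameter h1 may be the high nh1 bits (or the low nh1 bits) of the fractional part of mantissa M1, where nh1 is a positive integer and nh1 is less than or equal to the full bit width of the fractional part of mantissa M1. The target fourth query parameter h2 is part of the bit width of exponent EW of floating-point number W and includes the lowest bit of exponent EW. For example, the target fourth query parameter may be the high nh2 bits (or the low nh2 bits) of exponent EW, where nh2 is a positive integer and nh2 is less than or equal to the full bit width of exponent EW. Optionally, the target fourth query parameter h2 may be the lowest bit of exponent EW, which indicates whether exponent EW is odd or even.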
第一高位倒数计算电路可以基于目标第三查询参数h1和目标第四查询参数h2,确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数,第二多项式拟合方程的系数可以包括第四拟合参数a2、第五拟合参数b2、第六拟合参数c2。
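The first high-order reciprocal calculation circuit may calculate the reciprocal $1/f_u$ of the high-order part fu of f from the coefficients of the second polynomial fitting equation and all or part of the bit width of the fractional part of floating-point number W. For example, it may calculate $1/f_u$ from the coefficients of the second polynomial fitting equation and a fourth part (fourth partial bit width) of the fractional part of floating-point number W, where $1/f_u = a_2 \times (X_2)^2 + b_2 \times X_2 + c_2$. The first high-order reciprocal calculation circuit may output $1/f_u$. X2 is the fourth part (fourth partial bit width) of the fractional part of the mantissa of floating-point number W, and the bits of the fourth part do not overlap the bits of the third part. Optionally, X2 is the high t2 bits of the bits of the fractional part that remain after the third part is excluded, where t2 is a positive integer.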
在一些应用场景中,目标尾数X小数部分的全位宽为N位,第一高位倒数计算电路可以接收目标第三查询参数h1,以及接收目标尾数X的小数部分中除前述第三部分位宽之外的位宽中的高t2位,用于计算f的高位部分fu的倒数目标第三查询参数h1的全位宽为nh1,X2的全位宽为t2位。其中,nh1,t2与N的关系为示例性的,浮点数Z为DP浮点数,目标尾数X小数部分全位宽为N=52位。第一高位倒数计算电路可以利用目标尾数X的小数部分高29位,计算f的高位部分fu的倒数其中,目标尾数X的小数部分高29位中的高nh1位可以作为目标第三查询参数h1,其它部分作为前述X2,并且nh1的数值可以灵活配置。可选的,浮点数Z为DP浮点数,第一高位倒数计算电路可以输出f的高位部分fu的倒数的全位宽为29位。
一种可能的设计中,第一高位倒数计算电路可以配置有第二多项式系数查找表,第二多项式系数查找表可以表征多个第二拟合参数组合与多个第二查询参数组合的对应关系。其中,各第二查询参数组合具有对应的第二拟合参数组合。每个第二拟合参数组合可以包括第四拟合参数a2、第五拟合参数b2、第六拟合参数c2。第一高位倒数计算电路可以从第二多项式系数查找表中查找出目标第二查询参数组合对应的第二拟合参数组合,其中,目标第二查询参数组合对应的第二拟合参数组合中的拟合参数用于作为目标第二查询参数组合对应的第二多项式拟合方程的系数。
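For example, when floating-point number Z is a DP floating-point number, the fourth query parameter may be the lowest bit of exponent EW of floating-point number W, and the third query parameter may be the high 8 bits of the fractional part of target mantissa X. Alternatively, the fourth query parameter may be the low 2 bits of exponent EW of floating-point number W, and the third query parameter may be the high 7 bits of the fractional part of target mantissa X. In either case, the first high-order reciprocal calculation circuit performs the lookup with 9 bits of data in total, made up of the third and fourth query parameters. Optionally, the second polynomial coefficient lookup table may have $2^9$, that is 512, entries, one for each of the 512 second query parameter combinations together with its corresponding fourth fitting parameter a2, fifth fitting parameter b2, and sixth fitting parameter c2.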
另一种可能的设计中,第二多项式系数查找表可以包括每个第三查询参数对应的多个第二拟合参数组合。一个第三查询参数可以具有在第四查询参数为奇数情形下对应的第二拟合参数组合,以及在第四查询参数为偶数情形下对应的第二拟合参数组合。第二多项式系数查找表可以包括第二奇数查找子表和第二偶数查找子表。其中,第二奇数查找子表中表征第四查询参数为奇数时第三查询参数对应的第二拟合参数组合。第二偶数查找子表中包括第四查询参数为偶数时第三查询参数对应的第二拟合参数组合。
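The first high-order reciprocal calculation circuit may, when the target fourth query parameter h2 is even, look up the second fitting parameter combination corresponding to the target third query parameter h1 in the second even lookup sub-table. Alternatively, when the target fourth query parameter h2 is odd, it may look up the second fitting parameter combination corresponding to h1 in the second odd lookup sub-table. In this way the circuit finds, in the second polynomial coefficient lookup table, the second fitting parameter combination corresponding to the target second query parameter combination, and thereby determines the coefficients of the second polynomial fitting equation corresponding to the target third query parameter h1 and the target fourth query parameter h2.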
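In a possible implementation, in the preconfigured second polynomial coefficient lookup table, that is, the correspondence among the fourth fitting parameter a2, the fifth fitting parameter b2, the sixth fitting parameter c2, and the second query parameter combinations, the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 may be stored in the same second storage module. Alternatively, they may be stored in three separate second storage modules. As a further alternative, any two of the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 may be stored in one second storage module and the remaining parameter in another second storage module. The second polynomial coefficient lookup table may include the fourth fitting parameter a2, the fifth fitting parameter b2, and the sixth fitting parameter c2 corresponding to each of a preset number of second query parameter combinations.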
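The low-order operation circuit may be connected to the first high-order reciprocal calculation circuit, to the preprocessing unit, and to the high-order calculation unit. It may receive the reciprocal of the high-order part fu of f (that is, $1/f_u$) output by the first high-order reciprocal calculation circuit, the target mantissa X output by the preprocessing unit, and the high-order part fu of f output by the high-order calculation unit. The low-order calculation unit may calculate the low-order part fl of f according to the relationship between the high-order part fu and the low-order part fl of f. The low-order operation circuit may output the low-order part fl of f, which is how the low-order calculation unit outputs fl.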
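The exact rounding unit may receive the high-order part fu of f output by the high-order calculation unit and the low-order part fl of f output by the low-order calculation unit. Where the exact rounding unit supports multiple rounding modes, it may obtain a rounding mode configuration parameter in advance and determine the calculation result of the square root of target mantissa X according to the rounding mode corresponding to that parameter. Optionally, the RH mode, RP mode, and RZ mode may be rounding modes specified in IEEE 754.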
精确舍入单元可以根据高位计算单元输出的fu,以及接收到低位计算单元输出的fl确定多个待选计算结果。精确舍入单元可以基于预先配置的舍入方式,计算多个待选结果。或者基于获取到的舍入方式配置参数,计算多个待选结果。可选的,多个待选计算结果可以包括如下中的至少两个:第一待选结果f1,第二待选结果f2以及第三待选结果f3。
示例性的,精确舍入单元可以基于RP方式,计算第一待选结果f1和第二待选结果f2。精确舍入单元可以基于RZ方式,计算第一待选结果f1和第三待选结果f3。精确舍入单元可以基于RH方式,计算第一待选结果f1,第二待选结果f2以及第三待选结果f3。
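The following describes how the exact rounding unit calculates the first candidate result f1. The first candidate result f1 is obtained from fu output by the high-order calculation unit and fl output by the low-order calculation unit, which can be written as f1=fu+fl. The process of determining f1, that is, the meaning of the summation of fu and fl (fu+fl), is explained below.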
一种可能的情形中,目标尾数X的平方根f的高m位与目标尾数X的平方根f的低n位不存在重叠位,且m与n的总和等于目标尾数X的平方根f的计算结果的全位宽。精确舍入单元接收到高位计算单元输出的fu,以及接收到低位计算单元输出的fl后,将fu与fl进行拼接处理,可以得到f1。示例性的,如图8中的(a)所示,精确舍入单元得到的f1的高m位与fu相同,f1的低n位与fl相同。
另一种可能的情形中,目标尾数X的平方根f的高m位与目标尾数X的平方根f的低n位存在重叠位,便于介绍将假设目标尾数X的平方根f的高m位中位数最低的q位与目标尾数X的平方根f的低n位中位数最高的q位重叠。例如,图8中的(b)虚线椭圆框示出高m位与低n位重叠位,其中高m位中位数最低的2位,与低n位中位数最高的2位是重叠的。在此情形中,精确舍入单元接收到高位计算单元输出的fu,以及接收到低位计算单元输出的fl后,将fu与fl进行加和处理。示例性的,精确舍入单元得到的第一待选结果f1的低n-q位,与fl的低n-q位相同。fu与fl的高q位加和处理后的结果是精确舍入单元得到的第一待选结果f1的高m+q位。
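The two cases (no overlap, and q overlapping bits) can be covered by one integer expression. The sketch below assumes fu and fl are held as non-negative integers, with fu's lowest bit aligned to fl's q-th bit from the top; the function name `combine` is hypothetical.

```python
def combine(fu: int, fl: int, n: int, q: int) -> int:
    """Merge the high part fu with the n-bit low part fl when q bits overlap (q = 0: concatenation)."""
    # Shift fu so that its lowest bit has weight 2**(n - q), then add the low part.
    return (fu << (n - q)) + fl
```

With q = 0 the addition never produces a carry into fu, so the result is a plain concatenation. With q > 0 the overlapping bits of fl are added into the bottom of fu, and only the lowest n - q bits of fl pass through unchanged.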
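The exact rounding unit may determine (or obtain) the second candidate result f2 from the first candidate result f1 and ulp, where f2=f1+ulp. ulp denotes the smallest significant number that can be expressed in the full bit width of the square root f of target mantissa X. As shown in (a) of FIG. 9, assuming that the full bit width of f is 24 bits, with bit 0 as the least significant bit and bit 23 as the most significant bit, ulp is the number whose bit 0 is 1 and whose other bits are all 0. If the full bit width of the fractional part of target mantissa X is Nt, then ulp is $Q^{-Nt}$. For example, the radix Q used by existing computers for floating-point calculation is generally 2. The process of determining f2, that is, the meaning of f2=f1+ulp, is as follows. As shown in (b) of FIG. 9, the exact rounding unit may add ulp to the first candidate result f1 to obtain the second candidate result f2. The exact rounding unit may likewise determine (or obtain) the third candidate result f3 from the first candidate result f1 and ulp, where f3=f1-ulp. As shown in (c) of FIG. 9, the exact rounding unit may subtract ulp from the first candidate result f1 to obtain the third candidate result f3.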
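In one example, the exact rounding unit may calculate a first rounding discrimination parameter ie from fu and fl. The first rounding discrimination parameter ie represents the deviation between $(f_u+f_l)^2$ and X. Specifically, $ie = (f_u+f_l)^2 - X$, that is, $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$, where X is the target mantissa. Optionally, when floating-point number Z is a DP floating-point number, the value of t3 may be 3. Optionally, the exact rounding unit may calculate the first rounding discrimination parameter ie from the low-order part of $(f_u^2 + f_l^2 + 2 \times f_u \times f_l)$ and the low-order part of target mantissa X, which reduces circuit overhead and chip area.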
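When performing the RP mode, the exact rounding unit may output the first candidate result f1 if the first rounding discrimination parameter ie is greater than or equal to 0, in which case the calculation result of $\sqrt{X}$ is the first candidate result f1. If ie is less than 0, it may output the second candidate result f2, in which case the calculation result of $\sqrt{X}$ is f2.
When performing the RZ mode, the exact rounding unit may output the first candidate result f1 if the first rounding discrimination parameter ie is less than or equal to 0, in which case the calculation result of $\sqrt{X}$ is the first candidate result f1. If ie is greater than 0, it may output the third candidate result f3, in which case the calculation result of $\sqrt{X}$ is f3.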
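In another example, the exact rounding unit may determine a second rounding discrimination parameter ien from the first candidate result f1 and the first rounding discrimination parameter ie. This design allows the circuit that calculates ie to be reused, reducing circuit overhead and chip area. Let fr denote the real-number value of $\sqrt{X}$. The deviation between f1 and fr is called the first distance (f1-fr), and the deviation between fr and f3 is called the second distance (fr-f3). The second rounding discrimination parameter ien may represent the deviation between the square of the first distance and the square of the second distance.
The exact rounding unit may align the most significant digit of the first candidate result f1 with the most significant digit of the first rounding discrimination parameter ie, which can be done by multiplying f1 by ulp; the aligned data can be expressed as ulp×f1. The exact rounding unit may subtract ulp×f1 from ie to obtain the second rounding discrimination parameter ien. The relationship between ien and ie can thus be expressed as ien=ie-ulp×f1.
The exact rounding unit may determine a third rounding discrimination parameter iep from the first candidate result f1 and the first rounding discrimination parameter ie. The deviation between f2 and fr is called the third distance (f2-fr), and the deviation between fr and f1 is called the fourth distance (fr-f1). The third rounding discrimination parameter iep may represent the deviation between the square of the third distance and the square of the fourth distance. The exact rounding unit may add ulp×f1 to ie to obtain iep. That is, the relationship between iep and ie can be expressed as iep=ie+ulp×f1.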
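When performing the RH mode, the exact rounding unit may output the second candidate result f2 if the third rounding discrimination parameter iep is less than 0, in which case the calculation result of $\sqrt{X}$ is f2. It may output the third candidate result f3 if the second rounding discrimination parameter ien is greater than or equal to 0, in which case the calculation result of $\sqrt{X}$ is f3. It may output the first candidate result f1 if iep is greater than or equal to 0 and ien is less than 0, in which case the calculation result of $\sqrt{X}$ is f1.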
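The three selection rules can be checked with integer arithmetic. The sketch below makes several assumptions: f1 is stored as an integer with one ulp equal to 1, X is scaled by the square of the same factor so that ulp×f1 becomes simply f1, f1 is already within one ulp of the correct answer, and RP, RZ, and RH are read as rounding up, rounding toward zero, and rounding to nearest. The function name `select` is hypothetical.

```python
import math

def select(f1: int, x_scaled: int, mode: str) -> int:
    """Pick the rounded square root among f1, f1 + 1 (f2) and f1 - 1 (f3)."""
    ie = f1 * f1 - x_scaled          # sign tells whether f1 lies above or below sqrt(X)
    if mode == "RP":
        return f1 if ie >= 0 else f1 + 1
    if mode == "RZ":
        return f1 if ie <= 0 else f1 - 1
    if mode == "RH":
        iep, ien = ie + f1, ie - f1  # ie + ulp*f1 and ie - ulp*f1 in scaled units
        if iep < 0:
            return f1 + 1
        if ien >= 0:
            return f1 - 1
        return f1
    raise ValueError(mode)
```

Only the signs of ie, ien, and iep are used, which matches the description of a selection circuit driven by sign bits. A tie in RH cannot occur in this integer setting, because the square of a half-integer is never an integer.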
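In the embodiments of this application, the floating-point calculation module may output the calculation result $\sqrt{Z}$ of the square root of floating-point number Z. In this result, the sign bit is the same as the sign bit of floating-point number Z, the mantissa is the fractional part of the calculation result of $\sqrt{X}$ output by the exact rounding unit, and the biased exponent value is the value output by the exponent processing unit. If the exponent EW of floating-point number W (floating-point number Z after normalization) is even, the exponent processing unit may output $EW/2$ + exponent bias. If the exponent EW of floating-point number W is odd, the exponent processing unit may output $(EW-1)/2$ + exponent bias.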
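Assembling sign, biased exponent, and fraction into a double-precision bit pattern can be sketched as follows. This is an illustration under assumptions, not the module itself: the input is taken to be a positive normal double, the mantissa square root comes from `math.isqrt` and is truncated instead of going through the rounding unit, and the helper names are hypothetical.

```python
import math
import struct

BIAS, FRAC_BITS = 1023, 52   # IEEE 754 double precision

def sqrt_bits(z: float) -> int:
    """Bit pattern of sqrt(z) for a positive normal double, with a truncated mantissa."""
    m, e = math.frexp(z)                         # z = m * 2**e with m in [0.5, 1)
    ew = e - 1                                   # exponent of the normalized form M1 * 2**EW
    m1_int = int(m * 2 ** (FRAC_BITS + 1))       # M1 scaled by 2**52, exact for normal doubles
    odd = ew % 2
    x_int = m1_int << odd                        # X = 2*M1 when the exponent is odd
    half_exp = (ew - odd) // 2
    f_int = math.isqrt(x_int << FRAC_BITS)       # floor(sqrt(X) * 2**52), in [2**52, 2**53)
    fraction = f_int - (1 << FRAC_BITS)          # drop the implicit integer bit
    return ((half_exp + BIAS) << FRAC_BITS) | fraction   # sign bit is 0

def float_bits(v: float) -> int:
    """Raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", v))[0]
```

Because the mantissa is truncated here, the pattern can sit one unit in the last place below a round-to-nearest library result. Exact squares such as 4.0 give identical patterns.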
图10示例性的示出浮点数计算模块中部分单元的具体结构示意图。本申请实施例提供的浮点数计算模块中,高位计算单元可以包括第一查表电路、第一平方运算电路以及第一多项式求和电路。
第一查表电路可以接收前处理单元输出的目标第一查询参数r1和目标第二查询参数r2,输出目标第一查询参数r1和目标第二查询参数r2对应的第一拟合参数a1、第二拟合参数b1、第三拟合参数c1。第一查表电路可以具有多种实现方式。
一个示例中,如图10所示,第一查表电路可以与存储有多个第一拟合参数a1、多个第二拟合参数b1、多个第三拟合参数c1的第一存储模块连接。第一查表电路可以与前处理单元连接,第一查表电路可以接收前处理单元输出的目标第一查询参数r1和目标第二查询参数r2。目标第一查询参数可以为浮点数W的尾数M1小数部分中的第一部分(第一部分位宽)。所述目标第二查询参数为浮点数W阶码EW的部分位宽,且包括浮点数W的阶码EW的最低位宽。
可选的,第一多项式系数查找表可以包括第一奇数查找子表和第一偶数查找子表。第一奇数查找子表中表征第二查询参数为奇数时第一查询参数对应的第一拟合参数组合。第一偶数查找子表中包括第二查询参数为偶数时第一查询参数对应的第一拟合参数组合。第一查表电路可以根据目标第二查询参数r2为偶数,从第一偶数查找子表中,查找目标第一查询参数r1对应的第一拟合参数组合。或者,第一查表电路可以根据目标第二查询参数r2为奇数,从第一奇数查找子表中,查找目标第一查询参数r1对应的第一拟合参数组合。实现第一查表电路从第一多项式系数查找表中查找出目标第一查询参数组合对应的第一拟合参数组合,从而实现确定目标第一查询参数r1和目标第二查询参数r2对应的第一多项式 拟合方程的系数。
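The first squaring circuit may be connected to the preprocessing unit and may receive, from the preprocessing unit, the high t1 bits of the bits of the fractional part of mantissa M1 of floating-point number W that remain after the first part is excluded (these bits are also part of the bit width of target mantissa X). The first squaring circuit may calculate the square $(X_1)^2$ of the second part X1 of mantissa M1 of floating-point number W and output $(X_1)^2$.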
第一多项式求和电路可以与第一查表电路连接、与前处理单元连接,以及与第一平方运算电路连接。第一多项式求和电路可以接收到第一查表电路输出的目标第一查询参数r1和目标第二查询参数r2对应的第一多项式拟合方程的系数。第一多项式求和电路可以接收到前处理单元输出的浮点数W的尾数M1的第二部分X1。第一多项式求和电路可以接收到第一平方运算电路输出的(X1)2
第一多项式求和电路可以根据接收到的目标第一查询参数组合对应的第一拟合参数a1、第二拟合参数b1、第三拟合参数c1、浮点数W的尾数M1的第二部分X1,以及目标尾数X的第二部分X1的平方(X1)2,计算得到f的高部分fu,并输出,其中fu=a1×(X1)2+b1×X1+c1。可选的,第一多项式求和电路可以包括乘法器和加法器,或者第一多项式求和电路可以包括乘加器。
本申请实施例提供的浮点数计算模块中,低位计算单元可以包括第一高位倒数计算电路和低位运算电路。其中,第一高位倒数计算电路可以包括第二查表电路、第二平方运算电路以及第二多项式求和电路。
第二查表电路可以接收前处理单元输出的目标第三查询参数h1和目标第四查询参数h2,输出目标第三查询参数h1和目标第四查询参数h2对应的第四拟合参数a2、第五拟合参数b2、第六拟合参数c2。第二查表电路可以具有多种实现方式。
一个示例中,如图10所示,第二查表电路可以与存储有多个第四拟合参数a2、多个第五拟合参数b2、多个第六拟合参数c2的第二存储模块连接。第二查表电路可以与前处理单元连接,第二查表电路可以接收前处理单元输出的目标第三查询参数h1和目标第四查询参数h2。所述目标第三查询参数为浮点数W的尾数M1的第三部分(第三部分位宽)。所述目标第四查询参数h2为浮点数W的阶码EW的部分位宽,并且包括浮点数W的阶码EW的最低位宽。
可选的,第二多项式系数查找表可以包括第二奇数查找子表和第二偶数查找子表。第二奇数查找子表中表征第四查询参数为奇数时第三查询参数对应的第二拟合参数组合。第二偶数查找子表中包括第四查询参数为偶数时第三查询参数对应的第二拟合参数组合。
第二查表电路可以根据目标第四查询参数h2为偶数,从第二偶数查找子表中,查找目标第三查询参数h1对应的第二拟合参数组合。或者,第二查表电路可以根据目标第四查询参数h2为奇数,从第二奇数查找子表中,查找目标第三查询参数h1对应的第二拟合参数组合。实现第二查表电路从第二多项式系数查找表中查找出目标第二查询参数组合对应的第二拟合参数组合,从而实现确定目标第三查询参数h1和目标第四查询参数h2对应的第二多项式拟合方程的系数。
第二平方运算电路可以与前处理单元连接,可以接收前处理单元输出的浮点数W的尾数M1的第四部分X2。可选的,X2为浮点数W的尾数M1的小数部分中除前述第三部分位宽之外的位宽中的高t2位,t2为正整数。第二平方运算电路可以计算浮点数W的尾数M1的第四部分X2的平方(X2)2,并输出(X2)2
第二多项式求和电路可以与第二查表电路连接、与前处理单元连接,以及与第二平方运算电路连接。第二多项式求和电路可以接收到第二查表电路输出的目标第二查询参数组合对应的第二多项式方程的系数。第二多项式求和电路可以接收到前处理单元输出的浮点数W的尾数M1的第四部分X2。
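The second polynomial summation circuit may receive $(X_2)^2$ output by the second squaring circuit. From the fourth fitting parameter a2, fifth fitting parameter b2, and sixth fitting parameter c2 corresponding to the target second query parameter combination, the fourth part X2 of target mantissa X, and its square $(X_2)^2$, the second polynomial summation circuit may calculate the reciprocal $1/f_u$ of the high part fu of f, where $1/f_u = a_2 \times (X_2)^2 + b_2 \times X_2 + c_2$. Optionally, the second polynomial summation circuit may include a multiplier and an adder. Alternatively, it may include a multiply-adder.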
本申请实施例提供的浮点数计算模块中,低位计算单元中的低位运算电路可以包括第三平方运算电路、减法器、第一乘法器以及舍入电路。
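The third squaring circuit may be connected to the first polynomial summation circuit. It may receive the high part fu of f output by the first polynomial summation circuit, calculate its square $f_u^2$, and output $f_u^2$.
The subtractor may be connected to the third squaring circuit and to the preprocessing unit. The subtractor may receive $f_u^2$ output by the third squaring circuit and the target mantissa X output by the preprocessing unit. From the difference $X - f_u^2$ between target mantissa X and $f_u^2$, the subtractor may calculate and output $\frac{X - f_u^2}{2}$. In some possible application scenarios, the function of the subtractor may be implemented by an adder in the low-order operation circuit; this application does not limit this.
The first multiplier may be connected to the second polynomial summation circuit and to the subtractor. The first multiplier may receive $\frac{X - f_u^2}{2}$ output by the subtractor and the reciprocal $1/f_u$ of the high part fu of f output by the second polynomial summation circuit, and may calculate $\frac{X - f_u^2}{2} \times \frac{1}{f_u}$.
The rounding circuit may be connected to the first multiplier. It may receive $\frac{X - f_u^2}{2} \times \frac{1}{f_u}$ output by the first multiplier and round it to the nearest value to obtain the low-order part fl of f. For example, the rounding circuit may add "1" to the high n+1 bits of the first multiplier's output and keep the high n bits of the sum as the low-order part fl of f.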
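The add-one-then-drop-a-bit rounding step can be sketched with integers. The sketch assumes the multiplier output is an unsigned value of known width, with `width` at least n + 1, and it assumes the sum is treated as an (n+1)-bit quantity whose lowest bit is discarded; the function name is hypothetical.

```python
def round_to_top_bits(value: int, width: int, n: int) -> int:
    """Round a width-bit unsigned value to its top n bits (round half up)."""
    top = value >> (width - (n + 1))   # keep the top n + 1 bits; the last one is a guard bit
    return (top + 1) >> 1              # add 1 at the guard position, then drop the guard bit
```

For example, the 6-bit value 0b101101 read as 101.101 (5.625) rounds to 0b110 (6) when n is 3, while 0b101011 (5.375) rounds to 0b101 (5).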
本申请实施例提供的浮点数计算模块中,精确舍入单元可以包括舍入判别参数计算电路、待选结果计算电路、计算结果选择电路。
舍入判别参数计算电路可以与第一多项式求和电路连接,与舍入电路连接、以及与第三平方运算电路连接。舍入判别参数计算电路可以接收第一多项式求和电路输出的f的高部分fu。舍入判别参数计算电路可以接收舍入电路输出的f的低位部分fl。舍入判别参数计算电路可以接收第三平方运算电路输出的f的高部分fu的平方fu 2。舍入判别参数计算电路可以具有计算至少一种舍入方式对应的舍入判别参数的能力。
可选的,舍入判别参数计算电路可以根据f的低位部分fl、f的高部分fu、f的高部分fu的平方fu 2、目标尾数X,计算得到前述第一舍入判别参数ie。或者,舍入判别参数计算电路可以根据f的低位部分fl、f的高部分fu、f的高部分fu的平方fu 2,计算得到前述第二舍入判别参数ien以及第三舍入判别参数iep。
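In the embodiments of this application, the rounding discrimination parameter calculation circuit outputs the sign bit of the first rounding discrimination parameter ie, which may indicate the sign of ie. In other words, the sign bit of ie may indicate that ie is positive (ie greater than 0), that ie is negative (ie less than 0), or that ie equals 0. Similarly, the circuit may output the sign bit of the second rounding discrimination parameter ien to indicate the sign of ien, and the sign bit of the third rounding discrimination parameter iep to indicate the sign of iep.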
待选结果计算电路可以与舍入电路连接,以及与第一多项式求和电路连接。待选结果计算电路可以接收舍入电路输出的f的低位部分fl。待选结果计算电路可以接收f的高部分fu。待选结果计算电路可以根据f的低位部分fl以及f的高部分fu,计算得到多个待选结果,并输出。
可选的,多个待选结果包括第一待选结果f1和第二待选结果f2。其中,第一待选结果f1与f的低位部分fl以及f的高部分fu的关系为f1=fu+fl。第二待选结果f2与f的低位部分fl以及f的高部分fu的关系为f2=f1+ulp,ulp为最小精度单位。或者,多个待选结果包括第一待选结果f1和第三待选结果f3。其中,第三待选结果f3与f的低位部分fl以及f的高部分fu的关系为f3=f1-ulp。又或者,多个待选结果包括第一待选结果f1、第二待选结果f2、以及第三待选结果f3。
一些场景中,精确舍入单元可以支持多种舍入方式的情形中,待选结果计算电路可以输出第一待选结果f1、第二待选结果f2、以及第三待选结果f3。
计算结果选择电路可以与舍入判别参数计算电路连接,以及与待选结果计算电路连接。计算结果选择电路可以接收舍入判别参数计算电路输出的舍入判别参数的符号位。计算结果选择电路可以接收待选结果计算电路输出的多个待选结果。
一种可能的设计中,精确舍入单元可以支持一种舍入方式的情形中,计算结果选择电路可以根据预先配置的舍入方式,根据接收的舍入判别参数,从接收的多个待选结果中选择一个待选结果作为的计算结果,并输出。其中,预先配置的舍入方式可以为RH方式、RP方式、RZ方式中的任意一种。
另一种可能的设计中,精确舍入单元可以支持多种舍入方式的情形中,计算结果选择电路可以接收 到舍入方式配置参数,并根据舍入方式配置参数对应的舍入方式,结合该舍入方式相应的舍入判别参数,从多个待选结果中选择一个待选结果作为目标尾数X的平方根,并输出。便于介绍,本申请实施例中,第一舍入配置参数表征的舍入方式为RP方式。第二舍入配置参数表征的舍入方式为RZ方式。第三舍入配置参数表征的舍入方式为RH方式。
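In one example, the rounding mode configuration parameter received by the calculation result selection circuit is the first rounding configuration parameter. The calculation result selection circuit may output the first candidate result f1 if the first rounding discrimination parameter ie is greater than or equal to 0, and may output the second candidate result f2 if ie is less than 0.
In another example, the rounding mode configuration parameter received by the calculation result selection circuit is the second rounding configuration parameter. The calculation result selection circuit may output the first candidate result f1 if the first rounding discrimination parameter ie is less than or equal to 0, and may output the third candidate result f3 if ie is greater than 0.
In yet another example, the rounding mode configuration parameter received by the calculation result selection circuit is the third rounding configuration parameter. The calculation result selection circuit may output the second candidate result f2 if the third rounding discrimination parameter iep is less than 0. It may output the third candidate result f3 if the second rounding discrimination parameter ien is greater than or equal to 0. It may output the first candidate result f1 if iep is greater than or equal to 0 and ien is less than 0.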
图11示例性的示出一种精确舍入单元的具体结构示意图。本申请实施例中,精确舍入单元中,舍入判别参数计算电路可以包括第二乘法器、第一加法器、第二加法器、第三加法器、第四平方运算电路。
第二乘法器可以与第一多项式求和电路连接,与舍入电路连接。第二乘法器可以接收第一多项式求和电路输出的f的高部分fu。第二乘法器可以接收到舍入电路输出的f的低位部分fl。第二乘法器可以根据接收的f的高部分fu与f的低位部分fl进行乘法运算,计算得到第一中间参数k1,其中第一中间参数k1=2×fu×fl,并输出第一中间参数。
第四平方运算电路可以与舍入电路连接。第四平方运算电路可以接收到舍入电路输出的f的低位部分fl。第四平方运算电路可以计算接收到的f的低位部分fl的平方,得到f的低位部分fl的平方fl 2,并输出。
第一加法器可以与前处理单元连接、与第二乘法器连接,与第三平方运算电路连接,以及与第四平方运算电路连接。第一加法器可以接收到前处理单元输出的浮点数Z的尾数的高t3位X3。第一加法器可以接收到第二乘法器输出的第一中间参数k1。第一加法器可以接收到第三平方运算电路输出的f的高部分fu的平方fu 2。第一加法器可以接收到第四平方运算电路输出的f的低位部分fl的平方fl 2
第一加法器可以基于接收到的f的低位部分fl的平方fl 2、第一中间参数k1(其中k1=2×fu×fl)、f的高部分fu的平方fu 2、目标尾数X,计算第一舍入判别参数ie,其中,ie=(fu+fl)2-X,也即ie=fu 2+fl 2+2×fu×fl-X。第一加法器可以输出第一舍入判别参数ie的符号位。
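The second adder may be connected to the preprocessing unit, to the second multiplier, to the third squaring circuit, and to the fourth squaring circuit. The second adder may receive the high t3 bits X3 of the mantissa of floating-point number Z output by the preprocessing unit. The second adder may receive the first intermediate parameter k1 output by the second multiplier. The second adder may receive the square $f_u^2$ of the high part fu of f output by the third squaring circuit. The second adder may receive the square $f_l^2$ of the low-order part fl of f output by the fourth squaring circuit.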
第二加法器可以基于接收到的f的低位部分fl的平方fl 2、第一中间参数k1(其中k1=2×fu×fl)、f的高部分fu的平方fu 2、浮点数Z的尾数的高t3位X3,计算第二舍入判别参数ien,其中,ien=ie-ulp×f1。第二加法器可以输出第二舍入判别参数ien的符号位。
第三加法器可以与前处理单元连接、与第二乘法器连接,与第三平方运算电路连接,以及与第四平方运算电路连接。第三加法器可以接收到前处理单元输出的浮点数Z的尾数的高t3位X3。第三加法器可以接收到第二乘法器输出的第一中间参数k1。第三加法器可以接收到第三平方运算电路输出的f的高部分fu的平方fu 2。第三加法器可以接收到第四平方运算电路输出的f的低位部分fl的平方fl 2
第三加法器可以基于接收到的f的低位部分fl的平方fl 2、第一中间参数k1(其中k1=2×fu×fl)、f的高部分fu的平方fu 2、浮点数Z的尾数的高t3位X3,计算第三舍入判别参数iep,其中,iep=ie+ulp×f1。第三加法器可以输出第三舍入判别参数iep的符号位。
本申请实施例中,精确舍入单元中,待选结果计算电路可以包括第四加法器、第五加法器以及第六加法器。待选结果计算电路可以具有多种实现方式。
一种可能的设计中,第四加法器可以与第一多项式求和电路连接,与舍入电路连接。如图11所示,第四加法器可以接收第一多项式求和电路输出的f的高部分fu。第四加法器可以接收到舍入电路输出的f的低位部分fl。第四加法器可以根据接收的f的高部分fu与f的低位部分fl进行加和运算,计算得到第 一待选结果f1,其中f1=fu+fl,并输出第一待选结果f1。
第五加法器可以与第一多项式求和电路连接,与舍入电路连接。第五加法器可以接收第一多项式求和电路输出的f的高部分fu。第五加法器可以接收到舍入电路输出的f的低位部分fl。第五加法器可以根据接收的f的高部分fu,f的低位部分fl,以及ulp,计算得到第二待选结果f2,其中f2=f1+ulp,f1=fu+fl,并输出第二待选结果f2。
第六加法器可以与第一多项式求和电路连接,与舍入电路连接。第六加法器可以接收第一多项式求和电路输出的f的高部分fu。第六加法器可以接收到舍入电路输出的f的低位部分fl。第六加法器可以根据接收的f的高部分fu,f的低位部分fl,以及ulp进行运算,计算得到第三待选结果f3,其中f3=f1-ulp,并输出第三待选结果f3。
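In another possible design, as shown in FIG. 12, the fourth adder may be connected to the first polynomial summation circuit and to the rounding circuit. The fourth adder may receive the high part fu of f output by the first polynomial summation circuit and the low-order part fl of f output by the rounding circuit. The fourth adder may add fu and fl to obtain the first candidate result f1, where f1=fu+fl, and output f1. For the features that the exact rounding unit shown in FIG. 12 shares with the one shown in FIG. 11, refer to the description of the exact rounding unit shown in FIG. 11; details are not repeated here.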
第五加法器可以与第一多项式求和电路连接,与舍入电路连接。第五加法器可以接收第一多项式求和电路输出的f的高部分fu。第五加法器可以接收到舍入电路输出的f的低位部分fl。第五加法器可以根据接收的f的高部分fu,f的低位部分fl,以及ulp进行运算,计算得到第二待选结果f2,其中f2=f1+ulp,并输出第二待选结果f2。
第六加法器可以与第四加法器连接。第六加法器可以接收第四加法器输出的第一待选结果f1。第六加法器可以根据接收的第一待选结果f1以及ulp进行减法运算,计算得到第三待选结果f3,其中f3=f1-ulp,并输出第三待选结果f3。
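Based on any of the foregoing candidate result calculation circuits, in the embodiments of this application the calculation result selection circuit in the exact rounding unit may be connected to the first adder, the second adder, the third adder, the fourth adder, the fifth adder, and the sixth adder. The calculation result selection circuit may receive the sign bit of the first rounding discrimination parameter ie output by the first adder, the sign bit of the second rounding discrimination parameter ien output by the second adder, and the sign bit of the third rounding discrimination parameter iep output by the third adder. It may receive the first candidate result f1 output by the fourth adder, the second candidate result f2 output by the fifth adder, and the third candidate result f3 output by the sixth adder.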
可选的,计算结果选择电路可以接收舍入方式配置参数。计算结果选择电路输出目标尾数X的平方根的计算结果的过程可以参见前述实施例中的相关介绍,此处不再赘述。
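FIG. 13 shows a floating-point calculation module according to an exemplary embodiment. The floating-point calculation module may include a preprocessing unit, a high-order calculation unit, a low-order calculation unit, and an exact rounding unit. The low-order calculation unit includes a second high-order reciprocal calculation circuit and the aforementioned low-order operation circuit.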
下面对第二高位倒数计算电路的连接关系以及工作过程进行介绍。图13示出的浮点数计算模块与图5示出的浮点数计算模块的相同之处不再赘述。
第二高位倒数计算电路可以与前处理单元连接,以及与高位计算单元连接。第二高位倒数计算电路可以接收高位计算单元所输出的f的高位部分fu的全部或部分位宽。可选的,第二高位倒数计算电路可以接收前处理单元输出的浮点数Z正常值化后的阶码EZ的全部或部分位宽。
第二高位倒数计算电路可以基于目标尾数X的平方根f的高位部分fu,确定目标第五查询参数g1。其中,所述目标第五查询参数g1为f的高位部分fu的部分位宽(记为第五部分位宽)。示例性的,目标第五查询参数g1可以为f的高位部分fu的小数部分的高g1位(或者低g1位),g1为正整数,且g1小于或等于f的高位部分fu的小数部分的全位宽。
第二高位倒数计算电路可以基于目标第五查询参数g1,确定目标第五查询参数g1对应的第三多项式拟合方程的系数,第三多项式拟合方程的系数可以包括第七拟合参数a3、第八拟合参数b3、第九拟合参数c3。
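The second high-order reciprocal calculation circuit may calculate the reciprocal $1/f_u$ of the high-order part fu of f from the coefficients of the third polynomial fitting equation and part of the bit width of the high-order part fu of the square root f of target mantissa X, where $1/f_u = a_3 \times (g_2)^2 + b_3 \times g_2 + c_3$. The second high-order reciprocal calculation circuit may output $1/f_u$. Here g2 denotes the high-order bits of the fractional part of fu that remain after the fifth partial bit width is excluded, and the number of these bits is a positive integer.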
一种可能的设计中,第二高位倒数计算电路可以获取或者配置有第三多项式系数查找表。第三多项式系数查找表可以表征多个第三拟合参数组合与多个第三查询参数组合的对应关系。其中,各第三查询参数组合具有对应的第三拟合参数组合。每个第三拟合参数组合可以包括第七拟合参数a3、第八拟合参数b3、第九拟合参数c3。第三查询参数组合可以包括第五查询参数。第二高位倒数计算电路接收的目标第五查询参数g1可以构成目标第三查询参数组合,第二高位倒数计算电路可以从第三多项式系数查找表中查找出目标第三查询参数组合对应的第三拟合参数组合,从而实现目标第五查询参数g1对应的第三多项式拟合方程的系数。
一种可能的实施方式中,类似地,预先配置的第三多项式系数查找表,也即第七拟合参数a3、第八拟合参数b3、第九拟合参数c3、以及第三查询参数组合的对应关系中,第七拟合参数a3、第八拟合参数b3、第九拟合参数c3可以存储在同一个第三存储模块中。或者,第七拟合参数a3、第八拟合参数b3、第九拟合参数c3可以分别存储在三个第三存储模块中。又或者,第七拟合参数a3、第八拟合参数b3、第九拟合参数c3中的任意两种参数存储在同一个第三存储模块中,其它参数存储在另一个第三存储模块中。其中,第三多项式系数查找表中可以包括预设数量个第三查询参数组合分别对应的第七拟合参数a3、第八拟合参数b3、第九拟合参数c3。
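The low-order operation circuit may be connected to the second high-order reciprocal calculation circuit, to the preprocessing unit, and to the high-order calculation unit. It may receive the reciprocal of the high-order part fu of f (that is, $1/f_u$) output by the second high-order reciprocal calculation circuit, the target mantissa X output by the preprocessing unit, and the high-order part fu of f output by the high-order calculation unit. The low-order operation circuit may calculate the low-order part fl of f according to the relationship between the high-order part fu and the low-order part fl of f. The low-order operation circuit may output fl, which is also how the low-order calculation unit outputs fl. For the exact rounding unit, refer to the foregoing embodiments; details are not repeated here.
In the embodiments of this application, the second high-order reciprocal calculation circuit obtains the reciprocal $1/f_u$ of the high-order part fu from all or part of the bit width of fu. Compared with the first high-order reciprocal calculation circuit, which uses all or part of the bit width of target mantissa X, it has a smaller circuit scale and occupies less chip area.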
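A table indexed only by bits of fu can be sketched as below. The details are assumptions: the coefficients come from a three-point quadratic fit of 1/x on each segment, fu is taken to lie in [1, 2), the index width `NG1` is arbitrary, and floating-point arithmetic replaces the fixed-point datapath.

```python
NG1 = 5            # number of fu fraction bits used as the table index (illustrative)
H = 2.0 ** -NG1    # width of one table segment

def _fit(x0: float):
    """Quadratic through 1/x at x0, x0 + H/2 and x0 + H, expressed in t = x - x0."""
    y0, y1, y2 = 1 / x0, 1 / (x0 + H / 2), 1 / (x0 + H)
    return 2 * (y0 - 2 * y1 + y2) / H ** 2, (-3 * y0 + 4 * y1 - y2) / H, y0

RECIP_TABLE = [_fit(1.0 + i * H) for i in range(2 ** NG1)]

def reciprocal(fu: float) -> float:
    """Approximate 1/fu as a3*g2^2 + b3*g2 + c3 with coefficients selected by g1."""
    frac = fu - 1.0
    g1 = int(frac * 2 ** NG1)          # index bits taken from fu itself
    g2 = frac - g1 * H                 # remaining bits feed the polynomial
    a3, b3, c3 = RECIP_TABLE[g1]
    return a3 * g2 * g2 + b3 * g2 + c3
```

No exponent parity bit enters the index here, since fu already reflects it. In this sketch that halves the number of table entries relative to a table keyed on the mantissa bits plus parity, at the cost of waiting for fu to be available.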
图14示例性的示出浮点数计算模块中部分单元的具体结构示意图。本申请实施例提供的浮点数计算模块中,高位计算单元的具体结构可以参见图10示出的高位计算单元,此处不再赘述。本申请实施例中,低位计算单元中的第二高位倒数计算电路可以包括第三查表电路、第五平方运算电路以及第三多项式求和电路。
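The third table lookup circuit may receive the target fifth query parameter g1 output by the high-order calculation unit and output the seventh fitting parameter a3, eighth fitting parameter b3, and ninth fitting parameter c3 corresponding to g1. The third table lookup circuit may be implemented in several ways.
In one example, as shown in FIG. 14, the third table lookup circuit may be connected to a third storage module that stores the seventh fitting parameters a3, the eighth fitting parameters b3, and the ninth fitting parameters c3. The third table lookup circuit may receive the target fifth query parameter g1 output by the high-order calculation unit. Using the target third query parameter combination, the third table lookup circuit may look up, in the connected third storage module, the seventh fitting parameter a3, eighth fitting parameter b3, and ninth fitting parameter c3 corresponding to g1, and may output the values found.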
第五平方运算电路可以与高位计算单元连接,可以接收高位计算单元输出的f的高部分fu小数部分的高g2位(也即前述g2)。第五平方运算电路可以计算f的高部分fu小数部分的高g2位的平方(g2)2,并输出(g2)2
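The third polynomial summation circuit may be connected to the third table lookup circuit, to the high-order calculation unit, and to the fifth squaring circuit. It may receive the seventh fitting parameter a3, eighth fitting parameter b3, and ninth fitting parameter c3 corresponding to the target third query parameter combination, output by the third table lookup circuit. It may receive g2, the high-order bits of the fractional part of the high part fu of f, output by the high-order calculation unit. It may receive $(g_2)^2$ output by the fifth squaring circuit. The third polynomial summation circuit may include a multiplier and an adder, so that from a3, b3, c3, g2, and $(g_2)^2$ it can calculate the reciprocal $1/f_u$ of the high part fu of f, where $1/f_u = a_3 \times (g_2)^2 + b_3 \times g_2 + c_3$.
For the specific structure of the low-order operation circuit in the low-order calculation unit, refer to the low-order calculation unit shown in FIG. 10. The low-order operation circuit may include a third squaring circuit, a subtractor, a first multiplier, and a rounding circuit. The first multiplier may be connected to the third polynomial summation circuit and to the subtractor. The first multiplier may receive the output of the subtractor and the reciprocal $1/f_u$ of the high part fu of f output by the third polynomial summation circuit, and may calculate their product. Optionally, the function of the subtractor in the low-order operation circuit may be implemented by an adder.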
精确舍入单元的具体结构可以参见前述任意一个实施例提供的精确舍入单元,此处不再赘述。
图15示例性的示出一种浮点数计算模块。浮点数计算模块可以包括前处理单元、高位计算单元、低位计算单元以及加和处理单元。其中,低位计算单元可以包括前述第一高位倒数计算电路和低位运算电路。可选的,浮点数计算模块还可以包括指数处理单元。本申请实施例中,前处理单元、高位计算单元、低位计算单元、指数处理单元中,各单元的功能可以参见前述任意实施例中的相关介绍,此处不再赘述。
其中,加和处理单元可以与高位计算单元连接,接收高位计算单元所输出的f的高位部分fu。加和处理单元可以与低位计算单元连接,接收低位计算单元所输出的f的低位部分fl。加和处理单元可以对f的高位部分fu和f的低位部分fl进行加和处理,实现确定fu+fl,得到目标尾数X的平方根。加和处理单元可以对f的高位部分fu和f的低位部分fl进行加和处理过程可以参见图8中的相关介绍。可选的,加和处理单元可以包括前述精确舍入单元中的第四加法器,或者本申请实施例中精确舍入单元可以执行加和处理单元的功能。
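FIG. 16 shows an example of a floating-point calculation module. The floating-point calculation module may include a preprocessing unit, a high-order calculation unit, a low-order calculation unit, and a summation unit. The low-order calculation unit may include the aforementioned second high-order reciprocal calculation circuit and the low-order operation circuit. Optionally, the floating-point calculation module may further include an exponent processing unit. For the functions of the preprocessing unit, the high-order calculation unit, the low-order calculation unit, and the exponent processing unit in the embodiments of this application, refer to the related description in any of the foregoing embodiments; details are not repeated here.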
其中,加和处理单元可以与高位计算单元连接,接收高位计算单元所输出的f的高位部分fu。加和处理单元可以与低位计算单元连接,接收低位计算单元所输出的f的低位部分fl。加和处理单元可以对f的高位部分fu和f的低位部分fl进行加和处理,实现确定fu+fl,得到目标尾数X的平方根。加和处理单元可以对f的高位部分fu和f的低位部分fl进行加和处理过程可以参见图8中的相关介绍。可选的,加和处理单元可以包括前述精确舍入单元中的第四加法器,或者本申请实施例中精确舍入单元可以执行加和处理单元的功能。
可以理解的是,为了实现上述方法实施例中功能,处理器或者计算器包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的模块及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (30)

  1. 一种浮点数平方根计算方法,应用于处理器或者计算器,其特征在于,包括:
    接收浮点数计算指令,所述指令携带待计算浮点数(Z);
    获取目标尾数(X),所述目标尾数(X)包括第一浮点数(W)的尾数,所述第一浮点数(W)为正常值化的浮点数,所述第一浮点数(W)的数值与所述待计算浮点数(Z)的数值相同;
    根据所述目标尾数(X)的全部或部分位宽,确定所述目标尾数(X)的平方根的第一位宽部分(fu),所述第一位宽部分(fu)包含所述目标尾数(X)的平方根的最高位;
    基于第一关系,所述第一位宽部分(fu)和所述目标尾数(X)的全部或部分位宽,计算所述目标尾数(X)的平方根的第二位宽部分(fl),其中,所述第一关系表征所述目标尾数(X)的平方根的第一位宽部分(fu)、所述目标尾数(X)以及所述目标尾数(X)的平方根的第二位宽部分(fl)之间的关系;
    基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根,并将所述目标尾数(X)的平方根的小数部分确定为所述待计算浮点数(Z)的平方根的尾数。
  2. 如权利要求1所述的方法,其特征在于,
    若所述第一浮点数(W)的阶码为偶数,所述目标尾数(X)与所述第一浮点数(W)的尾数相同;
    若所述第一浮点数(W)的阶码为奇数,所述目标尾数(X)为所述第一浮点数(W)的尾数的Q倍,其中Q为浮点数的基数,Q为正数,且Q为偶数。
  3. 如权利要求1或2所述的方法,其特征在于,所述第一关系符合如下关系:
    其中,X为所述目标尾数,fu为所述第一位宽部分,fl为所述第二位宽部分。
  4. 如权利要求1-3任一所述的方法,其特征在于,所述第二位宽部分(fl)包含所述目标尾数(X)的平方根的部分位宽,且包含所述目标尾数(X)的平方根的最低位,其中,所述第一位宽部分(fu)的位宽长度与所述第二位宽部分(fl)的位宽长度的总和大于或等于所述目标尾数(X)的平方根的全位宽长度。
  5. 如权利要求2-4任一所述的方法,其特征在于,所述根据所述目标尾数(X)的全部或部分位宽,确定所述目标尾数(X)的平方根的第一位宽部分(fu),包括:
    基于目标第一查询参数(r1)、目标第二查询参数(r2),确定预设的第一多项式拟合方程的系数,其中,所述目标第一查询参数(r1)为所述第一浮点数(W)的尾数的第一部分,所述目标第二查询参数(r2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽;
    根据所述第一多项式拟合方程的系数和所述第一浮点数(W)的尾数的第二部分,计算所述第一位宽部分(fu),所述第一浮点数(W)的尾数的第二部分对应的位宽与所述第一浮点数(W)的尾数的第一部分对应的位宽不重叠。
  6. 如权利要求5所述的方法,其特征在于,基于目标第一查询参数(r1)、目标第二查询参数(r2),确定预设的第一多项式拟合方程的系数,包括:
    若所述目标第二查询参数(r2)为奇数,从第一奇数查找子表中查询所述目标第一查询参数(r1)对应的第一多项拟合方程的系数,其中所述第一奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系;
    若所述目标第二查询参数(r2)为偶数,从第一偶数查找子表中查询所述目标第一查询参数(r1)对应的第一多项式拟合方程的系数,其中所述第一偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系。
  7. 如权利要求1-6任一所述的方法,其特征在于,所述基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根,包括:
    根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定两个待选结果;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第一舍入判别参数(ie),其中,所述第一舍入判别参数(ie)表征第一数值与所述目标尾数(X)之间的偏差,所述第一数值为所述目标尾数(X)的平方根的平方;
    根据所述第一舍入判别参数(ie)与预设数值的比较结果,从所述两个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
  8. 如权利要求7所述的方法,其特征在于,所述第一舍入判别参数(ie)采用如下公式计算:
    $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$
    其中,ie为所述第一舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数。
  9. 如权利要求1-7任一所述的方法,其特征在于,所述基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根,包括:
    根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定多个待选结果,所述多个待选结果包括第一待选结果、第二待选结果以及第三待选结果,其中,所述第二待选结果大于所述第一待选结果,所述第一待选结果大于所述第三待选结果;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第二舍入判别参数(ien),其中,所述第二舍入判别参数(ien)表征第一距离的平方与第二距离的平方之间的偏差,所述第一距离为所述第一待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第二距离表征所述目标尾数(X)的平方根的实数与所述第三待选结果之间的距离;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第三舍入判别参数(iep),其中,所述第三舍入判别参数(iep)表征第三距离的平方与第四距离的平方之间的偏差,所述第三距离为所述第二待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第四距离表征所述目标尾数(X)的平方根的实数与所述第一待选结果之间的距离;
    根据所述第二舍入判别参数(ien)与预设数值的比较结果,以及所述第三舍入判别参数(iep)与预设数值的比较结果,从所述多个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
  10. 如权利要求9所述的方法,其特征在于,所述第二待选结果与所述第一待选结果的差值小于或等于一个最小精度单位;所述第一待选结果与所述第三待选结果的差值小于或等于一个最小精度单位。
  11. 如权利要求9或10所述的方法,其特征在于,所述第二舍入判别参数(ien)采用如下公式计算:
    $ien = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X - ulp \times (f_u + f_l)$
    其中,ien为所述第二舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数,ulp为最小精度单位。
  12. 如权利要求9-11任一所述的方法,其特征在于,所述第三舍入判别参数(iep)采用如下公式计算:
    $iep = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X + ulp \times (f_u + f_l)$
    其中,iep为所述第三舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数,ulp为最小精度单位。
  13. 如权利要求1-6任一所述的方法,其特征在于,所述基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X)的平方根,包括:
    对所述第一位宽部分(fu)和所述第二位宽部分(fl)加和处理,将加和处理后的结果确定为所述目标尾数(X)的平方根。
  14. 一种浮点数计算模块,其特征在于,所述浮点数计算模块用于接收浮点数计算指令,所述指令携带待计算浮点数(Z);获取目标尾数(X),所述目标尾数(X)包括第一浮点数(W)的尾数,所述第一浮点数(W)为正常值化的浮点数,所述第一浮点数(W)的数值与所述待计算浮点数(Z)的数值相同;
    所述浮点数计算模块包括:
    高位计算单元,用于根据所述目标尾数(X)的全部或部分位宽,确定所述目标尾数(X)的平方根的第一位宽部分(fu),所述第一位宽部分(fu)包含所述目标尾数(X)的平方根的最高位;
    低位计算单元,用于基于第一关系,所述第一位宽部分(fu)和所述目标尾数(X)的全部或部分位宽,计算所述目标尾数(X)的平方根的第二位宽部分(fl),其中,所述第一关系表征所述目标尾数(X)的平方根的第一位宽部分(fu)、所述目标尾数(X)以及所述目标尾数(X)的平方根的第二位宽部分(fl)之间的关系;
    精确舍入单元,用于基于所述第一位宽部分(fu)和所述第二位宽部分(fl),确定所述目标尾数(X) 的平方根,并将所述目标尾数(X)的平方根的小数部分确定为所述待计算浮点数(Z)的平方根的尾数。
  15. 如权利要求14所述的浮点数计算模块,其特征在于,
    若所述第一浮点数(W)的阶码为偶数,所述目标尾数(X)与所述第一浮点数(W)的尾数相同;
    若所述第一浮点数(W)的阶码为奇数,所述目标尾数(X)为所述第一浮点数(W)的尾数的Q倍,其中Q为浮点数的基数,Q为正数,且Q为偶数。
  16. 如权利要求14或15所述的浮点数计算模块,其特征在于,所述第一关系符合如下关系:
    其中,X为所述目标尾数,fu为所述第一位宽部分,fl为所述第二位宽部分。
  17. 如权利要求14-16任一所述的浮点数计算模块,其特征在于,所述第二位宽部分(fl)包含所述目标尾数(X)的平方根的部分位宽,且包含所述目标尾数(X)的平方根的最低位,其中,所述第一位宽部分(fu)的位宽长度与所述第二位宽部分(fl)的位宽长度的总和大于或等于所述目标尾数(X)的平方根的全位宽长度。
  18. 如权利要求15-17任一所述的浮点数计算模块,其特征在于,所述高位计算单元,具体用于:
    基于目标第一查询参数(r1)、目标第二查询参数(r2),确定预设的第一多项式拟合方程的系数,其中,所述目标第一查询参数(r1)为所述第一浮点数(W)的尾数的第一部分,所述目标第二查询参数(r2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽;
    根据所述第一多项式拟合方程的系数和所述第一浮点数(W)的尾数的第二部分,计算所述第一位宽部分(fu),所述第一浮点数(W)的尾数的第二部分对应的位宽与所述第一浮点数(W)的尾数的第一部分对应的位宽不重叠。
  19. 如权利要求18所述的浮点数计算模块,其特征在于,所述高位计算单元,具体用于:
    若所述目标第二查询参数(r2)为奇数,从第一奇数查找子表中查询所述目标第一查询参数(r1)对应的第一多项拟合方程的系数,其中所述第一奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系;
    若所述目标第二查询参数(r2)为偶数,从第一偶数查找子表中查询所述目标第一查询参数(r1)对应的第一多项式拟合方程的系数,其中所述第一偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第一查询参数与第一多项式拟合方程的系数对应关系。
  20. 如权利要求14-19任一所述的浮点数计算模块,其特征在于,所述精确舍入单元,具体用于:
    根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定两个待选结果;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第一舍入判别参数(ie),其中,所述第一舍入判别参数(ie)表征第一数值与所述目标尾数(X)之间的偏差,所述第一数值为所述目标尾数(X)的平方根的平方;
    根据所述第一舍入判别参数(ie)与预设数值的比较结果,从所述两个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
  21. 如权利要求20所述的浮点数计算模块,其特征在于,所述第一舍入判别参数(ie)采用如下公式计算:
    $ie = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X$
    其中,ie为所述第一舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数。
  22. 如权利要求14-19任一所述的浮点数计算模块,其特征在于,所述精确舍入单元,具体用于:
    根据所述第一位宽部分(fu)、所述第二位宽部分(fl),确定多个待选结果,所述多个待选结果包括第一待选结果、第二待选结果以及第三待选结果,其中,所述第二待选结果大于所述第一待选结果,所述第一待选结果大于所述第三待选结果;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第二舍入判别参数(ien),其中,所述第二舍入判别参数(ien)表征第一距离的平方与第二距离的平方之间的偏差,所述第一距离为所述第一待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第二距离表征所述目标尾数(X)的平方根的实数与所述第三待选结果之间的距离;
    基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X)的部分位宽,计算第三舍入判别参数(iep),其中,所述第三舍入判别参数(iep)表征第三距离的平方与第四距离的平方之间的偏差,所述第三距离为所述第二待选结果与所述目标尾数(X)的平方根的实数之间的距离,所述第四距离表征所述目标尾数(X)的平方根的实数与所述第一待选结果之间的距离;
    根据所述第二舍入判别参数(ien)与预设数值的比较结果,以及所述第三舍入判别参数(iep)与预设数值的比较结果,从所述多个待选结果中选择一个待选结果确定为所述目标尾数(X)的平方根。
  23. 如权利要求22所述的浮点数计算模块,其特征在于,所述第二待选结果与所述第一待选结果的差值小于或等于一个最小精度单位;所述第一待选结果与所述第三待选结果的差值小于或等于一个最小精度单位。
  24. 如权利要求22或23所述的浮点数计算模块,其特征在于,所述第二舍入判别参数(ien)采用如下公式计算:
    $ien = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X - ulp \times (f_u + f_l)$
    其中,ien为所述第二舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数,ulp为最小精度单位。
  25. 如权利要求22-24任一所述的浮点数计算模块,其特征在于,所述第三舍入判别参数(iep)采用如下公式计算:
    $iep = f_u^2 + f_l^2 + 2 \times f_u \times f_l - X + ulp \times (f_u + f_l)$
    其中,iep为所述第三舍入判别参数,fu为所述第一位宽部分,fl为所述第二位宽部分,X为所述目标尾数,ulp为最小精度单位。
  26. 如权利要求14-25任一所述的浮点数计算模块,其特征在于,精确舍入单元,具体用于:
    对所述第一位宽部分(fu)和所述第二位宽部分(fl)加和处理,将加和处理后的结果确定为所述目标尾数(X)的平方根。
  27. 如权利要求14-26任一所述的浮点数计算模块,其特征在于,所述精确舍入单元,具体用于:
    获取舍入方式配置参数;
    按照所述舍入方式配置参数对应的舍入方式,所述第一位宽部分(fu)、所述第二位宽部分(fl),确定多个待选结果;
    以及按照所述舍入方式配置参数对应的舍入方式,基于所述第一位宽部分(fu)、所述第二位宽部分(fl)、以及所述目标尾数(X),计算舍入判别参数;
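    and selecting, based on the result of comparing the rounding discrimination parameter with a preset value, one candidate result from the multiple candidate results as the square root of the target mantissa (X).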
  28. 如权利要求14-27任一所述的浮点数计算模块,其特征在于,所述第一关系包括所述第一位宽部分(fu)的倒数;所述低位计算单元,还用于:
    基于目标第三查询参数(h1)、目标第四查询参数(h2),确定预设的第二多项式拟合方程的系数,其中,所述目标第三查询参数(h1)为所述第一浮点数(W)的尾数的第三部分,所述目标第四查询参数(h2)为所述第一浮点数(W)的阶码的部分位宽,且包括所述第一浮点数(W)的阶码的最低位宽;
    根据所述第二多项式拟合方程的系数和所述第一浮点数(W)的尾数的第四部分,确定所述第一位宽部分(fu)的倒数,所述第一浮点数(W)的尾数的第三部分对应的位宽与所述第一浮点数(W)的尾数的第四部分对应的位宽不重叠。
  29. 如权利要求28所述的浮点数计算模块,其特征在于,所述低位计算单元,具体用于:
    若所述目标第四查询参数(h2)为奇数,从第二奇数查找子表中查询所述目标第三查询参数(h1)对应的第二多项拟合方程的系数,其中所述第二奇数查找子表包括所述第一浮点数(W)的阶码为奇数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系;
    若所述目标第四查询参数(h2)为偶数,从第二偶数查找子表中查询所述目标第三查询参数(h1)对应的第二多项式拟合方程的系数,其中所述第二偶数查找子表包括所述第一浮点数(W)的阶码为偶数情形下多个第三查询参数与第二多项式拟合方程的系数对应关系。
  30. 如权利要求14-27任一所述的浮点数计算模块,其特征在于,所述第一关系包括所述第一位宽部分(fu)的倒数;所述低位计算单元,还用于:
    基于目标第五查询参数(g1),确定预设的第三多项式拟合方程的系数,其中,所述目标第五查询 参数(g1)为所述第一位宽部分(fu)的第五部分;
    根据所述第三多项式拟合方程的系数和所述第一位宽部分(fu)的第六部分,确定所述第一位宽部分(fu)的倒数,所述第一位宽部分(fu)的第五部分对应的位宽与所述第一位宽部分(fu)的第六部分对应的位宽不重叠。
PCT/CN2023/104073 2022-10-13 2023-06-29 一种浮点数平方根计算方法及浮点数计算模块 WO2024078033A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211250294.4 2022-10-13
CN202211250294.4A CN117932200A (zh) 2022-10-13 2022-10-13 一种浮点数平方根计算方法及浮点数计算模块

Publications (1)

Publication Number Publication Date
WO2024078033A1 true WO2024078033A1 (zh) 2024-04-18

Family

ID=90668661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104073 WO2024078033A1 (zh) 2022-10-13 2023-06-29 一种浮点数平方根计算方法及浮点数计算模块

Country Status (2)

Country Link
CN (1) CN117932200A (zh)
WO (1) WO2024078033A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598197A (zh) * 2015-01-26 2015-05-06 中国科学院自动化研究所 一种浮点倒数和/或平方根倒数运算方法及其装置
CN109901813A (zh) * 2019-03-27 2019-06-18 苏州中晟宏芯信息科技有限公司 一种浮点运算装置及方法
CN111796795A (zh) * 2019-04-04 2020-10-20 英特尔公司 用于执行单精度浮点扩展数学运算的机制

Also Published As

Publication number Publication date
CN117932200A (zh) 2024-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876242

Country of ref document: EP

Kind code of ref document: A1